Crate tabula
Rust bindings for tabulapdf/tabula-java
Prerequisites
In order to use tabula-rs, you will need a tabula-java bytecode archive (jar). You can build it yourself by cloning ssh://git@github.com/tabulapdf/tabula-java.git and then invoking Maven to build it.
git clone git@github.com:tabulapdf/tabula-java.git && cd tabula-java
mvn compile assembly:single
The built archive should then be located at target/tabula-$TABULA_VER-jar-with-dependencies.jar.
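As a small illustration of the naming scheme above, the following sketch builds the expected jar path from a checkout directory and a version string. The function name `expected_jar_path` and its parameters are hypothetical helpers for this example and are not part of tabula-rs.

```rust
use std::path::{Path, PathBuf};

/// Builds the path where the Maven build is expected to place the fat jar.
/// `checkout` is the tabula-java clone and `version` stands in for $TABULA_VER;
/// both are illustrative parameters, not part of the tabula-rs API.
fn expected_jar_path(checkout: &Path, version: &str) -> PathBuf {
    checkout
        .join("target")
        .join(format!("tabula-{}-jar-with-dependencies.jar", version))
}
```

Checking `expected_jar_path(...).exists()` before starting the JVM can give a clearer error than a failed class lookup later on.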
Additionally, make sure $JAVA_HOME/lib/server/libjvm.so is reachable through LD_LIBRARY_PATH, or explicitly set it as LD_PRELOAD.
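The check described above could be sketched with the standard library alone. The helpers `libjvm_path` and `dir_listed` are hypothetical names introduced for this example. They only compare path strings and do not ask the dynamic loader what it will actually resolve.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Returns the location where libjvm.so is expected for a given JAVA_HOME.
fn libjvm_path(java_home: &Path) -> PathBuf {
    java_home.join("lib").join("server").join("libjvm.so")
}

/// Returns true if `dir` appears as an entry in an LD_LIBRARY_PATH-style string.
/// This is a plain comparison of path entries; symlinks are not resolved.
fn dir_listed(ld_library_path: &str, dir: &Path) -> bool {
    env::split_paths(ld_library_path).any(|entry| entry.as_path() == dir)
}
```

A caller could read `JAVA_HOME` and `LD_LIBRARY_PATH` with `std::env::var`, then pass the parent directory of `libjvm_path(...)` to `dir_listed` and print a hint if it returns false.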
Using tabula-rs
Initializing JVM & accessing JNI
In order to make use of tabula-java, you’ll need to start jni::JavaVM with the built archive added to its classpath. You can either do this manually or call TabulaVM::new() with the (space-escaped) path to the archive as a parameter.
Using TabulaVM you can now access the Java native interface by calling TabulaVM::attach().
let vm = TabulaVM::new("../tabula-java/target/tabula-1.0.6-SNAPSHOT-jar-with-dependencies.jar", false).unwrap();
let env = vm.attach().unwrap();
Instantiating Tabula class
With access to the JNI, you can instantiate the Tabula class by calling TabulaEnv::configure_tabula().
let tabula = env.configure_tabula(None, None, OutputFormat::Csv, true, ExtractionMethod::Basic, false, None).unwrap();
Parsing the document
Tabula provides Tabula::parse_document(), which parses the document located at the given path and returns a std::fs::File located in memory.
let file = tabula.parse_document(&std::path::Path::new("./test_data/spanning_cells.pdf"), "test_spanning_cells").unwrap();
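Since the result is an ordinary std::fs::File, its content can be read with the standard I/O traits. The sketch below uses a hypothetical helper, `read_output`, that is not part of tabula-rs. It rewinds the handle first because the cursor position of the returned file is an assumption here, not something this page states.

```rust
use std::fs::File;
use std::io::{self, Read, Seek};

/// Reads the entire content of a file handle into a String.
/// The rewind is defensive: it is assumed, not documented, that the
/// handle may not be positioned at the start of the data.
fn read_output(mut file: File) -> io::Result<String> {
    file.rewind()?;
    let mut out = String::new();
    file.read_to_string(&mut out)?;
    Ok(out)
}
```

With the `file` value from the example above, `read_output(file)` would yield the extracted table as text, for instance CSV when OutputFormat::Csv was configured.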
Relevant links
- tabula-rs forge: https://github.com/sp1ritCS/tabula-rs
- tabula-java project: https://github.com/tabulapdf/tabula-java/
Re-exports
pub use jni;
Structs
Rectangle: Oxidized technology.tabula.Rectangle
Tabula: Tabula class
TabulaEnv: Java native interface capable of instantiating Tabula class
TabulaVM: Java VM capable of using Tabula
Enums
ExtractionMethod: Oxidized technology.tabula.CommandLineApp$ExtractionMethod
OutputFormat: Oxidized technology.tabula.CommandLineApp$OutputFormat
Constants
Type Definitions
Result returned from JNI