Crate llamapun

The LLaMaPUn library in Rust

Language and Mathematics Processing and Understanding. Common data structures and algorithms for semi-structured NLP on math-rich documents.

Modules

ams

Representation, normalization and utilities for working with AMS markup in LaTeX-derived scientific documents

data

Data structures and Iterators for convenient high-level syntax

dnm

The dnm can be used for easier switching between the DOM (Document Object Model) representation and the plain text representation, which is needed for most NLP tools.

extern_use

Expose convenience calls to be used from non-Rust applications

ngrams

A small ngram library. Ngrams are sequences of n consecutive words.
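To make the definition concrete, here is a minimal standard-library sketch of counting ngrams over whitespace-separated words. It does not use or reproduce this module's API: the function name `count_ngrams`, the lowercasing step, and the whitespace splitting are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of llamapun): counts every ngram,
/// i.e. every window of `n` consecutive words, found in `text`.
/// Words are split on whitespace and lowercased, which is an
/// assumption of this sketch rather than the module's behavior.
fn count_ngrams(text: &str, n: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    if n == 0 {
        // `windows(0)` panics, so an empty result is returned instead.
        return counts;
    }
    let words: Vec<String> = text
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();
    // Slide a window of n words over the text; fewer than n words
    // yields no windows at all.
    for window in words.windows(n) {
        *counts.entry(window.join(" ")).or_insert(0) += 1;
    }
    counts
}
```

Calling `count_ngrams("the cat saw the cat", 2)` counts the bigram "the cat" twice and the bigrams "cat saw" and "saw the" once each.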

parallel_data

Data structures and Iterators for rayon-enabled parallel processing, including parallel I/O when walking a corpus, as well as DOM primitives that allow parallel iterators on XPath results, etc.

patterns

A module for pattern matching in mathematical documents

stopwords

A tiny stopwords library. Stopwords are frequent words like "the", "it", "then", which would add too much noise to certain statistical methods.
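As an illustration of the idea, the following standard-library sketch drops stopwords from a word sequence before any counting takes place. It is not this module's API: the five-word list `STOPWORDS` and the function `remove_stopwords` are hypothetical, and a real list would be far longer.

```rust
use std::collections::HashSet;

/// Hypothetical, deliberately tiny stopword list for illustration only.
const STOPWORDS: [&str; 5] = ["the", "it", "then", "a", "of"];

/// Hypothetical helper (not part of llamapun): lowercases the
/// whitespace-separated words of `text` and keeps only those that
/// are not in `STOPWORDS`.
fn remove_stopwords(text: &str) -> Vec<String> {
    // A set gives constant-time membership checks per word.
    let stopwords: HashSet<&str> = STOPWORDS.iter().copied().collect();
    text.split_whitespace()
        .map(|w| w.to_lowercase())
        .filter(|w| !stopwords.contains(w.as_str()))
        .collect()
}
```

With this list, `remove_stopwords("Then the proof of it follows")` keeps only "proof" and "follows".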

tokenizer

Provides functionality for tokenizing sentences and words

util

Various useful code snippets

Macros

record_node_map

A handy macro for idiomatic recording in the node_map