Data structures and iterators that provide a convenient high-level syntax


An iterable Corpus of HTML5 documents
One of our math documents.
File-system iterator yielding individual documents
A paragraph of a document with a DNM
An iterator over the paragraphs of a document. Ignores paragraphs containing ltx_ERROR markup.
An iterator over the words of a sentence, where the words (and potentially additional information) are obtained using SENNA
A sentence in a document
An iterator over the sentences of a document/paragraph
An iterator over the words of a sentence, where the words are only defined by their ranges
A word with a POS tag
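
The range-based word iterator described above can be illustrated with a small, self-contained sketch. The types here (`Sentence`, `WordRange`, `RangeWordIterator`) and the method `word_ranges` are hypothetical stand-ins written for this example; they are not this module's actual API. The sketch also assumes that a word is simply a maximal run of non-whitespace characters. The point is only to show how a word can be represented as a byte range into its sentence instead of as a copied string.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a sentence: it only owns its text.
struct Sentence {
    text: String,
}

/// Hypothetical word view: a byte range into the sentence text.
struct WordRange<'s> {
    sentence: &'s Sentence,
    range: Range<usize>,
}

impl<'s> WordRange<'s> {
    /// Resolve the range back to the underlying text.
    fn as_str(&self) -> &'s str {
        let sentence: &'s Sentence = self.sentence;
        &sentence.text[self.range.clone()]
    }
}

/// Iterator that yields words as ranges, splitting on whitespace
/// (an assumption of this sketch, not a statement about the real tokenizer).
struct RangeWordIterator<'s> {
    sentence: &'s Sentence,
    pos: usize,
}

impl Sentence {
    fn word_ranges(&self) -> RangeWordIterator<'_> {
        RangeWordIterator { sentence: self, pos: 0 }
    }
}

impl<'s> Iterator for RangeWordIterator<'s> {
    type Item = WordRange<'s>;

    fn next(&mut self) -> Option<Self::Item> {
        let sentence: &'s Sentence = self.sentence;
        let rest = &sentence.text[self.pos..];
        // Skip leading whitespace; stop when no word characters remain.
        let start = self.pos + rest.find(|c: char| !c.is_whitespace())?;
        let tail = &sentence.text[start..];
        // The word ends at the next whitespace character or at the end of the text.
        let len = tail.find(char::is_whitespace).unwrap_or(tail.len());
        let end = start + len;
        self.pos = end;
        Some(WordRange { sentence, range: start..end })
    }
}

fn main() {
    let sentence = Sentence { text: String::from("Let x be prime") };
    for word in sentence.word_ranges() {
        // Each word carries only its range; the text is looked up on demand.
        println!("{:?} {}", word.range, word.as_str());
    }
}
```

Keeping only ranges avoids allocating a new string per word and preserves each word's position in the sentence, which is useful when results have to be mapped back onto the original document.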