pub struct Corpus {
    pub path: String,
    pub xml_parser: Parser,
    pub html_parser: Parser,
    pub tokenizer: Tokenizer,
    pub senna: RefCell<Senna>,
    pub senna_options: Cell<SennaParseOptions>,
    pub dnm_parameters: DNMParameters,
    pub extension: Option<String>,
}
An iterable Corpus of HTML5 documents
Fields
path: String
root directory
xml_parser: Parser
document XHTML5 parser
html_parser: Parser
document HTML5 parser
tokenizer: Tokenizer
DNM-aware sentence and word tokenizer
senna: RefCell<Senna>
Senna object for shallow language analysis
senna_options: Cell<SennaParseOptions>
Senna parsing options
dnm_parameters: DNMParameters
Default setting for DNM generation
extension: Option<String>
Extension of corpus files, for specially tailored resources such as DLMF’s .html5. Defaults to selecting both .html and .xhtml files.
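The selection rule described for `extension` can be illustrated with a small, standard-library-only sketch. The helper `is_corpus_file` is hypothetical and not part of the documented API; it also assumes the extension may be written with or without a leading dot, which the documentation does not specify.

```rust
use std::path::Path;

/// Hypothetical helper (not part of the documented API): decides whether a
/// file would be selected for a corpus, given the optional `extension` field.
fn is_corpus_file(path: &Path, extension: &Option<String>) -> bool {
    // Extension of the candidate file, if it has one and it is valid UTF-8.
    let file_ext = match path.extension().and_then(|e| e.to_str()) {
        Some(ext) => ext,
        None => return false,
    };
    match extension {
        // An explicit extension restricts the selection to exactly that suffix.
        // Assumption: a leading dot in the configured value is tolerated.
        Some(wanted) => file_ext == wanted.trim_start_matches('.'),
        // Default: accept both .html and .xhtml documents.
        None => file_ext == "html" || file_ext == "xhtml",
    }
}
```

Under these assumptions, a corpus configured with `Some("html5".to_string())` would pick up only `.html5` files, while `None` falls back to the `.html` and `.xhtml` default.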
Implementations
impl Corpus
pub fn iter(&mut self) -> DocumentIterator<'_>
Notable traits for DocumentIterator<'iter>: impl<'iter> Iterator for DocumentIterator<'iter>, with type Item = Document<'iter>
Get an iterator over the documents
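The signature indicates that the iterator borrows the corpus and yields items of type `Document<'iter>`, so each document can keep a reference back to its corpus. The following is a minimal sketch of that borrowing pattern using toy stand-in types. `ToyCorpus`, `ToyDocument`, `ToyDocumentIterator`, the `files` field and `collect_paths` are all hypothetical, and the real iterator presumably walks the file system rather than a pre-built list.

```rust
/// Toy stand-in for a corpus; `files` is a hypothetical pre-listed set of names.
struct ToyCorpus {
    path: String,
    files: Vec<String>,
}

/// Toy document that keeps a shared reference to the corpus it came from.
struct ToyDocument<'iter> {
    path: String,
    corpus: &'iter ToyCorpus,
}

impl<'iter> ToyDocument<'iter> {
    /// Root directory of the owning corpus, read through the back-reference.
    fn corpus_root(&self) -> &str {
        &self.corpus.path
    }
}

/// Toy iterator: borrows the corpus and remembers its position.
struct ToyDocumentIterator<'iter> {
    corpus: &'iter ToyCorpus,
    next_index: usize,
}

impl ToyCorpus {
    /// Mirrors the shape of `iter(&mut self) -> DocumentIterator<'_>`.
    fn iter(&mut self) -> ToyDocumentIterator<'_> {
        ToyDocumentIterator { corpus: &*self, next_index: 0 }
    }
}

impl<'iter> Iterator for ToyDocumentIterator<'iter> {
    type Item = ToyDocument<'iter>;

    fn next(&mut self) -> Option<Self::Item> {
        let corpus = self.corpus; // copy of the shared reference
        let name = corpus.files.get(self.next_index)?;
        self.next_index += 1;
        Some(ToyDocument {
            path: format!("{}/{}", corpus.path, name),
            corpus,
        })
    }
}

/// Usage: gather the full path of every document in the toy corpus.
fn collect_paths(corpus: &mut ToyCorpus) -> Vec<String> {
    corpus.iter().map(|doc| doc.path).collect()
}
```

Because the yielded items carry the `'iter` lifetime, the corpus stays borrowed for as long as any document obtained from the iterator is alive.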
Trait Implementations
Auto Trait Implementations
impl !RefUnwindSafe for Corpus
impl !Send for Corpus
impl !Sync for Corpus
impl Unpin for Corpus
impl UnwindSafe for Corpus
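Two of the negative entries above are consistent with the field types alone: `Cell` and `RefCell` are never `Sync` and are not `RefUnwindSafe`, so any struct containing them inherits both properties. (The `!Send` entry must come from other fields and is not explained by this sketch.) The example below uses toy stand-ins to show what these wrappers allow, namely changing options and using the analyzer through a shared reference. `ToyOptions`, `ToyAnalyzer`, `ToyCorpus` and their methods are hypothetical and do not reflect the Senna API.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical options type; it must be `Copy` for `Cell::get` to work.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyOptions {
    pos: bool,
}

/// Hypothetical analyzer with internal state that changes on every call.
struct ToyAnalyzer {
    calls: usize,
}

struct ToyCorpus {
    analyzer: RefCell<ToyAnalyzer>,
    options: Cell<ToyOptions>,
}

impl ToyCorpus {
    /// Replaces the options through `&self`; no `&mut` access is needed.
    fn set_options(&self, options: ToyOptions) {
        self.options.set(options);
    }

    /// Counts the annotations a sentence would receive: one per word,
    /// plus one extra per word when `pos` is enabled.
    fn analyze(&self, sentence: &str) -> usize {
        // Runtime-checked mutable borrow of the analyzer.
        let mut analyzer = self.analyzer.borrow_mut();
        analyzer.calls += 1;
        let per_word = if self.options.get().pos { 2 } else { 1 };
        sentence.split_whitespace().count() * per_word
    }
}
```

The trade-off of this design is that borrow rules for the `RefCell` field are enforced at run time, and the struct cannot be shared across threads by reference.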
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.