Struct llamapun::ngrams::Ngrams

pub struct Ngrams {
    pub anchor: Option<String>,
    pub window_size: usize,
    pub n: usize,
    pub counts: HashMap<String, usize>,
}

Ngrams are dictionaries with ngram phrases as keys and their occurrence counts as values

Fields

anchor: Option<String>

anchor word that must be present in all ngram contexts (in their window)

window_size: usize

if an anchor word is given, word window size, applied to the left and to the right of the anchor word

n: usize

the number of words n in each recorded n-gram

counts: HashMap<String, usize>

statistics hashmap for the occurrence counts

Implementations

impl Ngrams

pub fn get(&self, word: &str) -> usize

Get the word count

pub fn insert(&mut self, phrase: String)

count a newly seen ngram phrase
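The counting behavior behind `get` and `insert` can be sketched on a bare `HashMap<String, usize>`, the same type as the `counts` field. This is a minimal stand-in, not llamapun's code: the helper names `count_phrase` and `lookup` are hypothetical, and returning zero for an unseen phrase is an assumption about `get`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count one more occurrence of `phrase`.
fn count_phrase(counts: &mut HashMap<String, usize>, phrase: String) {
    // An unseen phrase starts at zero, then every call increments it.
    *counts.entry(phrase).or_insert(0) += 1;
}

/// Hypothetical helper: read a count, treating unseen phrases as zero
/// (an assumption; the documentation above does not state this).
fn lookup(counts: &HashMap<String, usize>, phrase: &str) -> usize {
    counts.get(phrase).copied().unwrap_or(0)
}
```

Using the entry API keeps the insert-or-increment step to a single hash lookup.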

pub fn sorted(&self) -> Vec<(&String, usize)>

obtain the ngram report, sorted by descending frequency
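A report in descending frequency order could be produced along the following lines. This is a sketch on a plain `HashMap<String, usize>`, not the crate's implementation; the function name `sorted_by_count` is hypothetical, and the alphabetical tie-break is an assumption added only to make the output deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical helper: list (phrase, count) pairs, most frequent first.
fn sorted_by_count(counts: &HashMap<String, usize>) -> Vec<(&String, usize)> {
    let mut report: Vec<(&String, usize)> =
        counts.iter().map(|(phrase, count)| (phrase, *count)).collect();
    // Descending by count; equal counts fall back to alphabetical order
    // (an assumption, since HashMap iteration order is unspecified).
    report.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    report
}
```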

pub fn distinct_count(&self) -> usize

get the number of distinct ngrams recorded

pub fn add_content(&mut self, content: &str)

add content for ngram analysis, typically a paragraph or a line of text

pub fn add_anchored_content(&mut self, content: &str)

In essence, for a given window size W, a word at index i is justified to participate in the ngrams if there is an instance of an anchor word in the range of words [i-W, i+W]. Anchor placement can be highly irregular, e.g. “word word anchor word anchor word word”, so we record flexibly, looking for no-justification cutoffs: each continuous sequence of justified words between two cutoffs is recorded for ngram counts.
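One reading of the justification rule is sketched below: a word is kept when an anchor occurs within `window` positions of it, and each unjustified word closes the current run. The function `justified_runs` and its signature are hypothetical, written for illustration; the crate may implement the rule differently.

```rust
/// Hypothetical helper: split `words` into maximal runs of justified words,
/// i.e. words with an anchor occurrence in [i - window, i + window].
fn justified_runs<'a>(words: &[&'a str], anchor: &str, window: usize) -> Vec<Vec<&'a str>> {
    let mut runs = Vec::new();
    let mut current: Vec<&'a str> = Vec::new();
    for i in 0..words.len() {
        // Clamp the window to the bounds of the word sequence.
        let lo = i.saturating_sub(window);
        let hi = (i + window).min(words.len() - 1);
        let justified = words[lo..=hi].iter().any(|w| *w == anchor);
        if justified {
            current.push(words[i]);
        } else if !current.is_empty() {
            // No-justification cutoff: close the run collected so far.
            runs.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        runs.push(current);
    }
    runs
}
```

With a window of 1, the irregular example “word word anchor word anchor word word” yields a single run of five words, because the two anchor windows overlap. Each run would then be counted as one continuous word sequence.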

pub fn record_words(&mut self, words: Vec<&str>)

Take an arbitrarily long vector of words, and record all (overlapping) ngrams obtainable from it
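Overlapping ngrams can be enumerated with the standard `slice::windows` iterator. The sketch below is a stand-in rather than the crate's code: `record_ngrams` is a hypothetical name, and joining the words of each ngram with a single space is an assumption about the key format.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count every overlapping n-word window in `words`.
fn record_ngrams(counts: &mut HashMap<String, usize>, words: &[&str], n: usize) {
    // `windows` panics on a size of zero, so guard against it.
    if n == 0 {
        return;
    }
    // Sequences shorter than `n` produce no windows and record nothing.
    for window in words.windows(n) {
        // Assumed key format: the words joined by a single space.
        *counts.entry(window.join(" ")).or_insert(0) += 1;
    }
}
```

For the words `a b a b c` and n = 2, this records "a b" twice and "b a" and "b c" once each.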

Trait Implementations

impl Default for Ngrams

fn default() -> Ngrams

Returns the “default value” for a type.

Auto Trait Implementations

impl RefUnwindSafe for Ngrams

impl Send for Ngrams

impl Sync for Ngrams

impl Unpin for Ngrams

impl UnwindSafe for Ngrams

Blanket Implementations

impl<T> Any for T
where
    T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where
    T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where
    U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = mem::align_of::<T>()

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where
    U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where
    U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.