Class | Description
---|---
TextTokenizer | The TextTokenizer class locates the boundaries of words in a block of text. Word boundaries are found according to these general principles: (1) the block of text to tokenize can be specified by start and end positions; (2) the default separator is the Unicode white space character.
Token | A Token is an occurrence of a word in a block of text.
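
The two principles in the table can be illustrated with a minimal sketch. This is not the library's API: the `Token` fields (`text`, `start`, `end`), the `tokenize` function, and its parameters are hypothetical names chosen for illustration, and Python's `str.isspace()` stands in for the Unicode white space test.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token: one word occurrence and its [start, end) offsets."""
    text: str
    start: int
    end: int


def tokenize(text: str, start: int = 0, end: Optional[int] = None) -> List[Token]:
    """Split text[start:end] into words separated by white space characters."""
    if end is None:
        end = len(text)
    if not 0 <= start <= end <= len(text):
        raise ValueError("require 0 <= start <= end <= len(text)")

    tokens: List[Token] = []
    word_start: Optional[int] = None  # offset where the current word began
    for i in range(start, end):
        if text[i].isspace():
            # A separator closes the word in progress, if there is one.
            if word_start is not None:
                tokens.append(Token(text[word_start:i], word_start, i))
                word_start = None
        elif word_start is None:
            word_start = i
    # A word that runs up to the end position is still a token.
    if word_start is not None:
        tokens.append(Token(text[word_start:end], word_start, end))
    return tokens
```

In this sketch, calling `tokenize("hello world", 3, 9)` considers only the characters at offsets 3 through 8, so a word cut by the start or end position is reported as the part that falls inside the range. Offsets are always relative to the full text, not to the selected block.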