Package: com.adobe.linguistics.utils
Class: public class TextTokenizer
Inheritance: TextTokenizer → Object
Implements: com.adobe.linguistics.utils.ITokenizer

Language Version : ActionScript 3.0
Runtime Versions : AIR 1.0, Flash Player 9.x

The TextTokenizer class locates the boundaries of words in a block of text.

Word boundary locations are found according to these general principles:

In future versions, this class is intended to also let developers customize the separators that the tokenizer uses.



Public Properties
  ignoredSeparators : Vector.<int>  (defined by TextTokenizer)
      Gets or sets the separators that this tokenizer ignores.
Public Methods
  TextTokenizer(textHolder:String, startIndex:int = 0, endIndex:int)  (defined by TextTokenizer)
      Creates a tokenizer for a String object.
  getFirstToken():Token  (defined by TextTokenizer)
      Returns the first word in the text being scanned.
  getNextToken(token:Token):Token  (defined by TextTokenizer)
      Determines the word that follows the given token.
  getPreviousToken(token:Token):Token  (defined by TextTokenizer)
      Determines the word that precedes the given token.
Property Detail
ignoredSeparators property
ignoredSeparators:Vector.<int>

Language Version : ActionScript 3.0
Runtime Versions : AIR 1.0, Flash Player 10

Gets or sets the separators that this tokenizer ignores, as a vector of int values in which each element is the code point of one ignored separator.


Implementation
    public function get ignoredSeparators():Vector.<int>
    public function set ignoredSeparators(value:Vector.<int>):void
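A minimal sketch of updating the property. It assumes, as this page does not state, that a code point listed in ignoredSeparators is not treated as a word boundary, so adding the hyphen-minus (U+002D) would keep a compound such as "well-known" together. The class SeparatorSetup and its methods are hypothetical illustrations, not part of com.adobe.linguistics.utils.

```actionscript
package
{
    import com.adobe.linguistics.utils.TextTokenizer;

    // Hypothetical helper class; not part of the com.adobe.linguistics API.
    public class SeparatorSetup
    {
        // Code point of HYPHEN-MINUS (U+002D), chosen only as an illustration.
        public static const HYPHEN:int = 0x2D;

        // Adds one code point to the tokenizer's ignored separators.
        // Assumption: ignored separators are not treated as word boundaries.
        public static function ignoreSeparator(tokenizer:TextTokenizer, codePoint:int):void
        {
            // Copy the current list (or start an empty one) so that the
            // setter always receives a complete vector.
            var current:Vector.<int> = tokenizer.ignoredSeparators;
            var updated:Vector.<int> = (current != null) ? current.concat() : new Vector.<int>();
            if (updated.indexOf(codePoint) == -1)
            {
                updated.push(codePoint);
            }
            tokenizer.ignoredSeparators = updated;
        }

        // Creates a tokenizer for text (using the documented index defaults)
        // with one extra ignored separator.
        public static function createTokenizer(text:String, extraIgnored:int):TextTokenizer
        {
            var tokenizer:TextTokenizer = new TextTokenizer(text);
            ignoreSeparator(tokenizer, extraIgnored);
            return tokenizer;
        }
    }
}
```

Copying the vector before pushing avoids mutating whatever object the getter returns and makes the setter call the single point where the tokenizer's configuration changes.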
Constructor Detail
TextTokenizer() constructor
public function TextTokenizer(textHolder:String, startIndex:int = 0, endIndex:int)

Language Version : ActionScript 3.0
Runtime Versions : AIR 1.0, Flash Player 10

Constructs a new TextTokenizer object that breaks the given String into words. TextTokenizer implements the ITokenizer interface.

Parameters
textHolder:String - The text that this tokenizer processes.

startIndex:int (default = 0) - The index in textHolder at which scanning starts.

endIndex:int (default = NaN) - The index in textHolder at which scanning ends.
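A sketch of scanning only part of a string by passing both startIndex and endIndex explicitly. It assumes that the Token class, with its first and last index properties, lives in com.adobe.linguistics.utils and that endIndex is exclusive, as with String.substring(); this page confirms neither. SubrangeExample is a hypothetical name.

```actionscript
package
{
    import com.adobe.linguistics.utils.TextTokenizer;
    import com.adobe.linguistics.utils.Token; // assumed package for Token

    // Hypothetical example class.
    public class SubrangeExample
    {
        // Returns the start index of the first word found between start and
        // end (end assumed exclusive), or -1 when that range holds no word.
        public static function firstWordStart(text:String, start:int, end:int):int
        {
            // Pass both indexes explicitly instead of relying on the endIndex default.
            var tokenizer:TextTokenizer = new TextTokenizer(text, start, end);
            var token:Token = tokenizer.getFirstToken();
            // A first index of int.MAX_VALUE marks the "no valid token" pseudo token.
            return (token.first == int.MAX_VALUE) ? -1 : token.first;
        }
    }
}
```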
Method Detail
getFirstToken() method
public function getFirstToken():Token

Language Version : ActionScript 3.0
Runtime Versions : AIR 1.0, Flash Player 10

Returns the first word in the text being scanned.

NOTE: If the text contains no valid tokens, this method returns a pseudo token whose first and last indexes are both set to int.MAX_VALUE; that is, getFirstToken().first and getFirstToken().last both equal int.MAX_VALUE.

Returns
Token
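A short sketch of testing for the pseudo token described in the note above, for example to skip empty input before further processing. TokenChecks is a hypothetical helper class and the import path for Token is an assumption; the example relies on the documented defaults for startIndex and endIndex.

```actionscript
package
{
    import com.adobe.linguistics.utils.TextTokenizer;
    import com.adobe.linguistics.utils.Token; // assumed package for Token

    // Hypothetical helper class.
    public class TokenChecks
    {
        // True when token is the int.MAX_VALUE pseudo token.
        public static function isEndToken(token:Token):Boolean
        {
            return token.first == int.MAX_VALUE && token.last == int.MAX_VALUE;
        }

        // True when text contains at least one valid token.
        public static function hasWords(text:String):Boolean
        {
            var tokenizer:TextTokenizer = new TextTokenizer(text);
            return !isEndToken(tokenizer.getFirstToken());
        }
    }
}
```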
getNextToken() method
public function getNextToken(token:Token):Token

Language Version : ActionScript 3.0
Runtime Versions : AIR 1.0, Flash Player 10

Determines the word that follows the given token.

Returns the token for the next word.

NOTE: When there are no more valid tokens, this method returns a pseudo token whose first and last indexes are both set to int.MAX_VALUE; that is, getNextToken().first and getNextToken().last both equal int.MAX_VALUE.

Parameters

token:Token - The token after which to look for the next word.

Returns
Token
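Because both getFirstToken() and getNextToken() signal the end of the text with the same pseudo token, a forward scan can start at the first token and call getNextToken() until first equals int.MAX_VALUE. The sketch below collects every word this way. WordLister is hypothetical, the import path for Token is assumed, and the code assumes Token.last is exclusive, so that text.substring(token.first, token.last) yields the word; if last is inclusive, add 1 to it.

```actionscript
package
{
    import com.adobe.linguistics.utils.TextTokenizer;
    import com.adobe.linguistics.utils.Token; // assumed package for Token

    // Hypothetical helper class.
    public class WordLister
    {
        // Returns the words of text in order of appearance.
        // Assumption: Token.last is exclusive, as in String.substring().
        public static function listWords(text:String):Vector.<String>
        {
            var words:Vector.<String> = new Vector.<String>();
            var tokenizer:TextTokenizer = new TextTokenizer(text);
            // Stop at the pseudo token whose first index is int.MAX_VALUE.
            for (var token:Token = tokenizer.getFirstToken();
                 token.first != int.MAX_VALUE;
                 token = tokenizer.getNextToken(token))
            {
                words.push(text.substring(token.first, token.last));
            }
            return words;
        }
    }
}
```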
getPreviousToken() method
public function getPreviousToken(token:Token):Token

Language Version : ActionScript 3.0
Runtime Versions : AIR 1.0, Flash Player 10

Determines the word that precedes the given token.

Returns the token for the previous word, or the same token that getFirstToken() returns if no word precedes the given token.

Parameters

token:Token - The token before which to look for the previous word.

Returns
Token
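A sketch of walking the words from last to first. Because getPreviousToken() falls back to the first token instead of returning a pseudo token, the loop stops when the returned token does not start before the current one; this also guards against an endless loop. ReverseWordLister is a hypothetical class, and the import path for Token and an exclusive Token.last are assumptions.

```actionscript
package
{
    import com.adobe.linguistics.utils.TextTokenizer;
    import com.adobe.linguistics.utils.Token; // assumed package for Token

    // Hypothetical helper class.
    public class ReverseWordLister
    {
        // Returns the words of text from last to first.
        // Assumption: Token.last is exclusive, as in String.substring().
        public static function listWordsBackward(text:String):Vector.<String>
        {
            var words:Vector.<String> = new Vector.<String>();
            var tokenizer:TextTokenizer = new TextTokenizer(text);

            var token:Token = tokenizer.getFirstToken();
            if (token.first == int.MAX_VALUE)
            {
                return words; // the text contains no words
            }

            // Walk forward to the last real token.
            var next:Token = tokenizer.getNextToken(token);
            while (next.first != int.MAX_VALUE)
            {
                token = next;
                next = tokenizer.getNextToken(token);
            }

            // Walk backward. With no preceding word, getPreviousToken()
            // returns the first token, which does not start earlier.
            while (true)
            {
                words.push(text.substring(token.first, token.last));
                var previous:Token = tokenizer.getPreviousToken(token);
                if (previous.first >= token.first)
                {
                    break;
                }
                token = previous;
            }
            return words;
        }
    }
}
```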