Tokens
A Token is a sequence of characters that forms a coherent unit of text in a document.
SuperToken
class pyTokenizer.SuperToken(startToken, endToken=None)
Bases: pyTokenizer.Token
ValuedToken
class pyTokenizer.ValuedToken(previousToken, value, start, end=None)
Bases: pyTokenizer.Token
StartOfDocumentToken
A token stream starts with a StartOfDocumentToken.
class pyTokenizer.StartOfDocumentToken
Bases: pyTokenizer.ValuedToken
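The constructor signatures on this page suggest that each token keeps a reference to its predecessor, with the StartOfDocumentToken as the first element. The following is a minimal sketch of that idea using stand-in classes. It does not use pyTokenizer itself, and the attribute names, the empty value, the start position of 0 and the helper walk_back are assumptions made for illustration only.

```python
class Token:
    """Stand-in base class; not the pyTokenizer implementation."""


class ValuedToken(Token):
    def __init__(self, previousToken, value, start, end=None):
        # Attribute names mirror the documented parameters (an assumption).
        self.previousToken = previousToken
        self.value = value
        self.start = start
        self.end = end


class StartOfDocumentToken(ValuedToken):
    def __init__(self):
        # Assumption: the sentinel has no predecessor, an empty value
        # and sits at position 0.
        super().__init__(None, "", 0)


def walk_back(token):
    """Hypothetical helper: collect values from the start of the stream up to token."""
    values = []
    while token is not None:
        values.append(token.value)
        token = token.previousToken
    return values[::-1]


start_token = StartOfDocumentToken()
first = ValuedToken(start_token, "abc", 0, 3)
second = ValuedToken(first, "42", 3, 5)
print(walk_back(second))
```

Following the previousToken links from any token ends at the sentinel, so code that walks the stream backwards can stop when it reaches a token without a predecessor.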
CharacterToken
class pyTokenizer.CharacterToken(previousToken, value, start)
Bases: pyTokenizer.ValuedToken
SpaceToken
class pyTokenizer.SpaceToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
DelimiterToken
class pyTokenizer.DelimiterToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
NumberToken
A NumberToken represents a number, that is, a run of one or more digits (RegExp: [0-9]+).
class pyTokenizer.NumberToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
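To show what the pattern [0-9]+ does and does not match, here is a small sketch that uses only Python's standard re module. It does not call pyTokenizer; the names NUMBER_PATTERN and find_numbers are hypothetical.

```python
import re

# The pattern quoted in the NumberToken description.
NUMBER_PATTERN = re.compile(r"[0-9]+")


def find_numbers(text):
    """Return (match, start, end) for every run of digits in text."""
    return [(m.group(), m.start(), m.end()) for m in NUMBER_PATTERN.finditer(text)]


print(find_numbers("room 12, floor 3.5"))
# [('12', 5, 7), ('3', 15, 16), ('5', 17, 18)]
```

The pattern covers digits only, so a decimal such as 3.5 yields two separate matches, and signs are not part of a match.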
StringToken
A StringToken represents a word, that is, a run of one or more ASCII letters (RegExp: [a-zA-Z]+).
class pyTokenizer.StringToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
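The behaviour of the pattern [a-zA-Z]+ can be illustrated the same way with the standard re module. This sketch does not call pyTokenizer; WORD_PATTERN and find_words are hypothetical names.

```python
import re

# The pattern quoted in the StringToken description.
WORD_PATTERN = re.compile(r"[a-zA-Z]+")


def find_words(text):
    """Return every run of ASCII letters in text."""
    return [m.group() for m in WORD_PATTERN.finditer(text)]


# "\u00ef" is the letter i with a diaeresis, which is outside a-z and A-Z.
print(find_words("Token2vec is na\u00efve"))
# ['Token', 'vec', 'is', 'na', 've']
```

Digits and non-ASCII letters are not matched by this pattern, so they split a word into several matches.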