Tokens
A Token is a sequence of characters forming a coherent text unit in a document.
SuperToken
class pyTokenizer.SuperToken(startToken, endToken=None)
Bases: pyTokenizer.Token
ValuedToken
class pyTokenizer.ValuedToken(previousToken, value, start, end=None)
Bases: pyTokenizer.Token
StartOfDocumentToken
A token stream starts with a StartOfDocumentToken.
class pyTokenizer.StartOfDocumentToken
Bases: pyTokenizer.ValuedToken
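To illustrate the idea of a stream that begins with a start marker, here is a minimal sketch built from stand-in classes. `SketchToken`, `SketchStart` and `walk_back` are hypothetical names and are not part of pyTokenizer. The sketch only assumes, from the `previousToken` parameter name in the signatures on this page, that each token keeps a reference to its predecessor; the default for `end` is likewise an assumption.

```python
class SketchToken:
    """Hypothetical stand-in for a valued token (not pyTokenizer's class)."""

    def __init__(self, previous_token, value, start, end=None):
        self.previous_token = previous_token
        self.value = value
        self.start = start
        # Assumption: with no explicit end, the token spans len(value) characters.
        self.end = start + len(value) if end is None else end


class SketchStart(SketchToken):
    """Hypothetical sentinel that marks the start of the stream."""

    def __init__(self):
        super().__init__(None, "", 0, 0)


def walk_back(token):
    """Follow previous_token links back to the start marker, oldest first."""
    chain = []
    while token is not None:
        chain.append(token)
        token = token.previous_token
    return list(reversed(chain))


start = SketchStart()
word = SketchToken(start, "abc", 0)
number = SketchToken(word, "42", 3)
stream = walk_back(number)
print([(type(t).__name__, t.value, t.start, t.end) for t in stream])
```

With a sentinel at the head of the chain, every real token has a predecessor, so code that walks backwards needs no special case for the first token.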
CharacterToken
class pyTokenizer.CharacterToken(previousToken, value, start)
Bases: pyTokenizer.ValuedToken
SpaceToken
class pyTokenizer.SpaceToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
DelimiterToken
class pyTokenizer.DelimiterToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
NumberToken
A NumberToken represents a number (RegExp: [0-9]+).
class pyTokenizer.NumberToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
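As a rough illustration of what the pattern `[0-9]+` matches, the following sketch uses Python's standard `re` module rather than pyTokenizer itself. The helper name `find_numbers` is hypothetical. It reports each match together with its start and end offsets, loosely mirroring the `value`, `start` and `end` parameters; treating the end offset as exclusive is an assumption, not something this page states.

```python
import re

# The pattern quoted for NumberToken: one or more ASCII digits.
NUMBER_PATTERN = re.compile(r"[0-9]+")


def find_numbers(text):
    """Return (value, start, end) for every digit run; end is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in NUMBER_PATTERN.finditer(text)]


numbers = find_numbers("Room 12, floor 3.5")
print(numbers)  # [('12', 5, 7), ('3', 15, 16), ('5', 17, 18)]
```

Because `.` is not a digit, the pattern alone splits `3.5` into two separate digit runs; signs and decimal points fall outside `[0-9]+`.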
StringToken
A StringToken represents a word (RegExp: [a-zA-Z]+).
class pyTokenizer.StringToken(previousToken, value, start, end=None)
Bases: pyTokenizer.ValuedToken
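The word pattern `[a-zA-Z]+` covers ASCII letters only, which is easy to overlook. The sketch below checks this with Python's standard `re` module; it does not call pyTokenizer, and `find_words` is a hypothetical helper name.

```python
import re

# The pattern quoted for StringToken: one or more ASCII letters.
WORD_PATTERN = re.compile(r"[a-zA-Z]+")


def find_words(text):
    """Return (value, start, end) for every letter run; end is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in WORD_PATTERN.finditer(text)]


# "\u00ef" is the single code point for i with diaeresis, as in "naive" with an accent.
words = find_words("na\u00efve x2 test")
print(words)  # [('na', 0, 2), ('ve', 3, 5), ('x', 6, 7), ('test', 9, 13)]
```

The accented letter and the digit both interrupt a letter run, so text containing non-ASCII letters is split at those characters by this pattern alone.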