INDENT_TAB
Tab Indentation (\t)
Performs lexical analysis and provides a token generator.
Tokens are single units of code (e.g. tag, class, id, attributeStart, attribute, attributeEnd).
These are passed to the parser and converted to an AST.
The lexer works sequentially: ->lex() returns a generator that you can read in any manner you like. The generator produces valid tokens until the end of the passed input.
Usage example:
use Tale\Jade\Lexer;
$lexer = new Lexer();
foreach ($lexer->lex($jadeInput) as $token)
    print_r($token);
//Prints a human-readable dump of the generated tokens (tokens are arrays)
__construct(array|null $options = null)
Creates a new lexer instance.
The options should be an associative array
Valid options are:
indentStyle: The indentation character (auto-detected)
indentWidth: How often to repeat indentStyle (auto-detected)
encoding: The encoding when working with mb_*-functions (Default: UTF-8)
scans: An array of scans that will be performed
Passing an indentation style forces you to stick to that style. Otherwise, the lexer uses the first indentation type it finds as the indentation for the whole input. Mixed indentation is not supported, since it would be very hard to calculate without taking away configuration freedom.
Add a new scan to 'scans' to extend the lexer. Notice that you need the fitting 'handle*'-method in the parser or you will get unhandled-token-exceptions.
array|null | $options | the options passed to the lexer instance |
lex(string $input) : \Generator
Returns a generator that will lex the passed $input sequentially.
If you don't advance the generator, the lexer does nothing. Only once you iterate the generator or call next()/current() on it does the lexer start its work and produce tokens sequentially. This approach takes less memory during the lexing process.
Tokens are always an array and always provide the following keys:
[
    'type' => The token type,
    'line' => The line this token is on,
    'offset' => The offset this token is at
]
string | $input | the Jade-string to lex into tokens |
a generator that can be iterated sequentially
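The lazy behaviour of lex() described above can be illustrated without the library. The following is a minimal sketch, not the lexer's implementation: sketchLex() and the 'value' key are hypothetical, and the "lexing" only splits the input into lines. It shows that no work happens until the generator is advanced.

```php
<?php
// Hypothetical stand-in for ->lex(); NOT the library's implementation.
// $log records when work actually happens.
function sketchLex(string $input, array &$log): \Generator
{
    $log[] = 'started';
    foreach (explode("\n", $input) as $index => $line) {
        $log[] = 'line ' . ($index + 1);
        // 'type', 'line' and 'offset' are the documented keys; 'value' is an assumption.
        yield ['type' => 'text', 'line' => $index + 1, 'offset' => 0, 'value' => $line];
    }
}

$log = [];
$tokens = sketchLex("a\nb", $log);

$logBefore = $log;            // still empty: creating the generator runs nothing
$first = $tokens->current();  // runs the body up to the first yield
$logAfter = $log;             // only the first line has been processed
```

Because each token is produced on demand, only the current token has to be kept in memory by the consumer.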
peek(integer $length = 1) : string
Shows the next characters in our input.
Pass a $length to get more than one character. The characters won't be consumed here, they are only shown. The position pointer won't be moved forward.
The result gets saved in $_lastPeekResult.
integer | $length | the length of the string we want to peek on |
the peeked string
consume(integer|null $length = null) : $this
Consumes a length or the length of the last peeked string.
Internally, $input = substr($input, $length) is done, so everything before the consumed length is cut off and removed from memory (it has already been tokenized, since the lexer works sequentially).
integer|null | $length | the length to consume or null, to use the length of the last peeked string |
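How peek() and consume() interact can be sketched with a small buffer class. This is an illustration under assumptions, not the library's code: SketchBuffer and remaining() are hypothetical names, and the sketch assumes PHP 7.4+ with the mbstring extension.

```php
<?php
// Hypothetical minimal buffer; method names mirror the documented ones,
// but this is NOT the library's implementation.
final class SketchBuffer
{
    private string $input;
    private ?string $lastPeek = null;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Returns the next $length characters without moving forward.
    public function peek(int $length = 1): string
    {
        $this->lastPeek = mb_substr($this->input, 0, $length, 'UTF-8');
        return $this->lastPeek;
    }

    // Cuts $length characters off the front; null means "length of the last peek".
    public function consume(?int $length = null): self
    {
        if ($length === null) {
            $length = mb_strlen((string) $this->lastPeek, 'UTF-8');
        }
        $this->input = mb_substr($this->input, $length, null, 'UTF-8');
        $this->lastPeek = null;
        return $this;
    }

    // Hypothetical helper to inspect what is left.
    public function remaining(): string
    {
        return $this->input;
    }
}

$buffer = new SketchBuffer('div.box');
$peeked = $buffer->peek(3);   // nothing is consumed yet
$buffer->consume();           // drops exactly the peeked characters
$left = $buffer->remaining();
```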
read(callable $callback, integer $length = 1) : string
Peeks and consumes chars until the passed callback returns false.
The callback takes the current character as the first argument.
This works great with ctype_*-functions
If the last character doesn't match, it won't be consumed either, so you can always continue reading right after a call to ->read().
e.g.
$alNumString = $this->read('ctype_alnum');
$spaces = $this->read('ctype_space');
callable | $callback | the callback to check the current character against |
integer | $length | the length to peek. This will also increase the length of the characters passed to the callback |
the read string
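The read loop described above can be sketched as a free function. sketchRead() is a hypothetical name and works on a string passed by reference instead of the lexer's internal input; it assumes the ctype and mbstring extensions are available.

```php
<?php
// Hypothetical free-function version of read(); NOT the library's code.
// Consumes from the front of $input while $callback accepts the next chunk.
function sketchRead(string &$input, callable $callback, int $length = 1): string
{
    $result = '';
    while ($input !== '') {
        $chunk = mb_substr($input, 0, $length, 'UTF-8');
        if (!$callback($chunk)) {
            break; // the non-matching chunk stays in $input
        }
        $result .= $chunk;
        $input = mb_substr($input, mb_strlen($chunk, 'UTF-8'), null, 'UTF-8');
    }
    return $result;
}

$input  = 'abc123   rest';
$alNum  = sketchRead($input, 'ctype_alnum'); // stops at the first space
$spaces = sketchRead($input, 'ctype_space'); // continues right where the last read stopped
```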
readBracketContents(array|null $breakChars = null) : string
Reads a "value", 'value' or value style string really gracefully.
It will stop on all chars passed to $breakChars as well as a closing ')' when not inside an expression initiated with either ", ', (, [ or {.
$breakChars might be [','] as an example to read sequential arguments into an array. Scan for ',', skip spaces, repeat readBracketContents
Brackets are counted, strings are respected.
Inside a " string, \" escaping is possible, inside a ' string, \' escaping is possible
As soon as a ) is found and we're outside a string and outside any kind of bracket, the reading will stop and the value, including any quotes, will be returned
Examples: ('`' marks the parts that are read, understood and returned by this function)
(arg1=`abc`, arg2=`"some expression"`, `'some string expression'`)
some-mixin(`'some arg'`, `[1, 2, 3, 4]`, `(isset($complex) ? $complex : 'complex')`)
and even
some-mixin(callback=`function($input) { return trim($input, '\'"'); }`)
array|null | $breakChars | the chars to break on. |
the (possibly quote-enclosed) result string
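The bracket counting and string handling described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: sketchReadBracketContents() is a hypothetical name, it works byte-wise on a string passed by reference, and it treats any backslash inside a string as an escape for the next character.

```php
<?php
// Hypothetical sketch; NOT the library's code. Byte-based for brevity,
// so it assumes single-byte break characters.
function sketchReadBracketContents(string &$input, array $breakChars = []): string
{
    $closers = ['(' => ')', '[' => ']', '{' => '}'];
    $stack   = [];    // closing brackets we still expect
    $quote   = null;  // active quote character, if inside a string
    $value   = '';
    $length  = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            if ($char === '\\' && $i + 1 < $length) {
                $value .= $char . $input[++$i]; // keep escaped character as-is
                continue;
            }
            if ($char === $quote) {
                $quote = null; // string closed
            }
        } elseif ($char === '"' || $char === "'") {
            $quote = $char;
        } elseif (isset($closers[$char])) {
            $stack[] = $closers[$char];
        } elseif ($stack !== [] && $char === end($stack)) {
            array_pop($stack);
        } elseif ($stack === [] && ($char === ')' || in_array($char, $breakChars, true))) {
            break; // top-level ')' or break character: stop before it
        }
        $value .= $char;
    }

    $input = substr($input, $i); // cut the read part off the front
    return trim($value);
}

// Reading sequential arguments: scan for ',', skip spaces, repeat.
$input = '"some arg", [1, 2, 3], (isset($x) ? $x : 1)) rest';
$args = [];
do {
    $args[] = sketchReadBracketContents($input, [',']);
    $more = $input !== '' && $input[0] === ',';
    if ($more) {
        $input = ltrim(substr($input, 1));
    }
} while ($more);
```

The loop at the end follows the "scan for ',', skip spaces, repeat" pattern mentioned for $breakChars.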
match(string $pattern, string $modifiers = '') : boolean
Matches a pattern against the start of the current $input.
Notice that this always takes the start of the current pointer position as a reference, since consuming means cutting off the front of the input string.
After a match was successful, you can retrieve the matches with ->getMatch() and consume the whole match with ->consumeMatch()
^ gets automatically prepended to the pattern (since it makes no sense for a sequential lexer to search inside the input)
string | $pattern | the regular expression without delimiters and without a ^-prefix |
string | $modifiers | the usual PREG RegEx-modifiers |
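The anchoring described for match() can be sketched with preg_match(). sketchMatch() is a hypothetical helper, not the library's code; the '/' delimiter and the non-capturing wrapper around the pattern are assumptions of this sketch.

```php
<?php
// Hypothetical helper; NOT the library's implementation.
// '/' is used as the delimiter here, so $pattern must not contain an unescaped '/'.
function sketchMatch(string $input, string $pattern, string $modifiers = '', ?array &$matches = null): bool
{
    // The (?:...) wrapper keeps alternations such as 'a|b' anchored as a whole.
    return preg_match('/^(?:' . $pattern . ')/' . $modifiers, $input, $matches) === 1;
}

$input = 'div.box Hello';

$found      = sketchMatch($input, '[a-z]+', 'i', $m); // matches 'div' at the start
$notAtStart = sketchMatch($input, 'box');             // 'box' exists, but not at the start

// Consuming the whole match means cutting it off the front of the input.
$rest = $found ? substr($input, strlen($m[0])) : $input;
```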
scanFor(array $scans, boolean|false $throwException = false) : \Generator
Keeps scanning for all types of tokens passed as the first argument.
If a token is encountered that's not in $scans, the function stops, or throws an exception if the second argument is true.
The passed scans get converted to method names, e.g. newLine => scanNewLine, blockExpansion => scanBlockExpansion etc.
array | $scans | the scans to perform |
boolean|false | $throwException | throw an exception if no more tokens from $scans are found |
the generator yielding all tokens found
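The name-to-method conversion and the scan loop can be sketched with a miniature scanner. Everything here is hypothetical (SketchScanner, the 'word' scan, the exception type); only the newLine => scanNewLine mapping comes from the description above.

```php
<?php
// Hypothetical miniature lexer; NOT the library's implementation.
final class SketchScanner
{
    private string $input;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    public function scanFor(array $scans, bool $throwException = false): \Generator
    {
        while ($this->input !== '') {
            $found = false;
            foreach ($scans as $name) {
                $method = 'scan' . ucfirst($name); // newLine => scanNewLine
                foreach ($this->$method() as $token) {
                    $found = true;
                    yield $token;
                }
                if ($found) {
                    break; // start over with the first scan
                }
            }
            if (!$found) {
                if ($throwException) {
                    throw new \RuntimeException('Unexpected input: ' . $this->input);
                }
                return; // nothing in $scans matches anymore
            }
        }
    }

    private function scanNewLine(): \Generator
    {
        if ($this->input[0] === "\n") {
            $this->input = substr($this->input, 1);
            yield ['type' => 'newLine'];
        }
    }

    private function scanWord(): \Generator
    {
        if (preg_match('/^[a-z]+/', $this->input, $m)) {
            $this->input = substr($this->input, strlen($m[0]));
            yield ['type' => 'word', 'value' => $m[0]];
        }
    }
}

$types = [];
foreach ((new SketchScanner("ab\ncd"))->scanFor(['newLine', 'word']) as $token) {
    $types[] = $token['type'];
}
```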
createToken(string $type) : array
Creates a new token.
A token is an associative array. The following keys always exist:
type: The type of the node (e.g. newLine, tag, class, id)
line: The line we encountered this token on
offset: The offset on a line we encountered it on
Before adding a new token-type, make sure that the Parser knows how to handle it and the Compiler knows how to compile it.
string | $type | the type to give that token |
the token array
scanToken(string $type, string $pattern, string $modifiers = '') : \Generator
Scans for a specific token-type based on a pattern and converts it to a valid token automatically.
All matches that have a name (RegEx (?<name>...) groups) are added to the token under that name.
For matching, ->match() is used internally
string | $type | the token type to create, if matched |
string | $pattern | the pattern to match |
string | $modifiers | the regex-modifiers for the pattern |
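A pattern-based token scan can be sketched as follows. This is an illustration under assumptions, not the library's code: sketchCreateToken() and sketchScanToken() are hypothetical, line and offset are fixed placeholder values, and the sketch assumes that named regex groups are copied onto the token.

```php
<?php
// Hypothetical helpers; NOT the library's implementation.
function sketchCreateToken(string $type, int $line, int $offset): array
{
    return ['type' => $type, 'line' => $line, 'offset' => $offset];
}

function sketchScanToken(string &$input, string $type, string $pattern, string $modifiers = ''): \Generator
{
    if (preg_match('/^(?:' . $pattern . ')/' . $modifiers, $input, $matches) !== 1) {
        return; // no match: yield nothing
    }

    $token = sketchCreateToken($type, 1, 0); // placeholder line/offset
    foreach ($matches as $key => $value) {
        if (is_string($key)) { // assumption: only named groups end up on the token
            $token[$key] = $value;
        }
    }

    $input = substr($input, strlen($matches[0])); // consume the whole match
    yield $token;
}

$input  = '#main.content';
$tokens = iterator_to_array(
    sketchScanToken($input, 'id', '#(?<name>[a-zA-Z][a-zA-Z0-9\-_]*)'),
    false
);
```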
scanIndent() : \Generator|void
Scans for indentation and automatically keeps the $_level updated through all tokens.
Upon reaching a higher level, an <indent>-token is yielded.
If you outdented 3 levels, 3 <outdent>-tokens are yielded.
The first indentation this function encounters will be used as the indentation style for this document.
Unlike most Jade implementations, you can indent with anything between 1 space and a few million tabs.
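Level tracking of this kind can be sketched as follows. sketchIndentTokens() is hypothetical and not the library's implementation; the token type names 'indent' and 'outdent', the per-line processing, and yielding one token per level change are assumptions of this sketch.

```php
<?php
// Hypothetical sketch; NOT the library's code.
// The first indentation found fixes the style (character) and the width.
function sketchIndentTokens(string $input): \Generator
{
    $style = null; // ' ' or "\t", detected on first indentation
    $width = null; // characters per level, detected on first indentation
    $level = 0;

    foreach (explode("\n", $input) as $index => $line) {
        if (trim($line) === '') {
            continue; // blank lines do not change the level
        }

        $indent = substr($line, 0, strlen($line) - strlen(ltrim($line, " \t")));
        if ($indent !== '' && $style === null) {
            $style = $indent[0];
            $width = strlen($indent);
        }
        if ($indent !== '' && trim($indent, $style) !== '') {
            throw new \RuntimeException('Mixed indentation on line ' . ($index + 1));
        }

        $newLevel = $indent === '' ? 0 : intdiv(strlen($indent), $width);
        for (; $level < $newLevel; $level++) {
            yield ['type' => 'indent', 'line' => $index + 1, 'offset' => 0];
        }
        for (; $level > $newLevel; $level--) {
            yield ['type' => 'outdent', 'line' => $index + 1, 'offset' => 0];
        }
    }
}

$jade  = "div\n  p\n    span\nfooter";
$types = array_column(iterator_to_array(sketchIndentTokens($jade), false), 'type');
```

Going from "span" back to "footer" drops two levels at once, so two outdent tokens are produced for that line.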
scanImport() : \Generator
Scans for imports and yields an <import>-token if found.
Import-tokens always have:
importType, which is either "extends" or "include"
path, the (relative) path to which the import points
Import-tokens may have:
filter, which is an optional filter that should be only usable on "include"
scanBlock() : \Generator
Scans for <block>-tokens.
Blocks can have three styles:
block append|prepend|replace name
append|prepend|replace name
or simply
block (for mixin blocks)
Block-tokens may have:
mode, which is either "append", "prepend" or "replace"
name, which is the name of the block
scanControlStatement(string $type, array $names, string|null $nameAttribute = null) : \Generator
Scans for a control-statement-kind of token.
e.g. control-statement-name ($expression)
Since the
If the condition can have a subject, the subject will be set as the "subject"-value of the token
string | $type | The token type that should be created if scan is successful |
array | $names | The names the statement can have (e.g. do, while, if, else etc.) |
string|null | $nameAttribute | The attribute the name gets saved into, if wanted |
scanExpression() : \Generator
Scans for a - or !?=-style expression.
e.g.
!= expr
= expr
Expression-tokens always have:
escaped, which indicates that the expression result should be escaped
return, which indicates if the expression should return or just evaluate the result
scanExpansion() : \Generator
Scans for an <expansion>-token.
(a: b-style expansion or a:b-style tags)
Expansion-tokens always have:
withSpace, which indicates whether there's a space after the colon
Usually, if there's no space, it should be handled as part of a tag-name.
scanAttributes() : \Generator
Scans for an attribute-block.
Attribute blocks always consist of the following tokens:
strpos(string $haystack, string $needle, integer|null $offset = null) : integer|false
mb_* compatible version of PHP's strpos.
(so we don't require mb.func_overload)
string | $haystack | the string to search in |
string | $needle | the string we search for |
integer|null | $offset | the offset at which we might expect it |
the offset of the string or false, if not found
substr(string $string, integer $start, integer|null $range = null) : string
mb_* compatible version of PHP's substr.
(so we don't require mb.func_overload)
string | $string | the string to get a sub-string of |
integer | $start | the start-index |
integer|null | $range | the amount of characters we want to get |
the sub-string
substr_count(string $haystack, string $needle) : integer
mb_* compatible version of PHP's substr_count.
(so we don't require mb.func_overload)
string | $haystack | the string we want to count sub-strings in |
string | $needle | the sub-string we want to count inside $haystack |
the number of occurrences of $needle in $haystack
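The three mb_*-compatible helpers above can be sketched as thin wrappers around the mbstring functions. The sketch* names are hypothetical, and the $encoding argument stands in for the lexer's 'encoding' option; this assumes the mbstring extension and a UTF-8 encoded source file.

```php
<?php
// Hypothetical wrappers; NOT the library's implementation.
function sketchStrpos(string $haystack, string $needle, ?int $offset = null, string $encoding = 'UTF-8')
{
    // Returns the character offset, or false if $needle is not found.
    return mb_strpos($haystack, $needle, $offset ?? 0, $encoding);
}

function sketchSubstr(string $string, int $start, ?int $range = null, string $encoding = 'UTF-8'): string
{
    // A null $range reads up to the end of the string.
    return mb_substr($string, $start, $range, $encoding);
}

function sketchSubstrCount(string $haystack, string $needle, string $encoding = 'UTF-8'): int
{
    return mb_substr_count($haystack, $needle, $encoding);
}

$text = 'héllo wörld, héllo';

$charPos = sketchStrpos($text, 'wörld');      // counted in characters
$bytePos = strpos($text, 'wörld');            // counted in bytes ('é' takes two)
$part    = sketchSubstr($text, 6, 5);
$count   = sketchSubstrCount($text, 'héllo');
```

The difference between $charPos and $bytePos is the reason the lexer needs multibyte-aware versions when offsets are reported per character.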