\Tale\Jade\Lexer

Performs lexical analysis and provides a token generator.

Tokens are single units of code (e.g. tag, class, id, attributeStart, attribute, attributeEnd).

The parser consumes these tokens and converts them into an AST.

The lexer works sequentially: ->lex() returns a generator that you can read in any manner you like. The generator produces valid tokens until the end of the passed input.

Usage example:

use Tale\Jade\Lexer;

$lexer = new Lexer();

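// Tokens are arrays, so print their fields instead of echoing the array itself
foreach ($lexer->lex($jadeInput) as $token)
    echo $token['type'], ' (line ', $token['line'], ', offset ', $token['offset'], ")\n";

// Prints the type and position of each generated token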

Summary

Public

Methods: __construct(), getInput(), getLength(), getPosition(), getLine(), getOffset(), getLevel(), getIndentStyle(), getIndentWidth(), getLastPeekResult(), getLastMatches(), lex(), dump()
Properties: No public properties found
Constants: INDENT_TAB, INDENT_SPACE

Protected

Methods: isAtEnd(), peek(), consume(), read(), readSpaces(), readBracketContents(), match(), consumeMatch(), getMatch(), scanFor(), createToken(), scanToken(), scanIndent(), scanNewLine(), scanText(), scanTextBlock(), scanTextLine(), scanMarkup(), scanComment(), scanFilter(), scanImport(), scanBlock(), scanCase(), scanWhen(), scanConditional(), scanControlStatement(), scanEach(), scanWhile(), scanDo(), scanExpression(), scanExpansion(), scanSub(), scanDoctype(), scanTag(), scanClasses(), scanId(), scanMixin(), scanMixinCall(), scanAssignment(), scanAttributes(), throwException(), strlen(), strpos(), substr(), substr_count()
Properties: No protected properties found
Constants: N/A

Private

Methods: No private methods found
Properties: $_input, $_length, $_position, $_line, $_offset, $_level, $_indentStyle, $_indentWidth, $_lastPeekResult, $_lastMatches
Constants: N/A

Constants

INDENT_TAB

INDENT_TAB

Tab Indentation (\t)

INDENT_SPACE

INDENT_SPACE

Space Indentation ( )

Properties

$_input

$_input : string

The current input string.

Type

string

$_length

$_length : integer

The total length of the current input.

Type

integer

$_position

$_position : integer

The current position inside the input string.

Type

integer

$_line

$_line : integer

The current line we are on.

Type

integer

$_offset

$_offset : integer

The current offset in a line we are on.

Resets on each new line and increases on each read character

Type

integer

$_level

$_level : integer

The current indentation level we are on.

Type

integer

$_indentStyle

$_indentStyle : string

The current indentation character.

Type

string

$_indentWidth

$_indentWidth : integer

The width of the indentation.

Specifies how often $_indentStyle is repeated for each $_level

Type

integer

$_lastPeekResult

$_lastPeekResult : string

The last result gotten via ->peek().

Type

string

$_lastMatches

$_lastMatches : array

The last matches gotten via ->match()

Type

array

Methods

__construct()

__construct(array|null  $options = null) 

Creates a new lexer instance.

The options should be an associative array

Valid options are:

indentStyle: The indentation character (auto-detected)
indentWidth: How often to repeat indentStyle (auto-detected)
encoding: The encoding when working with mb_*-functions (Default: UTF-8)
scans: An array of scans that will be performed

Passing an indentation style forces you to stick to that style. Otherwise, the lexer takes the first indentation it finds as the indentation style. Mixed indentation is not possible, since it would be hard to calculate without taking away configuration freedom.

Add a new scan to 'scans' to extend the lexer. Note that you need the fitting 'handle*'-method in the parser or you will get unhandled-token-exceptions.

Parameters

array|null $options

the options passed to the lexer instance

Throws

\Exception
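
The option handling described above can be pictured as merging the passed array over a set of defaults. This is a minimal sketch, not the constructor's actual code: sketchResolveOptions() is a hypothetical helper, and using null to mean "auto-detect" is an assumption made for the illustration.

```php
<?php
// Hypothetical helper, not part of the lexer: merges user options over
// defaults. null stands for "auto-detect" in this sketch (an assumption).
function sketchResolveOptions(?array $options = null): array
{
    $defaults = [
        'indentStyle' => null,     // auto-detected from the first indentation
        'indentWidth' => null,     // auto-detected from the first indentation
        'encoding'    => 'UTF-8',  // default named in the option list above
        'scans'       => [],       // the real default scan list is not shown here
    ];

    // Keys present in $options win; all other keys keep their defaults.
    return array_replace($defaults, $options ?? []);
}

// Force tab indentation, leave everything else at its default.
$resolved = sketchResolveOptions(['indentStyle' => "\t"]);
```

With the real class the same array would be passed straight to the constructor, e.g. new Lexer(['indentStyle' => Lexer::INDENT_TAB]).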

getInput()

getInput() : string

Returns the current input-string worked on.

Returns

string

getLength()

getLength() : integer

Returns the total length of the current input-string.

Returns

integer

getPosition()

getPosition() : integer

Returns the current position in the current input-string.

Returns

integer

getLine()

getLine() : integer

Returns the line we are working on in the current input-string.

Returns

integer

getOffset()

getOffset() : integer

Gets the offset on a line (Line-start is 0) in the current input-string.

Returns

integer

getLevel()

getLevel() : integer

Returns the current indentation level we are on.

Returns

integer

getIndentStyle()

getIndentStyle() : string

Returns the detected or previously passed indentation style.

Returns

string

getIndentWidth()

getIndentWidth() : integer

Returns the detected or previously passed indentation width.

Returns

integer

getLastPeekResult()

getLastPeekResult() : string|null

Returns the last result of ->peek().

Returns

string|null

getLastMatches()

getLastMatches() : array|null

Returns the last array of matches through ->match().

Returns

array|null

lex()

lex(string  $input) : \Generator

Returns a generator that will lex the passed $input sequentially.

If you don't move the generator, the lexer does nothing. Only when you iterate the generator or call next()/current() on it does the lexer start its work and produce tokens sequentially. This approach takes less memory during the lexing process.

Tokens are always an array and always provide the following keys:

[
    'type' => The token type,
    'line' => The line this token is on,
    'offset' => The offset this token is at
]

Parameters

string $input

the Jade-string to lex into tokens

Returns

\Generator —

a generator that can be iterated sequentially
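
The lazy behaviour described for lex() is a general property of PHP generators. The following sketch is not the lexer's implementation: sketchLex() and its "word" token type are hypothetical, and the $log array exists only to show when work actually happens.

```php
<?php
// Hypothetical stand-in for a lexer: yields one token array per
// whitespace-separated word. Not Tale Jade's implementation.
function sketchLex(string $input, array &$log): \Generator
{
    $log[] = 'started';
    foreach (explode("\n", $input) as $lineIndex => $lineText) {
        if (preg_match_all('/\S+/', $lineText, $matches, PREG_OFFSET_CAPTURE)) {
            foreach ($matches[0] as [$word, $offset]) {
                $log[] = "yield $word";
                // Same three mandatory keys as described above, plus a value.
                yield [
                    'type'   => 'word',
                    'line'   => $lineIndex + 1,
                    'offset' => $offset,
                    'value'  => $word,
                ];
            }
        }
    }
}

$log = [];
$tokens = sketchLex("a b\nc", $log);

// Creating the generator has not executed any of the function body yet.
$entriesBeforeIteration = count($log);

// Asking for the current token runs the body up to the first yield only.
$first = $tokens->current();
```

Because each token is produced on demand, a consumer such as a parser can stop early without the rest of the input ever being tokenized.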

dump()

dump(string  $input) 

Dumps jade-input into a set of string-represented tokens.

This makes debugging the lexer easier.

Parameters

string $input

the jade input to dump the tokens of

isAtEnd()

isAtEnd() : boolean

Checks if our read pointer is at the end of the code.

Returns

boolean

peek()

peek(integer  $length = 1) : string

Shows the next characters in our input.

Pass a $length to get more than one character. The characters won't be consumed here, they are just shown. The position pointer won't be moved forward.

The result gets saved in $_lastPeekResult

Parameters

integer $length

the length of the string we want to peek on

Returns

string —

the peeked string

consume()

consume(integer|null  $length = null) : $this

Consumes a length or the length of the last peeked string.

Internally, $input = substr($input, $length) is done, so everything before the consumed length is cut off and freed from memory (it has most likely been tokenized already, since the lexer works sequentially).

Parameters

integer|null $length

the length to consume or null, to use the length of the last peeked string

Throws

\Tale\Jade\Lexer\Exception

Returns

$this
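
How peek() and consume() interact can be sketched with a small stand-alone class. SketchReader and its members are hypothetical names, and the sketch counts bytes rather than multi-byte characters to stay short; the real lexer's bookkeeping may differ.

```php
<?php
// Hypothetical reader illustrating peek/consume; not the lexer's own code.
// Byte-based for brevity (the real lexer is multi-byte aware).
final class SketchReader
{
    private $input;
    private $lastPeek = null;
    public $line = 1;
    public $offset = 0;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Shows the next $length characters without moving forward.
    public function peek(int $length = 1): string
    {
        return $this->lastPeek = (string) substr($this->input, 0, $length);
    }

    // Cuts $length characters (or the last peeked string) off the front.
    public function consume(?int $length = null): self
    {
        if ($length === null) {
            if ($this->lastPeek === null) {
                throw new \LogicException('Nothing peeked, pass a length');
            }
            $length = strlen($this->lastPeek);
        }

        $consumed = (string) substr($this->input, 0, $length);
        $newLines = substr_count($consumed, "\n");
        $this->line += $newLines;
        if ($newLines > 0) {
            // The offset restarts after the last consumed line break.
            $this->offset = strlen($consumed) - strrpos($consumed, "\n") - 1;
        } else {
            $this->offset += strlen($consumed);
        }

        $this->input = (string) substr($this->input, $length);
        $this->lastPeek = null;

        return $this;
    }

    public function remaining(): string
    {
        return $this->input;
    }
}
```

Returning $this from consume() allows calls to be chained, which matches the $this return type documented above.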

read()

read(callable  $callback, integer  $length = 1) : string

Peeks and consumes chars until the passed callback returns false.

The callback takes the current character as the first argument.

This works great with ctype_*-functions

If the last character doesn't match, it won't be consumed, so you can always go on reading right after a call to ->read().

e.g.

$alNumString = $this->read('ctype_alnum');
$spaces = $this->read('ctype_space');

Parameters

callable $callback

the callback to check the current character against

integer $length

the length to peek. This will also increase the length of the characters passed to the callback

Throws

\Exception

Returns

string —

the read string
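
The read-until-the-callback-fails idea can be tried outside the lexer with a plain function. sketchRead() is a hypothetical helper that works on a string passed by reference and always looks at one character at a time; it only illustrates the principle.

```php
<?php
// Hypothetical helper: consumes characters from the front of $input while
// $callback returns true and returns what was consumed. The first character
// that fails the check stays in $input.
function sketchRead(string &$input, callable $callback): string
{
    $result = '';
    while ($input !== '' && $callback($input[0])) {
        $result .= $input[0];
        $input = (string) substr($input, 1);
    }

    return $result;
}

$input = "abc123  rest";
$alNumString = sketchRead($input, 'ctype_alnum'); // reads "abc123"
$spaces = sketchRead($input, 'ctype_space');      // reads the two spaces
// $input now holds "rest", ready for the next read
```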

readSpaces()

readSpaces() : string

Reads all TAB (\t) and SPACE ( ) chars until something else is found.

This is primarily used to parse the indentation at the beginning of each line.

Throws

\Tale\Jade\Lexer\Exception

Returns

string —

the spaces that have been found

readBracketContents()

readBracketContents(array|null  $breakChars = null) : string

Reads a "value", 'value' or value style string really gracefully.

It will stop on all chars passed to $breakChars as well as a closing ')' when not inside an expression initiated with either ", ', (, [ or {.

$breakChars might be [','] as an example to read sequential arguments into an array. Scan for ',', skip spaces, repeat readBracketContents

Brackets are counted, strings are respected.

Inside a " string, \" escaping is possible, inside a ' string, \' escaping is possible

As soon as a ) is found and we're outside a string and outside any kind of bracket, the reading will stop and the value, including any quotes, will be returned

Examples ('`' marks the parts that are read, understood and returned by this function):

(arg1=`abc`, arg2=`"some expression"`, `'some string expression'`)

some-mixin(`'some arg'`, `[1, 2, 3, 4]`, `(isset($complex) ? $complex : 'complex')`)

and even

some-mixin(callback=`function($input) { return trim($input, '\'"'); }`)

Parameters

array|null $breakChars

the chars to break on.

Returns

string —

the (possibly quote-enclosed) result string
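
The counting rules above (brackets are counted, strings are respected, reading stops at an unnested ')' or break character) can be sketched as a single loop. sketchReadValue() is a hypothetical function that takes the input as an argument instead of reading from the lexer; trimming the result is a choice made for this sketch.

```php
<?php
// Hypothetical sketch of bracket-aware reading; not the lexer's own code.
// Returns the value at the start of $input, stopping at an unnested ')'
// or at any unnested character listed in $breakChars.
function sketchReadValue(string $input, array $breakChars = []): string
{
    $pairs = ['(' => ')', '[' => ']', '{' => '}'];
    $closers = array_flip($pairs);
    $depth = 0;      // how many brackets are currently open
    $quote = null;   // the quote character we are inside of, if any
    $result = '';
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            // Inside a string: a backslash escapes the next character.
            if ($char === '\\' && $i + 1 < $length) {
                $result .= $char . $input[++$i];
                continue;
            }
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === '"' || $char === "'") {
            $quote = $char;
        } elseif (isset($pairs[$char])) {
            $depth++;
        } elseif ($depth > 0 && isset($closers[$char])) {
            $depth--;
        } elseif ($depth === 0 && ($char === ')' || in_array($char, $breakChars, true))) {
            break;   // end of this value
        }

        $result .= $char;
    }

    return trim($result);
}

// Reads the first argument only; the comma inside [...] does not break.
$firstArgument = sketchReadValue("[1, 2, 3], 'b')", [',']);
```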

match()

match(string  $pattern, string  $modifiers = '') : boolean

Matches a pattern against the start of the current $input.

Note that this always takes the current pointer position as its reference, since consuming means cutting off the front of the input string.

After a match was successful, you can retrieve the matches with ->getMatch() and consume the whole match with ->consumeMatch()

^ gets automatically prepended to the pattern (since it makes no sense for a sequential lexer to search inside the input)

Parameters

string $pattern

the regular expression without delimiters and without the ^-prefix

string $modifiers

the usual PREG RegEx-modifiers

Returns

boolean

consumeMatch()

consumeMatch() : $this

Consumes a match previously read and matched by ->match().

Returns

$this

getMatch()

getMatch(integer|string  $index) : mixed|null

Gets a match from the last ->match() call

Parameters

integer|string $index

the index of the usual PREG $matches argument

Returns

mixed|null —

the value of the match or null, if none found
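
The match()/getMatch()/consumeMatch() trio can be sketched as follows. SketchMatcher is a hypothetical class; wrapping the pattern in a non-capturing group before prepending ^ is a choice of this sketch (it keeps alternations anchored) and is not confirmed to be what the lexer does. Patterns containing "/" would need escaping here.

```php
<?php
// Hypothetical sketch of anchored matching; not the lexer's own code.
final class SketchMatcher
{
    private $input;
    private $lastMatches = null;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Prepends ^ so the pattern can only match at the current position.
    public function match(string $pattern, string $modifiers = ''): bool
    {
        $matched = preg_match("/^(?:$pattern)/$modifiers", $this->input, $matches);
        $this->lastMatches = $matched === 1 ? $matches : null;

        return $matched === 1;
    }

    // Returns a numbered or named group of the last match, or null.
    public function getMatch($index)
    {
        return $this->lastMatches[$index] ?? null;
    }

    // Cuts the full match (index 0) off the front of the input.
    public function consumeMatch(): self
    {
        $this->input = (string) substr($this->input, strlen($this->getMatch(0) ?? ''));

        return $this;
    }

    public function remaining(): string
    {
        return $this->input;
    }
}

$matcher = new SketchMatcher('div.box text');
if ($matcher->match('(?<name>[a-z]+)')) {
    $tagName = $matcher->getMatch('name');
    $matcher->consumeMatch();
}
```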

scanFor()

scanFor(array  $scans, boolean|false  $throwException = false) : \Generator

Keeps scanning for all types of tokens passed as the first argument.

If a token is encountered that's not in $scans, the function stops, or throws an exception if the second argument is true.

The passed scans get converted to methods e.g. newLine => scanNewLine, blockExpansion => scanBlockExpansion etc.

Parameters

array $scans

the scans to perform

boolean|false $throwException

throw an exception if no tokens in $scans found anymore

Throws

\Tale\Jade\Lexer\Exception

Returns

\Generator —

the generator yielding all tokens found
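
The name-to-method conversion and the "keep scanning until nothing matches" loop can be sketched like this. SketchScanner, scanWord() and scanPattern() are hypothetical; only the 'scan' . ucfirst($name) convention is taken from the description above.

```php
<?php
// Hypothetical sketch of dispatching scan names to scan*-methods.
final class SketchScanner
{
    private $input;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    public function scanFor(array $scans, bool $throwException = false): \Generator
    {
        while ($this->input !== '') {
            foreach ($scans as $name) {
                $method = 'scan' . ucfirst($name);   // newLine => scanNewLine
                $found = false;
                foreach ($this->$method() as $token) {
                    $found = true;
                    yield $token;
                }
                if ($found) {
                    continue 2;   // something matched, start over with the first scan
                }
            }

            // None of the passed scans matched the current input.
            if ($throwException) {
                throw new \RuntimeException('Unexpected input: ' . $this->input);
            }
            return;
        }
    }

    private function scanNewLine(): \Generator
    {
        yield from $this->scanPattern('newLine', '\n');
    }

    private function scanWord(): \Generator
    {
        yield from $this->scanPattern('word', '[a-z]+');
    }

    // Yields one token if $pattern matches at the start of the input.
    private function scanPattern(string $type, string $pattern): \Generator
    {
        if (preg_match("/^(?:$pattern)/", $this->input, $m)) {
            $this->input = (string) substr($this->input, strlen($m[0]));
            yield ['type' => $type, 'value' => $m[0]];
        }
    }
}

$types = [];
foreach ((new SketchScanner("ab\ncd"))->scanFor(['newLine', 'word']) as $token) {
    $types[] = $token['type'];
}
```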

createToken()

createToken(string  $type) : array

Creates a new token.

A token is an associative array. The following keys always exist:

type: The type of the node (e.g. newLine, tag, class, id)
line: The line we encountered this token on
offset: The offset on a line we encountered it on

Before adding a new token-type, make sure that the Parser knows how to handle it and the Compiler knows how to compile it.

Parameters

string $type

the type to give that token

Returns

array —

the token array

scanToken()

scanToken(string  $type, string  $pattern, string  $modifiers = '') : \Generator

Scans for a specific token-type based on a pattern and converts it to a valid token automatically.

All matches that have a name (RegEx (?<name>...)-directive) will directly get a key with that name and value on the token array.

For matching, ->match() is used internally

Parameters

string $type

the token type to create, if matched

string $pattern

the pattern to match

string $modifiers

the regex-modifiers for the pattern

Returns

\Generator
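
Copying named groups onto a token can be sketched in a few lines. sketchCreateToken() and sketchScanToken() are hypothetical functions that take the input, line and offset as arguments and return an array or null rather than a generator; they only illustrate the named-group handling.

```php
<?php
// Hypothetical sketch; not the lexer's own code.
function sketchCreateToken(string $type, int $line, int $offset): array
{
    // The three keys every token carries.
    return ['type' => $type, 'line' => $line, 'offset' => $offset];
}

function sketchScanToken(string $type, string $pattern, string $input, int $line = 1, int $offset = 0): ?array
{
    if (!preg_match("/^(?:$pattern)/", $input, $matches)) {
        return null;
    }

    $token = sketchCreateToken($type, $line, $offset);
    foreach ($matches as $key => $value) {
        // preg_match returns named groups under both a string key and a
        // numeric index; only the string keys are copied onto the token.
        if (is_string($key)) {
            $token[$key] = $value;
        }
    }

    return $token;
}

$tagToken = sketchScanToken('tag', '(?<name>[a-z][a-z0-9]*)', 'div.box');
```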

scanIndent()

scanIndent() : \Generator|void

Scans for indentation and automatically keeps the $_level updated through all tokens.

Upon reaching a higher level, an <indent>-token is yielded; upon reaching a lower level, an <outdent>-token is yielded.

If you outdented 3 levels, 3 <outdent>-tokens are yielded.

The first indentation this function encounters will be used as the indentation style for this document.

Unlike most Jade implementations, you can indent with anything between 1 space and a few million tabs.

Throws

\Tale\Jade\Lexer\Exception

Returns

\Generator|void
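
The level tracking can be sketched on a list of lines. sketchIndentTokens() is a hypothetical function: it detects style and width from the first indented line, rejects mixed indentation, and emits one "indent" or "outdent" entry per level changed (emitting one entry per level on the way up is a simplification of this sketch).

```php
<?php
// Hypothetical sketch: derive indent/outdent markers from leading whitespace.
function sketchIndentTokens(array $lines): array
{
    $style = null;   // first indentation character seen (space or tab)
    $width = null;   // how many of them make up one level
    $level = 0;
    $tokens = [];

    foreach ($lines as $number => $line) {
        $indent = (string) substr($line, 0, strspn($line, " \t"));

        if ($indent !== '' && $style === null) {
            // The first indentation found defines style and width.
            $style = $indent[0];
            $width = strlen($indent);
        }
        if ($indent !== '' && trim($indent, $style) !== '') {
            throw new \RuntimeException('Mixed indentation on line ' . ($number + 1));
        }

        $newLevel = $width === null ? 0 : intdiv(strlen($indent), $width);
        for (; $level < $newLevel; $level++) {
            $tokens[] = 'indent';
        }
        for (; $level > $newLevel; $level--) {
            $tokens[] = 'outdent';
        }

        $tokens[] = 'line:' . trim($line);
    }

    return $tokens;
}

$indentTokens = sketchIndentTokens(['a', '  b', '    c', 'd']);
```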

scanNewLine()

scanNewLine() : \Generator

Scans for a new-line character and yields a <newLine>-token if found.

Returns

\Generator

scanText()

scanText() : \Generator

Scans for text until the end of the current line and yields a <text>-token if found.

Returns

\Generator

scanTextBlock()

scanTextBlock() : \Generator

Scans for text and, once you indent, keeps scanning text until it is outdented again (e.g. .-text-blocks, expressions, comments).

Yields any <text>, <newLine>, <indent> and <outdent> tokens it encounters.

Returns

\Generator

scanTextLine()

scanTextLine() : \Generator

Scans for a |-style text-line and yields it along with a text-block, if it has any.

Returns

\Generator

scanMarkup()

scanMarkup() : \Generator

Scans for HTML-markup based on a starting '<'.

The whole markup will be kept and yielded as a <text>-token.

Returns

\Generator

scanComment()

scanComment() : \Generator

Scans for //-? comments yielding a <comment> token if found as well as a stack of text-block tokens.

Returns

\Generator

scanFilter()

scanFilter() : \Generator

Scans for :<filterName>-style filters and yields a <filter> token if found.

Filter-tokens always have: name, which is the name of the filter

Returns

\Generator

scanImport()

scanImport() : \Generator

Scans for imports and yields an <import>-token if found.

Import-tokens always have:

importType, which is either "extends" or "include"
path, the (relative) path to which the import points

Import-tokens may have: filter, which is an optional filter that should be only usable on "include"

Returns

\Generator

scanBlock()

scanBlock() : \Generator

Scans for <block>-tokens.

Blocks can have three styles:

block append|prepend|replace name
append|prepend|replace name
or simply block (for mixin blocks)

Block-tokens may have:

mode, which is either "append", "prepend" or "replace"
name, which is the name of the block

Returns

\Generator

scanCase()

scanCase() : \Generator

Scans for a <case>-token.

Case-tokens always have: subject, which is the expression between the parentheses

Returns

\Generator

scanWhen()

scanWhen() : \Generator

Scans for a <when>-token.

When-tokens always have:

name, which is either "when" or "default"
subject, which is the expression behind "when ..."

When-tokens may have: default, which indicates that this is the "default"-case

Returns

\Generator

scanConditional()

scanConditional() : \Generator

Scans for a <conditional>-token.

Conditional-tokens always have:

conditionType, which is either "if", "unless", "elseif", "else if" or "else"
subject, which is the expression between the parentheses

Returns

\Generator

scanControlStatement()

scanControlStatement(string  $type, array  $names, string|null  $nameAttribute = null) : \Generator

Scans for a control-statement-kind of token.

e.g. control-statement-name ($expression)

Since the -statement is a special case, it gets handled very specifically inside this function (but correctly).

If the condition can have a subject, the subject will be set as the "subject"-value of the token

Parameters

string $type

The token type that should be created if scan is successful

array $names

The names the statement can have (e.g. do, while, if, else etc.)

string|null $nameAttribute

The attribute the name gets saved into, if wanted

Throws

\Tale\Jade\Lexer\Exception

Returns

\Generator

scanEach()

scanEach() : \Generator

Scans for an <each>-token.

Each-tokens always have:

itemName, which is the name of the item for each iteration
subject, which is the expression to iterate

Each-tokens may have: keyName, which is the name of the key for each iteration

Returns

\Generator

scanWhile()

scanWhile() : \Generator

Scans for a <while>-token.

While-tokens always have: subject, which is the expression between the parentheses

Returns

\Generator

scanDo()

scanDo() : \Generator

Scans for a <do>-token.

Do-tokens are always stand-alone

Returns

\Generator

scanExpression()

scanExpression() : \Generator

Scans for a - or !?=-style expression.

e.g.

!= expr
= expr
- expr
    multiline expr

Expression-tokens always have:

escaped, which indicates that the expression result should be escaped
return, which indicates if the expression should return or just evaluate the result

Returns

\Generator

scanExpansion()

scanExpansion() : \Generator

Scans for a <expansion>-token.

(a: b-style expansion or a:b-style tags)

Expansion-tokens always have: withSpace, which indicates whether there's a space after the colon

Usually, if there's no space, it should be handled as part of a tag-name

Returns

\Generator

scanSub()

scanSub() : \Generator

Scans sub-expressions of elements, e.g. a text-block initiated with a dot (.) or a block expansion.

Yields whatever scanTextBlock() and scanExpansion() yield

Returns

\Generator

scanDoctype()

scanDoctype() : \Generator

Scans for a <doctype>-token.

Doctype-tokens always have: name, which is the passed name of the doctype or a custom-doctype, if the named doctype isn't provided

Returns

\Generator

scanTag()

scanTag() : \Generator

Scans for a <tag>-token.

Tag-tokens always have: name, which is the name of the tag

Returns

\Generator

scanClasses()

scanClasses() : \Generator

Scans for a <class>-token (begins with dot (.)).

Class-tokens always have: name, which is the name of the class

Returns

\Generator

scanId()

scanId() : \Generator

Scans for a <id>-token (begins with hash (#)).

ID-tokens always have: name, which is the name of the id

Returns

\Generator

scanMixin()

scanMixin() : \Generator

Scans for a mixin definition token (<mixin>).

Mixin-tokens always have: name, which is the name of the mixin you want to define

Returns

\Generator

scanMixinCall()

scanMixinCall() : \Generator

Scans for a <mixinCall>-token (begins with plus (+)).

Mixin-Call-Tokens always have: name, which is the name of the called mixin

Returns

\Generator

scanAssignment()

scanAssignment() : \Generator

Scans for an <assignment>-token (begins with ampersand (&)).

Assignment-Tokens always have: name, which is the name of the assignment

Returns

\Generator

scanAttributes()

scanAttributes() : \Generator

Scans for an attribute-block.

Attribute blocks always consist of the following tokens:

<attributeStart> ('(') -> Indicates that attributes start here
<attribute>... (name*=*value*) -> Name and value are both optional, but one of both needs to be provided. Multiple attributes are separated by a comma (,)
<attributeEnd> (')') -> Required. Indicates the end of the attribute block

This function will always yield an <attributeStart>-token first, if there's an attribute block. Attribute blocks can be split across multiple lines and don't respect indentation of any kind except for the <attributeStart>-token.

After that it will continue to yield <attribute>-tokens containing:

name, which is the name of the attribute (Default: null)
value, which is the value of the attribute (Default: null)
escaped, which indicates that the attribute expression result should be escaped

After that it will always require and yield an <attributeEnd>-token. If the <attributeEnd>-token is not found, this function will throw an exception.

Between <attributeStart>, <attribute> and <attributeEnd>, as well as around = and , of the attributes, you can use as many spaces and new-lines as you like.

Throws

\Tale\Jade\Lexer\Exception

Returns

\Generator

throwException()

throwException(string  $message) 

Throws a lexer-exception.

The current line and offset of the exception get automatically appended to the message

Parameters

string $message

A meaningful error message

Throws

\Tale\Jade\Lexer\Exception

strlen()

strlen(string  $string) : integer

mb_* compatible version of PHP's strlen.

(so we don't require mb.func_overload)

Parameters

string $string

the string to get the length of

Returns

integer —

the multi-byte-respecting length of the string

strpos()

strpos(string  $haystack, string  $needle, integer|null  $offset = null) : integer|false

mb_* compatible version of PHP's strpos.

(so we don't require mb.func_overload)

Parameters

string $haystack

the string to search in

string $needle

the string we search for

integer|null $offset

the offset at which we might expect it

Returns

integer|false —

the offset of the string or false, if not found

substr()

substr(string  $string, integer  $start, integer|null  $range = null) : string

mb_* compatible version of PHP's substr.

(so we don't require mb.func_overload)

Parameters

string $string

the string to get a sub-string of

integer $start

the start-index

integer|null $range

the amount of characters we want to get

Returns

string —

the sub-string

substr_count()

substr_count(string  $haystack, string  $needle) : integer

mb_* compatible version of PHP's substr_count.

(so we don't require mb.func_overload)

Parameters

string $haystack

the string we want to count sub-strings in

string $needle

the sub-string we want to count inside $haystack

Returns

integer —

the number of occurrences of $needle in $haystack
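
The four string helpers above share one idea: call the mb_* variant explicitly instead of relying on function overloading. A minimal sketch of two of them follows. sketchStrlen() and sketchSubstr() are hypothetical free functions; passing the encoding as a parameter and falling back to the byte-based functions when mbstring is missing are assumptions of this sketch.

```php
<?php
// Hypothetical wrappers; the encoding parameter and the byte-based fallback
// are assumptions of this sketch, not confirmed lexer behaviour.
function sketchStrlen(string $string, string $encoding = 'UTF-8'): int
{
    return function_exists('mb_strlen')
        ? mb_strlen($string, $encoding)
        : strlen($string);
}

function sketchSubstr(string $string, int $start, ?int $range = null, string $encoding = 'UTF-8'): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($string, $start, $range, $encoding);
    }

    // Fallback counts bytes, so multi-byte input may be cut mid-character.
    return (string) ($range === null
        ? substr($string, $start)
        : substr($string, $start, $range));
}

$length = sketchStrlen('abc');
$part = sketchSubstr('abcdef', 2, 3);
```

With mbstring available, a string such as "äb" counts as two characters instead of three bytes, which keeps positions and offsets meaningful for non-ASCII templates.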