abstract class CLTK::Scanner
- CLTK::Scanner
- Reference
- Object
Overview
A lexer class based on crystal-dfa, a Crystal RegExp implementation that compiles a Thompson NFA into a DFA. It runs faster than PCRE-based Regex implementations. Define a lexer like this:
class CalcLexer < CLTK::Scanner
  # Set a delimiter to split the string before lexing
  # for increased performance. Defaults to "\n", which
  # is fine for this example.
  # self.pre_delimiter=nil # no string splitting
  #                        # before lexing

  # ignore space & newline
  rule(/[\n\s]/)

  # Operators are keywords, so we use string rules
  # to get one single DFA matching all of them.
  rule("+") { {:PLS} }
  rule("-") { {:SUB} }
  rule("*") { {:MUL} }
  rule("/") { {:DIV} }

  # Ints and floats need to be matched with regular
  # expressions (this doesn't use Regex but DFA::RegExp).
  rule(/\d+\.\d+/) { |s| {:FLOAT, s} }
  rule(/\d+/) { |s| {:INT, s} }

  # Upon sighting a '#' (optionally followed by whitespace)
  # we go into the :comment state and don't leave it until
  # we find a '\n'.
  rule(/#\s*/) { push_state(:comment) }
  rule(/[^\n]+/, :comment) { |t| {:COMMENT, t} }
  rule("\n", :comment) { pop_state }

  # Calculate the DFAs for the string rules. This is
  # done on the first `lex` call if not invoked here.
  finalize
end
source = <<-source
#
# a simple calculation
#
4 + 4 # the first addition
- 3.14 # a subtraction
* 3
source

pp CalcLexer.lex(source).tokens # => [{:COMMENT, "a simple calculation"},
                                #     {:INT, "4"},
                                #     {:PLS},
                                #     {:INT, "4"},
                                #     {:COMMENT, "the first addition"},
                                #     {:SUB},
                                #     {:FLOAT, "3.14"},
                                #     {:COMMENT, "a subtraction"},
                                #     {:MUL},
                                #     {:INT, "3"}]
Direct Known Subclasses
Defined in:
cltk/scanner.cr
Class Method Summary
- .finalize
  Finalizes the lexer by creating DFAs for the provided string rules, for fast keyword matching.
- .lex(string : String) : Environment
  Lexes a string by continuously matching the DFAs against the string, yielding the callbacks with an instance of Environment.
- .split_lines : Bool
  To speed up lexing, the string may be split into single lines and fed to the DFAs in smaller chunks.
- .split_lines=(split_lines : Bool)
  To speed up lexing, the string may be split into single lines and fed to the DFAs in smaller chunks.
Macro Summary
- rule(expression, state = :default)
  Defines a lexing rule.
Instance methods inherited from class Object
in?(collection : Array | Set)
in?
Class Method Detail
.finalize
Finalizes the lexer by creating DFAs for the provided string rules, for fast keyword matching.
.lex(string : String) : Environment
Lexes a string by continuously matching the DFAs against the string, yielding the callbacks with an instance of Environment.
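The matching loop described above can be illustrated with a minimal, self-contained sketch. It is not the CLTK implementation: it uses the standard library's PCRE-based Regex instead of DFA::RegExp, and the names Rule, first_match and scan_tokens are hypothetical. Instead of callbacks, each rule carries plain data saying which token kind to emit and whether to push or pop a lexer state.

```crystal
# Hypothetical rule shape for this sketch (not part of CLTK):
# kind is the token kind to emit (nil skips the match),
# push is a state to enter, pop leaves the current state.
record Rule, pattern : Regex, state : Symbol = :default, kind : Symbol? = nil, push : Symbol? = nil, pop : Bool = false

# Returns the first rule of the active state that matches at the start
# of `rest`, together with the matched text, or nil if nothing matches.
def first_match(rules : Array(Rule), state : Symbol, rest : String)
  rules.each do |rule|
    next unless rule.state == state
    if m = rule.pattern.match(rest)
      return {rule, m[0]}
    end
  end
  nil
end

def scan_tokens(source : String, rules : Array(Rule)) : Array({Symbol, String})
  tokens = [] of {Symbol, String}
  states = [:default] # state stack; the top entry selects the active rules
  pos = 0
  while pos < source.size
    hit = first_match(rules, states.last, source[pos..])
    raise "no rule matches at offset #{pos}" unless hit
    rule, text = hit
    raise "empty match at offset #{pos}" if text.empty?
    if kind = rule.kind
      tokens << {kind, text}
    end
    if next_state = rule.push
      states << next_state
    end
    states.pop if rule.pop && states.size > 1
    pos += text.size
  end
  tokens
end

# Every pattern is anchored with \A so it only matches at the current position.
rules = [
  Rule.new(/\A[ \n]+/),
  Rule.new(/\A\d+/, kind: :INT),
  Rule.new(/\A\+/, kind: :PLS),
  Rule.new(/\A#\s*/, push: :comment),
  Rule.new(/\A[^\n]+/, state: :comment, kind: :COMMENT),
  Rule.new(/\A\n/, state: :comment, pop: true),
]

pp scan_tokens("4 + 4 # sum\n", rules)
# => [{:INT, "4"}, {:PLS, "+"}, {:INT, "4"}, {:COMMENT, "sum"}]
```

The state stack is what lets the :comment rules take over after a '#' and hand control back at the end of the line.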
.split_lines : Bool
To speed up lexing, the string may be split into single lines and fed to the DFAs in smaller chunks. This is enabled by default and can be disabled with the class setter.
.split_lines=(split_lines : Bool)
To speed up lexing, the string may be split into single lines and fed to the DFAs in smaller chunks. This is enabled by default, but can be disabled with this class setter.
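As a rough illustration of line splitting, the sketch below feeds a source to a lexer one line at a time and concatenates the results. It does not use CLTK: lex_chunk is a hypothetical stand-in that simply splits a chunk on whitespace, and lex_by_line is likewise an assumed name.

```crystal
# Hypothetical stand-in for a real lexer: returns the
# whitespace-separated words of a chunk as "tokens".
def lex_chunk(chunk : String) : Array(String)
  chunk.split
end

# Feeds the source to the lexer line by line and joins the results.
def lex_by_line(source : String) : Array(String)
  tokens = [] of String
  # chomp: false keeps the trailing "\n" on each line, so rules
  # that match a newline would still see it.
  source.each_line(chomp: false) do |line|
    tokens.concat(lex_chunk(line))
  end
  tokens
end

pp lex_by_line("4 + 4\n- 3.14\n")
# => ["4", "+", "4", "-", "3.14"]
```

Splitting gives the same tokens as lexing the whole string only when no token spans a line break, which is presumably why the setting can be switched off.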
Macro Detail
Defines a lexing rule. The expression can be either a string or a DFA::RegExp compatible expression.
State indicates the lexer state in which this rule should be applied. String expressions for the same state are combined into one alternating (..|..|..) DFA for faster recognition. String expressions should be used for special keywords and symbols such as def, true or ":".
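The idea of combining string rules into one alternation can be sketched with the standard library. Note the swap: Crystal's built-in Regex is PCRE-based and backtracks, so this shows only the shape of the combined (..|..|..) pattern, not the DFA construction the macro performs. The variable names are illustrative.

```crystal
keywords = ["def", "true", ":"]

# Escape each keyword so characters like "+" or "*" are taken literally,
# and put longer keywords first so a shorter prefix cannot shadow them.
alternation = keywords.sort_by { |k| -k.size }.map { |k| Regex.escape(k) }.join("|")

# One anchored pattern replaces one match attempt per keyword.
combined = Regex.new("\\A(?:#{alternation})")

pp combined.match("def foo").try(&.[0])  # => "def"
pp combined.match("true && x").try(&.[0]) # => "true"
pp combined.match("foo").try(&.[0])      # => nil
```

With a DFA, as in the macro described above, the alternation is recognised in a single pass over the input no matter how many keywords it contains.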