abstract class CLTK::Scanner

Overview

A lexer class based on crystal-dfa, a Crystal RegExp implementation that builds a DFA from a Thompson NFA. It runs faster than PCRE-based Regex implementations. Define a lexer like this:

class CalcLexer < Scanner
  # by default the input string is split into
  # single lines before lexing for increased
  # performance, which is fine for this example.

  # self.split_lines = false # no string splitting
  # before lexing

  # ignore space & newline
  rule(/[\n\s]/)

  # operators are keywords, so we use
  # stringrules to have one single
  # dfa matching
  rule("+")                { {:PLS} }
  rule("-")                { {:SUB} }
  rule("*")                { {:MUL} }
  rule("/")                { {:DIV} }

  # ints and floats need to be matched with
  # regular expressions (doesn't use Regex but
  # DFA::RegExp)
  rule(/\d+\.\d+/)         { |s| {:FLOAT, s} }
  rule(/\d+/)              { |s| {:INT,   s} }

  # upon sighting of a '#' (and optionally trailing ' ')
  # we go into :comment state and don't leave until we
  # find a '\n'
  rule(/#\s*/)             {     push_state(:comment) }
  rule(/[^\n]+/, :comment) { |t| {:COMMENT, t}        }
  rule("\n", :comment)     {     pop_state            }

  # calculate the dfa's for the string-rules
  # will be called upon first `lex` call if
  # not invoked here
  finalize
end

source = <<-source
#
# a simple calculation
#

4 + 4  # the first addition
- 3.14 # a subtraction
* 3

source

pp CalcLexer.lex(source).tokens   # => [{:COMMENT, "a simple calculation"},
                                  #     {:INT, "4"},
                                  #     {:PLS},
                                  #     {:INT, "4"},
                                  #     {:COMMENT,
                                  #      "the first addition"},
                                  #     {:SUB},
                                  #     {:FLOAT,
                                  #      "3.14"},
                                  #     {:COMMENT,
                                  #      "a subtraction"},
                                  #     {:MUL},
                                  #     {:INT, "3"}]

Defined in:

cltk/scanner.cr

Instance methods inherited from class Object

in?(collection : Array | Set)

Class Method Detail

def self.finalize

Finalizes the lexer by creating DFAs for the provided string rules, for fast keyword matching.


def self.lex(string : String) : Environment

Lexes a string by continuously matching the DFAs against it, invoking the rule callbacks with an instance of Environment.


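The scanning loop described for lex can be pictured with a small, self-contained sketch. It is not CLTK's implementation: the standard library's Regex stands in for the DFAs, and tiny_lex, Token and the :OP token are hypothetical names chosen for illustration. It shows how a state stack decides which rules are tried at each position.

```crystal
# Hypothetical miniature of a state-aware scanning loop (not CLTK code).
alias Token = Tuple(Symbol, String)

def tiny_lex(input : String) : Array(Token)
  tokens = [] of Token
  states = [:default] # state stack; the last entry is the active state
  pos = 0

  while pos < input.size
    rest = input[pos..]

    if states.last == :comment
      if m = rest.match(/\A[^\n]+/)
        # everything up to the end of the line is the comment text
        tokens << {:COMMENT, m[0]}
        pos += m[0].size
      else
        # the next character is "\n": leave the comment state
        states.pop
        pos += 1
      end
    elsif m = rest.match(/\A#[ \t]*/)
      states.push(:comment)
      pos += m[0].size
    elsif m = rest.match(/\A\s+/)
      pos += m[0].size # whitespace produces no token
    elsif m = rest.match(/\A\d+\.\d+/)
      tokens << {:FLOAT, m[0]}
      pos += m[0].size
    elsif m = rest.match(/\A\d+/)
      tokens << {:INT, m[0]}
      pos += m[0].size
    elsif m = rest.match(/\A[-+*\/]/)
      tokens << {:OP, m[0]}
      pos += m[0].size
    else
      raise "unexpected character #{rest[0].inspect} at position #{pos}"
    end
  end

  tokens
end

pp tiny_lex("4 + 4 # sum\n* 3")
```

The float rule is tried before the integer rule so that "3.14" is not split into an integer followed by stray characters.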
def self.split_lines : Bool

To speed up lexing, the string may be split into single lines and fed to the DFAs in smaller chunks. This is enabled by default. This class getter returns the current setting; use split_lines= to change it.


def self.split_lines=(split_lines : Bool)

To speed up lexing, the string may be split into single lines and fed to the DFAs in smaller chunks. This is enabled by default, but can be disabled with this class setter.


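As a rough illustration of what line splitting means, the sketch below scans the same input once as a whole and once line by line. It uses only the standard library; scan_chunk is a hypothetical helper, not part of CLTK, and the Regex it uses is a stand-in for the lexer's DFAs.

```crystal
# Hypothetical helper (not part of CLTK): returns the raw token strings
# found in one chunk of input.
def scan_chunk(chunk : String) : Array(String)
  chunk.scan(/\d+\.\d+|\d+|[-+*\/]/).map(&.[0])
end

source = "4 + 4\n- 3.14\n* 3\n"

# Scan the whole string in one go ...
whole = scan_chunk(source)

# ... or split it into lines first (keeping each "\n") and scan the chunks.
by_line = source.lines(chomp: false).flat_map { |line| scan_chunk(line) }

puts whole == by_line
```

The two results agree here because no token spans a line break. A rule that has to match across several lines would presumably need the whole string, which is a plausible reason to set split_lines to false.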

Macro Detail

macro rule(expression, state = :default)

Defines a lexing rule. The expression can be either a string or a DFA::RegExp-compatible expression. The state names the lexer state in which the rule applies. String expressions for the same state are combined into one alternating (..|..|..) DFA for faster recognition. Use string expressions for keywords and symbols such as def, true or ":".


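The idea of combining string expressions into one alternation can be sketched with the standard library alone. This is an illustration under assumptions, not the macro's actual expansion: it uses the PCRE-based Regex where CLTK builds a DFA, and the keywords and combined variables are hypothetical.

```crystal
# Stand-in sketch: stdlib Regex is used only to show how several keywords
# collapse into a single (..|..|..) pattern.
keywords = ["def", "true", ":"]

# Longer keywords go first so a shorter one cannot shadow them, and each
# keyword is escaped so symbols such as "+" or "*" match literally.
alternation = keywords
  .sort_by { |k| -k.size }
  .map { |k| Regex.escape(k) }
  .join("|")

# \A anchors the match at the current scan position.
combined = Regex.new("\\A(?:#{alternation})")

if match = combined.match("def foo")
  puts match[0]
end
```

One pattern per state means the input is examined once for all keywords of that state, instead of once per keyword.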