Module: ScopedSearch::QueryLanguage::Tokenizer
In: lib/scoped_search/query_language/tokenizer.rb
The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, a form that is more suitable for parsing.
KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not, 'set?' => :notnull,
             'has' => :notnull, 'null?' => :null, 'before' => :lt,
             'after' => :gt, 'at' => :eq }

  All keywords that the language supports.

OPERATORS = { '&' => :and, '|' => :or, '&&' => :and, '||' => :or, '-' => :not,
              '!' => :not, '~' => :like, '!~' => :unlike, '=' => :eq,
              '==' => :eq, '!=' => :ne, '<>' => :ne, '>' => :gt, '<' => :lt,
              '>=' => :gte, '<=' => :lte, '^' => :in, '!^' => :notin }

  Every operator that the language supports.
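To see how these two tables drive tokenization, the sketch below mixes the module into a small hypothetical harness class (TokenizerHarness is not part of the library) that supplies the @str instance variable and starting cursor position that the methods documented below expect:

  require 'scoped_search'  # assumes the scoped_search gem and its dependencies are installed

  # Hypothetical harness: the module expects the including class (normally the
  # query language compiler) to provide the query string in @str.
  class TokenizerHarness
    include ScopedSearch::QueryLanguage::Tokenizer

    def initialize(str)
      @str = str
      @current_char_pos = -1  # the position pointer starts before the first character
    end
  end

  tokens = []
  TokenizerHarness.new('name != "John" or hits >= 10').each_token { |t| tokens << t }
  tokens  # => ["name", :ne, "John", :or, "hits", :gte, "10"]

Reserved words and their symbolic counterparts produce the same token symbols, so the parser only ever sees :ne, :or, :gte and friends.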
Returns the current character of the string.

  # File lib/scoped_search/query_language/tokenizer.rb, line 19
  def current_char
    @current_char
  end
Tokenizes the string by iterating over the characters.

  # File lib/scoped_search/query_language/tokenizer.rb, line 37
  def each_token(&block)
    while next_char
      case current_char
        when /^\s?$/; # ignore
        when '(';  yield(:lparen)
        when ')';  yield(:rparen)
        when ',';  yield(:comma)
        when /\&|\||=|<|>|\^|!|~|-/; tokenize_operator(&block)
        when '"';  tokenize_quoted_keyword(&block)
        else;      tokenize_keyword(&block)
      end
    end
  end
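The same dispatch can be observed on a one-off object; Object#extend is used here purely for illustration and is not how the library wires the module in. Whitespace is skipped, while parentheses and commas become their own tokens:

  require 'scoped_search'

  scanner = Object.new
  scanner.extend(ScopedSearch::QueryLanguage::Tokenizer)
  scanner.instance_variable_set(:@str, '(a | b), !c')
  scanner.instance_variable_set(:@current_char_pos, -1)

  tokens = []
  scanner.each_token { |t| tokens << t }
  tokens  # => [:lparen, "a", :or, "b", :rparen, :comma, :not, "c"]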
Returns the next character of the string and moves the position pointer one step forward.

  # File lib/scoped_search/query_language/tokenizer.rb, line 31
  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end
Returns an upcoming character of the string (by default, the next one) without moving the position pointer.

  # File lib/scoped_search/query_language/tokenizer.rb, line 25
  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end
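A quick illustration of the difference between the two cursor methods, using the same kind of throwaway scanner object (an illustration only, not a documented entry point):

  require 'scoped_search'

  scanner = Object.new
  scanner.extend(ScopedSearch::QueryLanguage::Tokenizer)
  scanner.instance_variable_set(:@str, '>=')
  scanner.instance_variable_set(:@current_char_pos, -1)

  scanner.next_char     # => ">"   moves the pointer onto the first character
  scanner.peek_char     # => "="   looks ahead without moving the pointer
  scanner.current_char  # => ">"   unchanged by the peek
  scanner.peek_char(2)  # => ""    looking past the end of the string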
Tokenizes the string and returns the result as an array of tokens.

  # File lib/scoped_search/query_language/tokenizer.rb, line 13
  def tokenize
    @current_char_pos = -1
    to_a
  end
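Note that to_a is an Enumerable method, so the object that mixes in this module must also enumerate over each_token; the real compiler presumably arranges that, but it is not shown on this page. A minimal sketch that wires the pieces together by hand:

  require 'scoped_search'

  scanner = Object.new
  scanner.extend(ScopedSearch::QueryLanguage::Tokenizer)
  scanner.extend(Enumerable)
  def scanner.each(&block)  # to_a needs an each; delegate it to each_token
    each_token(&block)
  end
  scanner.instance_variable_set(:@str, 'group_id ^ (1, 2, 3)')

  scanner.tokenize  # resets the position pointer itself, so no cursor setup is needed
  # => ["group_id", :in, :lparen, "1", :comma, "2", :comma, "3", :rparen]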
Tokenizes a keyword and converts it to a Symbol if it is recognized as a reserved language keyword (see the KEYWORDS hash).

  # File lib/scoped_search/query_language/tokenizer.rb, line 60
  def tokenize_keyword(&block)
    keyword = current_char
    keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
    KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
  end
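Keyword recognition is case-insensitive, and anything that is not a reserved word is yielded back as a plain string (a field name, a value, and so on). Another throwaway-scanner sketch:

  require 'scoped_search'

  scanner = Object.new
  scanner.extend(ScopedSearch::QueryLanguage::Tokenizer)
  scanner.instance_variable_set(:@str, 'updated_at Before yesterday AND set? owner')
  scanner.instance_variable_set(:@current_char_pos, -1)

  tokens = []
  scanner.each_token { |t| tokens << t }
  tokens  # => ["updated_at", :lt, "yesterday", :and, :notnull, "owner"]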
Tokenizes an operator that occurs in the OPERATORS hash.

  # File lib/scoped_search/query_language/tokenizer.rb, line 52
  def tokenize_operator(&block)
    operator = current_char
    operator << next_char if OPERATORS.has_key?(operator + peek_char)
    yield(OPERATORS[operator])
  end
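The peek_char lookahead lets the method recognize two-character operators before falling back to the single-character form, so '!=' becomes :ne rather than :not followed by :eq. For example (again with the illustrative throwaway scanner):

  require 'scoped_search'

  scanner = Object.new
  scanner.extend(ScopedSearch::QueryLanguage::Tokenizer)
  scanner.instance_variable_set(:@str, 'a != b && c !^ d')
  scanner.instance_variable_set(:@current_char_pos, -1)

  tokens = []
  scanner.each_token { |t| tokens << t }
  tokens  # => ["a", :ne, "b", :and, "c", :notin, "d"]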
Tokenizes a keyword that is quoted with double quotes. Double quote characters inside the keyword can be escaped with backslashes.

  # File lib/scoped_search/query_language/tokenizer.rb, line 68
  def tokenize_quoted_keyword(&block)
    keyword = ""
    until next_char.nil? || current_char == '"'
      keyword << (current_char == "\\" ? next_char : current_char)
    end
    yield(keyword)
  end
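Inside a quoted keyword, a backslash escapes the character that follows it, so embedded double quotes survive tokenization. For example (illustrative scanner again):

  require 'scoped_search'

  scanner = Object.new
  scanner.extend(ScopedSearch::QueryLanguage::Tokenizer)
  # Raw query text: name = "John \"JJ\" Doe"
  scanner.instance_variable_set(:@str, 'name = "John \"JJ\" Doe"')
  scanner.instance_variable_set(:@current_char_pos, -1)

  tokens = []
  scanner.each_token { |t| tokens << t }
  tokens  # => ["name", :eq, "John \"JJ\" Doe"]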