11

I'm wondering what the best way to parse a text query in Rails is, to allow the user to include logical operators?

I'd like the user to be able to enter either of these, or some equivalent:

# searching partial text in emails, just for example
# query A
"jon AND gmail" #=> ["jonsmith@gmail.com"]

# query B
"jon OR gmail" #=> ["jonsmith@gmail.com", "sarahcalaway@gmail.com"]

# query C
"jon AND gmail AND smith" #=> ["jonsmith@gmail.com"]

Ideally, we could get even more complex with parentheses to indicate order of operations, but that's not a requirement.

Is there a gem or a pattern that supports this?

potashin
  • 44,205
  • 11
  • 83
  • 107
steel
  • 11,883
  • 7
  • 72
  • 109
  • [Maybe be of help, title may confuse you, but take a look at the second and third options](http://stackoverflow.com/questions/31096009/hash-notation-for-activerecord-or-query/31096106#31096106) – potashin Jul 13 '15 at 20:05
  • When you say "allow the user to include logical operators", do you mean allow the end user to supply operators via some web interface? Or do you mean how do you do it with ActiveRecord's API? – Ryenski Jul 13 '15 at 20:16
  • That they would type it in via an input, as demonstrated above. – steel Jul 13 '15 at 20:59
  • 1
    @suslov, smart_tuple looks like an interesting possibility. Thank you for that link. – steel Jul 13 '15 at 21:00
  • http://meta.stackexchange.com/questions/16065/how-does-the-bounty-system-work Check the "How do I award a bounty?" topic. :) – mrodrigues Jul 22 '15 at 15:58
  • @suslov You might need to award the bounty. I don't see the icon. – steel Jul 22 '15 at 16:01
  • I mean, what answer do you prefer. – potashin Jul 22 '15 at 21:46

3 Answers3

7

This is a possible but inefficient way to do this:

user_input = "jon myers AND gmail AND smith OR goldberg OR MOORE"
terms = user_input.split(/(.+?)((?: and | or ))/i).reject(&:empty?)
# => ["jon myers", " AND ", "gmail", " AND ", "smith", " OR ", "goldberg", " OR ", "MOORE"]

pairs = terms.each_slice(2).map { |text, op| ["column LIKE ? #{op} ", "%#{text}%"] }
# => [["column LIKE ?  AND  ", "%jon myers%"], ["column LIKE ?  AND  ", "%gmail%"], ["column LIKE ?  OR  ", "%smith%"], ["column LIKE ?  OR  ", "%goldberg%"], ["column LIKE ?  ", "%MOORE%"]]

query = pairs.reduce([""]) { |acc, terms| acc[0] += terms[0]; acc << terms[1] }
# => ["column LIKE ?  AND  column LIKE ?  AND  column LIKE ?  OR column LIKE ?  OR  column LIKE ?  ", "%jon myers%", "%gmail%", "%smith%", "%goldberg%", "%MOORE%"]

Model.where(query[0], *query[1..-1]).to_sql
# => SELECT "courses".* FROM "courses"  WHERE (column LIKE '%jon myers%'  AND  column LIKE '%gmail%'  AND  column LIKE '%smith%'  OR  column LIKE '%goldberg%'  OR  column LIKE '%MOORE%'  )

However, as I said, searches like this one are extremely inefficient. I'd recommend you use a full-text search engine, like Elasticsearch.

mrodrigues
  • 1,052
  • 10
  • 14
4

I use such a parser in a Sinatra app, since the queries tend to be complex I produce plain SQL instead of using the activerecords selection methods. If you can use it, feel free..

You use it like this, class_name is the activerecord class representing the table, params is a hash of strings to parse, the result is sent to the browser as Json eg

generic_data_getter (Person, {age: ">30",name: "=John", date: ">=1/1/2014 <1/1/2015"})

  def generic_data_getter (class_name, params, start=0, limit=300, sort='id', dir='ASC')
    selection = build_selection(class_name, params)
    data = class_name.where(selection).offset(start).limit(limit).order("#{sort} #{dir}")
    {:success => true, :totalCount => data.except(:offset, :limit, :order).count, :result => data.as_json}
  end

def build_selection class_name, params
  field_names = class_name.column_names
  selection = []
  params.each do |k,v|
    if field_names.include? k
      type_of_field = class_name.columns_hash[k].type.to_s
      case
      when (['leeg','empty','nil','null'].include? v.downcase) then selection << "#{k} is null"
      when (['niet leeg','not empty','!nil','not null'].include? v.downcase) then selection << "#{k} is not null"
      when type_of_field == 'string' then 
        selection << string_selector(k, v)
      when type_of_field == 'integer' then
        selection << integer_selector(k, v)
      when type_of_field == 'date' then
        selection << date_selector(k, v)
      end
    end
  end
  selection.join(' and ')
end

def string_selector(k, v)
  case
  when v[/\|/]
    v.scan(/([^\|]+)(\|)([^\|]+)/).map {|p| "lower(#{k}) LIKE '%#{p.first.downcase}%' or lower(#{k}) LIKE '%#{p.last.downcase}%'"}
  when v[/[<>=]/]
    v.scan(/(<=?|>=?|=)([^<>=]+)/).map { |part| "#{k} #{part.first} '#{part.last.strip}'"}
  else
    "lower(#{k}) LIKE '%#{v.downcase}%'"
  end
end

def integer_selector(k, v)
  case
  when v[/\||,/]
    v.scan(/([^\|]+)([\|,])([^\|]+)/).map {|p|p p; "#{k} IN (#{p.first}, #{p.last})"}
  when v[/\-/]
    v.scan(/([^-]+)([\-])([^-]+)/).map {|p|p p; "#{k} BETWEEN #{p.first} and #{p.last}"}
  when v[/[<>=]/]
    v.scan(/(<=?|>=?|=)([^<>=]+)/).map { |part| p part; "#{k} #{part.first} #{part.last}"}
  else
    "#{k} = #{v}"
  end
end

def date_selector(k, v)
  eurodate = /^(\d{1,2})[-\/](\d{1,2})[-\/](\d{1,4})$/
  case
  when v[/\|/]
    v.scan(/([^\|]+)([\|])([^\|]+)/).map {|p|p p; "#{k} IN (DATE('#{p.first.gsub(eurodate,'\3-\2-\1')}'), DATE('#{p.last.gsub(eurodate,'\3-\2-\1')}'))"}
  when v[/\-/]
    v.scan(/([^-]+)([\-])([^-]+)/).map {|p|p p; "#{k} BETWEEN DATE('#{p.first.gsub(eurodate,'\3-\2-\1')}')' and DATE('#{p.last.gsub(eurodate,'\3-\2-\1')}')"}
  when v[/<|>|=/]
    parts = v.scan(/(<=?|>=?|=)(\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4})/)
    selection = parts.map do |part|
      operator = part.first ||= "="
      date = Date.parse(part.last.gsub(eurodate,'\3-\2-\1'))
      "#{k} #{operator} DATE('#{date}')"
    end
  when v[/^(\d{1,2})[-\/](\d{1,4})$/]
    "#{k} >= DATE('#{$2}-#{$1}-01') and #{k} <= DATE('#{$2}-#{$1}-31')"
  else
    date = Date.parse(v.gsub(eurodate,'\3-\2-\1'))
    "#{k} = DATE('#{date}')"
  end
end
peter
  • 41,770
  • 5
  • 64
  • 108
3

The simplest case would be extract an array from the strings:

and_array = "jon AND gmail".split("AND").map{|e| e.strip}
# ["jon", "gmail"]
or_array = "jon OR sarah".split("OR").map{|e| e.strip}
# ["jon", "sarah"]

Then you could construct an query string:

query_string = ""
and_array.each {|e| query_string += "%e%"}
# "%jon%%gmail%"

Then you use a ilike or a like query to fetch the results:

Model.where("column ILIKE ?", query_string)
# SELECT * FROM model WHERE column ILIKE '%jon%%gmail%'
# Results: jonsmith@gmail.com

Of course that could be a little overkill. But it is a simple solution.

MurifoX
  • 14,991
  • 3
  • 36
  • 60