
In SQL, I can search email addresses pretty well with LIKE.

With an email "stack@domain.com", searching "stack", "@domain.com", "domain.com", or "domain" would get me back the desired email address.

How can I get the same result with ElasticSearch?

I played with nGram, edgeNGram, uax_url_email, etc., and the search results have been pretty bad. Please correct me if I'm wrong, but it sounds like I have to do the following:

  1. for index_analyzer
    • use the "keyword", "whitespace", or "uax_url_email" tokenizer so the email doesn't get tokenized
      • but wildcard queries don't seem to work (with tire, at least)
    • use "nGram" or "edgeNGram" as a filter
      • I always get far too many unwanted results, e.g. "first@domain.com" when searching for "first-second".
  2. for search_analyzer
    • don't do nGram
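The plan above can be checked with a small plain-Ruby simulation (no Elasticsearch involved). It is only a sketch under simplifying assumptions: `ngrams` approximates the `nGram` token filter with the experiment's `min_gram` 3 / `max_gram` 255, each lowercased email is treated as a single token (as a `keyword` tokenizer would produce), positions and scoring are ignored, and a document counts as a hit when any query token equals any indexed gram (OR semantics). `ngrams`, `search`, and the sample addresses are hypothetical.

```ruby
# Plain-Ruby simulation (no Elasticsearch involved) of how an nGram token
# filter affects matching. Approximation: each email is one lowercased token,
# as with a keyword tokenizer plus a lowercase filter.

MIN_GRAM = 3
MAX_GRAM = 255

# All substrings of `token` with length between min and max (inclusive),
# roughly what the nGram filter emits for a single token.
def ngrams(token, min = MIN_GRAM, max = MAX_GRAM)
  grams = []
  (min..[max, token.length].min).each do |len|
    (0..token.length - len).each { |start| grams << token[start, len] }
  end
  grams.uniq
end

EMAILS = ["first@domain.com", "stack@domain.com"].freeze

# Terms stored in the index for each email when nGram runs at index time.
INDEX = EMAILS.map { |e| [e, ngrams(e.downcase)] }.to_h

# A document matches if any query token equals any of its indexed grams
# (OR semantics, like a default match/query_string query).
def search(query_tokens)
  EMAILS.select { |e| (INDEX[e] & query_tokens).any? }
end

# Search side WITH nGram: "first-second" is split into grams such as
# "fir" and "first", which also occur in "first@domain.com".
with_ngram = search(ngrams("first-second"))

# Search side WITHOUT nGram (keyword + lowercase): the whole query must equal
# one indexed gram, i.e. be a substring of the email, much like LIKE '%q%'.
without_ngram_miss = search(["first-second"])
without_ngram_hit  = search(["domain"])

p with_ngram          # => ["first@domain.com"]
p without_ngram_miss  # => []
p without_ngram_hit   # => ["first@domain.com", "stack@domain.com"]
```

In this simplified model, n-grams on the search side let any shared three-character fragment produce a hit, which is the "first-second" problem; with n-grams only at index time, a query of at least three characters matches exactly when it is a substring of the address, which is close to `LIKE '%term%'`.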

Here is one of my experiments:

tire.settings :number_of_shards => 1,
            :number_of_replicas => 1,
            :analysis => {
                :filter => {
                    :db_ngram  => {
                        "type"     => "nGram",
                        "max_gram" => 255,
                        "min_gram" => 3 }
                },
                :analyzer => {
                    :string_analyzer => {
                        "tokenizer"    => "standard",
                        "filter"       => ["standard", "lowercase", "asciifolding", "db_ngram"],
                        "type"         => "custom" },
                    :index_name_analyzer => {
                        "tokenizer"    => "standard",
                        "filter"       => ["standard", "lowercase", "asciifolding"],
                        "type"         => "custom" },
                    :search_name_analyzer => {
                        "tokenizer"    => "whitespace",
                        "filter"       => ["lowercase", "db_ngram"],
                        "type"         => "custom" },
                    :index_email_analyzer => {
                        "tokenizer"    => "whitespace",
                        "filter"       => ["lowercase"],
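                        "type"         => "custom" },
                    # search_email_analyzer is referenced by the :email mapping,
                    # so it has to be defined; here it mirrors index_email_analyzer.
                    :search_email_analyzer => {
                        "tokenizer"    => "whitespace",
                        "filter"       => ["lowercase"],
                        "type"         => "custom" }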
                }
            } do
    mapping do
      indexes :id,           :index    => :not_analyzed
      indexes :name,         :index_analyzer => 'index_name_analyzer', :search_analyzer => 'search_name_analyzer'
      indexes :email,        :index_analyzer => 'index_email_analyzer', :search_analyzer => 'search_email_analyzer'
    end
end
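For comparison, here is one possible set of analysis settings for the "nGram at index time, no nGram at search time" plan, written as plain Ruby hashes so it runs without tire or Elasticsearch. It is a sketch, not a tested configuration: the analyzer names (`email_ngram_index`, `email_search`), the `multi_field` layout with an `exact` sub-field, and the use of the `keyword` tokenizer on both sides are assumptions; `index_analyzer`/`search_analyzer` are the pre-2.0 mapping options the experiment already uses. The small `undefined_analyzers` helper (also hypothetical) checks that every analyzer named in the mapping is actually defined in the settings.

```ruby
require "json"

# Hypothetical analysis settings (analyzer names are made up for this sketch).
# Index side: the whole address as one lowercased token, then nGram.
# Search side: the whole query as one lowercased token, no nGram.
EMAIL_ANALYSIS = {
  filter: {
    email_ngram: { type: "nGram", min_gram: 3, max_gram: 255 }
  },
  analyzer: {
    email_ngram_index: {
      type: "custom",
      tokenizer: "keyword",
      filter: ["lowercase", "email_ngram"]
    },
    email_search: {
      type: "custom",
      tokenizer: "keyword",
      filter: ["lowercase"]
    }
  }
}.freeze

# Mapping sketch for an ES 0.90-era multi_field: "email" for substring-style
# matching, "email.exact" for whole-address matching.
EMAIL_MAPPING = {
  email: {
    type: "multi_field",
    fields: {
      email: {
        type: "string",
        index_analyzer: "email_ngram_index",
        search_analyzer: "email_search"
      },
      exact: { type: "string", analyzer: "email_search" }
    }
  }
}.freeze

# Returns analyzer names used in the mapping but missing from the settings.
def undefined_analyzers(analysis, mapping)
  used = mapping.values.flat_map do |field|
    field[:fields].values.flat_map do |sub|
      sub.values_at(:analyzer, :index_analyzer, :search_analyzer)
    end
  end
  used.compact.map(&:to_sym).uniq - analysis[:analyzer].keys
end

puts JSON.pretty_generate({ analysis: EMAIL_ANALYSIS })
p undefined_analyzers(EMAIL_ANALYSIS, EMAIL_MAPPING)  # => []
```

With the `keyword` tokenizer on the search side, a query such as "@domain.com" stays one token and can match an indexed gram, instead of being split at the "@".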

Specific cases that don't work well:

  • emails with a hyphen (e.g. email-hyphen@domain.com)
  • query strings with '@' at the beginning or end
  • exact matches
  • searching with a wildcard like '@' gives very unexpected results.

Suppose I have "aaa@email.com", "aaa_0@email.com", and "aaa-0@email.com". Searching "aaa" gives me "aaa@email.com" and "aaa-0@email.com". Searching "aaa*" gives me everything, but "aaa-*" gives me nothing. So how should I do exact-match wildcard queries? For these types of queries, I get pretty much the same results with different tokenizers/analyzers.
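One plausible explanation for these results is that wildcard and prefix patterns are not analyzed, so "aaa-" is compared literally against whatever terms the analyzer stored. The plain-Ruby sketch that follows simulates this. `rough_standard_tokens` is a crude regex stand-in for the standard analyzer (it is not the real UAX#29 algorithm) that happens to produce the real tokens for these three addresses; `keyword_tokens` assumes a keyword tokenizer plus a lowercase filter. All method names are hypothetical.

```ruby
# CRUDE APPROXIMATION of the standard analyzer: lowercase, then split on
# anything that is not a letter, digit, underscore or dot. For these three
# addresses it gives the same tokens as the real analyzer
# ("aaa-0@email.com" -> "aaa", "0", "email.com"), but it is not UAX#29.
def rough_standard_tokens(text)
  text.downcase.split(/[^a-z0-9_.]+/).reject(&:empty?)
end

# keyword tokenizer + lowercase filter: the whole address is a single term.
def keyword_tokens(text)
  [text.downcase]
end

EMAILS = ["aaa@email.com", "aaa_0@email.com", "aaa-0@email.com"].freeze

# Analyzed single-term query: the term must equal one indexed term.
def term_hits(tokenizer, term)
  EMAILS.select { |e| send(tokenizer, e).include?(term) }
end

# Prefix/wildcard query: the pattern is NOT analyzed, so "aaa-" is compared
# literally against the indexed terms.
def prefix_hits(tokenizer, prefix)
  EMAILS.select { |e| send(tokenizer, e).any? { |t| t.start_with?(prefix) } }
end

p term_hits(:rough_standard_tokens, "aaa")     # => ["aaa@email.com", "aaa-0@email.com"]
p prefix_hits(:rough_standard_tokens, "aaa")   # => all three addresses
p prefix_hits(:rough_standard_tokens, "aaa-")  # => []
p prefix_hits(:keyword_tokens, "aaa-")         # => ["aaa-0@email.com"]
```

Under these assumptions the first three lines reproduce the "aaa", "aaa*" and "aaa-*" results, and only a field that keeps each address as one term lets "aaa-*" match.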

I do these after each mapping change:

Model.tire.index.delete
Model.tire.create_elasticsearch_index
Model.tire.index.import Model.all

Gary L

1 Answer

Considering what you are trying to accomplish, KeywordAnalyzer might be a reasonable choice of analyzer, though I don't see anything that would cause problems with a WhitespaceAnalyzer.
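A minimal sketch of why either analyzer is workable for stored addresses: the two methods that follow are simplified, hypothetical stand-ins for Lucene's KeywordAnalyzer (whole input as one token) and WhitespaceAnalyzer (split on whitespace), neither of which lowercases. For a single address without spaces they yield the same token; they differ only when the input contains whitespace.

```ruby
# Simplified stand-ins; neither applies a lowercase filter.
def keyword_analyze(text)
  [text]
end

def whitespace_analyze(text)
  text.split(/\s+/).reject(&:empty?)
end

email = "Email-Hyphen@domain.com"
p keyword_analyze(email)     # => ["Email-Hyphen@domain.com"]
p whitespace_analyze(email)  # => ["Email-Hyphen@domain.com"]

query = "aaa bbb@domain.com"
p keyword_analyze(query)     # => ["aaa bbb@domain.com"]
p whitespace_analyze(query)  # => ["aaa", "bbb@domain.com"]
```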

I suspect you are running into problems with query parsing and analysis, although you haven't really described how you are querying. The simplest approach would be to use term or prefix queries.
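As a sketch of what such queries look like, the hypothetical helpers that follow build Elasticsearch query DSL bodies as Ruby hashes and serialize them with the standard `json` library. They assume an `email.exact` field that stores each address as one lowercased term (keyword tokenizer plus lowercase filter); since term and prefix queries are not analyzed, the helpers lowercase the input themselves.

```ruby
require "json"

# Assumes "email.exact" holds one lowercased term per address.
# term and prefix queries skip analysis, so lowercase the input here.
def exact_email_query(address)
  { query: { term: { "email.exact" => address.downcase } } }
end

def email_prefix_query(prefix)
  { query: { prefix: { "email.exact" => prefix.downcase } } }
end

puts JSON.generate(exact_email_query("Stack@Domain.com"))
# {"query":{"term":{"email.exact":"stack@domain.com"}}}
puts JSON.generate(email_prefix_query("aaa-"))
# {"query":{"prefix":{"email.exact":"aaa-"}}}
```

A prefix query only anchors at the start of the term, so a suffix such as "@domain.com" needs either an n-gram field or a `wildcard` query like `*@domain.com`, and leading wildcards are slow on large indices.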

It does seem like StandardAnalyzer would mostly serve your purpose here (distinguishing between "aaa_0" and "aaa-0" would be a problem), as long as it is applied consistently at index and search time and your query is correct.

femtoRgon
  • Thanks. Actually, the query is straightforward, just one string/term:

      Model.tire.search(:load => true, :per_page => 25) do
        query { string "*com" }
      end

    I tried prefix, but I suppose "@.gmail.com" won't work with it. I haven't figured out how to do wildcard queries with the tire gem yet; otherwise I would chain an exact query with wildcard queries to search emails in different ways:

      boolean :minimum_number_should_match => 1 do
        should { prefix 'email', term }
        should { prefix 'email.exact', term }
        should { string term }
      end

    – Gary L Aug 28 '13 at 23:56