
I'm trying to use Elasticsearch for a project I'm working on, and I was hoping someone could steer me in the right direction. My index holds more than 100 million records.

I need to be able to run wildcard searches on email addresses, like the following:

b*g@gmail.com
b*g@*.com
*gus@gmail.com
br*gu*@gmail.com
*g*@*

When I try the wildcard query and other query types, I don't get the results I expect.

What type of Elasticsearch query should I look into? Is Elasticsearch even the right tool for this? The data comes from MySQL, so if it isn't, I may consider Sphinx or Solr instead.

coler-j
Brett G
  • For email searches I suggest this approach: http://stackoverflow.com/questions/30115867/elasticsearch-analyzer-and-tokenizer-for-emails – Andrei Stefan Jul 08 '16 at 07:32

1 Answer


I assume that you have tried out the wildcard query as described here.

However, it behaves very differently depending on whether your email field is analyzed or not analyzed. I would suggest you delete your index and recreate it with a mapping that leaves the field not analyzed, e.g.

PUT /emails
{
  "mappings": {
    "email": {
      "properties": {
        "email": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Once you have this mapping in place and have reindexed your data, you can use a normal wildcard or query_string query, e.g.

GET emails/_search
{
  "query": {
    "wildcard": {
      "email": {
        "value": "s*com"
      }
    }
  }
}
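To try the patterns from the question against this mapping, the request body can be generated rather than typed by hand. This is a minimal sketch in Python, assuming the field is named email as in the mapping above. wildcard_body is a hypothetical helper written for this illustration, not part of any Elasticsearch client.

```python
import json


def wildcard_body(pattern, field="email"):
    """Build the JSON body for a wildcard query (hypothetical helper)."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


# Patterns taken from the question.
patterns = [
    "b*g@gmail.com",
    "b*g@*.com",
    "*gus@gmail.com",
    "br*gu*@gmail.com",
    "*g*@*",
]

for pattern in patterns:
    # Each printed line is a body that could be sent to GET emails/_search.
    print(json.dumps(wildcard_body(pattern)))
```

Note that patterns beginning with * (such as *gus@gmail.com) make Elasticsearch examine many more terms, so they are likely to be slow on an index of this size.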

As an aside, when you index the email field without setting it to not_analyzed, the default analyzer splits the part before the @ from the domain and indexes them as separate terms. A wildcard query is matched against individual terms, which is why you get no results for s*@gmail.com. You would still get results for s* or *gmail.com, but for your case it is not_analyzed that makes the full patterns work. A not_analyzed field is matched case-sensitively, so if you want to support case insensitivity, you might want to look at a custom analyzer that uses the uax_url_email tokenizer, as described here.
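The difference between the two mappings can be illustrated without a cluster. The sketch below is a toy model in Python, not Elasticsearch itself. toy_analyze is an assumed, simplified stand-in for the default analyzer (lowercase, then split at the @), and fnmatchcase stands in for wildcard matching, with which it agrees for * and ? but not for bracket expressions.

```python
from fnmatch import fnmatchcase


def toy_analyze(email):
    """Simplified stand-in for the default analyzer: lowercase, split at '@'."""
    return email.lower().split("@")


def matches_not_analyzed(pattern, email):
    # The whole address is stored as one term, so the pattern sees all of it.
    return fnmatchcase(email, pattern)


def matches_analyzed(pattern, email):
    # The pattern must match one indexed term in full; no term contains '@'.
    return any(fnmatchcase(term, pattern) for term in toy_analyze(email))


email = "sarwar@gmail.com"  # made-up address for illustration
for pattern in ["s*@gmail.com", "s*", "*gmail.com", "s*com"]:
    print(pattern,
          "not_analyzed:", matches_not_analyzed(pattern, email),
          "analyzed:", matches_analyzed(pattern, email))
```

In this model s*@gmail.com only matches the not_analyzed form, while s* and *gmail.com match both, which mirrors the behaviour described above.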

Sarwar Bhuiyan