
I have been using Elasticsearch for a project, but I find the results of the snowball analyzer a bit strange.

Below is the mapping I use.

$myTypeMapping = array(
    '_source' => array(
        'enabled' => true
    ),
    'properties' => array(
        'id'    => array(
            'type'  => 'integer',
            'index' => 'not_analyzed'
        ),
        'name' => array(
            'type' => 'string',
            'analyzer' => 'snowball',
            'boost' => 2.0
        ),
        'food_types' => array(
            'type' => 'string',
            'analyzer' => 'keyword'
        ),
        'location' => array(
            'type' => 'geo_point',
            "geohash_precision"=> 4
        ),
        'city' => array(
            'type' => 'string',
            'analyzer' => 'keyword'
        )
    )
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;

// Create the index

$elastic_client->indices()->create($indexParams);

Querying http://localhost:9200/online_pizza/online_pizza/_mapping returns the following:

{
  "online_pizza": {
    "properties": {
      "city": {
        "type": "string",
        "analyzer": "keyword"
      },
      "food_types": {
        "type": "string",
        "analyzer": "keyword"
      },
      "id": {
        "type": "integer"
      },
      "location": {
        "type": "geo_point",
        "geohash_precision": 4
      },
      "name": {
        "type": "string",
        "boost": 2,
        "analyzer": "snowball"
      }
    }
  }
}

My question: I have a document whose name field is "Milano". Querying for "Milano" returns the desired result, but querying for "Milan" or "Mil" returns nothing.

{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Milan"
    }
  }
}

I've also tried specifying the snowball analyzer at query time, but it didn't help.

{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "Milan",
      "analyzer": "snowball"
    }
  }
}

My second question: keyword search is case sensitive, e.g. Pizza != pizza. How do I get around this?

Thanks,

Ronak Jain

1 Answer

The snowball stemmer doesn't match on partial words; it reduces a word to its stem by applying suffix rules. If you try it with jumping, it outputs jump as expected.

However, depending on the word, the result may be understemmed if the word doesn't match any stemmer rule.

If you use the analyze API endpoint (more info here), you will see that analyzing Milano with the snowball analyzer gives you the token milano:

GET _analyze?analyzer=snowball&text=Milano

Output:

{
   "tokens": [
      {
         "token": "milano",
         "start_offset": 0,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 1
      }
   ]
}

Then, running the same snowball analyzer on Mil:

GET _analyze?analyzer=snowball&text=Mil

gives you this token:

{
   "tokens": [
      {
         "token": "mil",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      }
   ]
}

That's why searching for 'milan' or 'mil' won't match 'Milano' documents: neither query term equals the milano term stored in the index.
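To make the exact-term matching concrete, here is a minimal sketch in plain Python. It is not Elasticsearch code: `toy_analyze`, `build_index` and `search` are hypothetical names, and the toy analyzer only lowercases and splits on whitespace, which I assume is enough here because the analyze output above shows these words passing through the stemmer unchanged.

```python
def toy_analyze(text):
    # Stand-in for the snowball analyzer (assumption): lowercase and split on
    # whitespace. No stemming step, since the words used here are left as-is.
    return text.lower().split()


def build_index(docs):
    # Toy inverted index: term -> set of ids of documents containing that term.
    index = {}
    for doc_id, text in docs.items():
        for term in toy_analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    # A document matches only if a query term is exactly equal to an indexed term.
    hits = set()
    for term in toy_analyze(query):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "Milano"})
print(search(index, "Milano"))  # {1}
print(search(index, "Milan"))   # set()
print(search(index, "Mil"))     # set()
```

The only term in the index is milano, so a lookup for milan or mil finds nothing, exactly as in your queries.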

For your second question, you can define a custom analyzer that combines the keyword tokenizer with a lowercase token filter, which makes your keyword search case-insensitive (provided you use the same analyzer at search time):

POST index_name
{
  "analysis": {
   "analyzer": {
     "case_insensitive_keyword": {
       "type": "custom",
       "tokenizer": "keyword",
       "filter": ["lowercase"]
     }
   }
  }
}

Test:

GET index_name/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo

Output:

{
   "tokens": [
      {
         "token": "choo choo",
         "start_offset": 0,
         "end_offset": 9,
         "type": "word",
         "position": 1
      }
   ]
}
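As a rough illustration of what that analyzer chain does, here is a plain Python sketch. The function name `case_insensitive_keyword` just mirrors the analyzer name above; the function itself is hypothetical and only imitates the two steps (keyword tokenizer, then lowercase filter), it is not how Elasticsearch implements them.

```python
def case_insensitive_keyword(text):
    # Keyword tokenizer: the whole input becomes a single token, spaces included.
    tokens = [text]
    # Lowercase token filter: applied to every token.
    return [token.lower() for token in tokens]


print(case_insensitive_keyword("Choo Choo"))  # ['choo choo']
# Both spellings produce the same indexed term, so they match each other.
print(case_insensitive_keyword("Pizza") == case_insensitive_keyword("pizza"))  # True
```

Because the same transformation runs at index time and at search time, Pizza and pizza end up as the same term.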

I hope my explanations are clear enough :)

ThomasC
  • Thank you for the answer, very clear. Which analyzer should I use if I want to match on any sequence of characters rather than on stemmer rules? Regular expressions? – Ronak Jain Oct 30 '14 at 09:52
  • Maybe [ngrams](http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html) would be the right answer – ThomasC Oct 30 '14 at 09:58
  • Really appreciate you taking the time to answer the question with such clarity. Thank you so much – Ronak Jain Oct 30 '14 at 10:08
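
Following up on the ngrams suggestion in the comments, here is a minimal sketch of the idea in plain Python, using edge n-grams (prefixes) as one possible variant. `edge_ngrams` and its `min_gram` / `max_gram` parameters are hypothetical names chosen for illustration, and the values are assumptions, not recommended settings.

```python
def edge_ngrams(term, min_gram=1, max_gram=20):
    # Emit every prefix of the term whose length is between min_gram and
    # max_gram, capped at the length of the term itself.
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


# At index time every prefix is stored as a term of its own.
indexed_terms = edge_ngrams("milano", min_gram=3)
print(indexed_terms)             # ['mil', 'mila', 'milan', 'milano']

# At search time the query is only lowercased, not n-grammed (assumption),
# so a partial word now equals one of the stored terms.
print("mil" in indexed_terms)    # True
print("milan" in indexed_terms)  # True
```

The trade-off is a larger index, since each word is stored once per prefix.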