
I want to create a search feature for a social networking application, using Elasticsearch, so that users can find other users by username or name, even when they enter only part of the username or name.

For example:

input: okma
result: {"username": "alokmahor", "name": "Alok Singh Mahor"} // partial match in username

input: m90
result: {"username": "ram9012", "name": "Ram Singh"} // partial match in username

input: shn
result: {"username": "r2020", "name": "Krishna Kumar"} // partial match with name  

After reading and experimenting with the links below, I came up with a partial solution, but I am not sure it is the correct approach.

I followed
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
How to search for a part of a word with ElasticSearch

My solution is

DELETE my_index

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username":   { "type": "text", "analyzer": "my_analyzer"  },
      "name":   { "type": "text", "analyzer": "my_analyzer"  } 
    }
  }
}


PUT /my_index/_doc/1
{
  "username": "alokmahor",
  "name": "Alok Singh Mahor"
}

PUT /my_index/_doc/2
{
  "username": "ram9012",
  "name": "Ram Singh"
}

PUT /my_index/_doc/3
{
  "username": "r2020",
  "name": "Krishna Kumar"
}

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "shn",
      "analyzer": "my_analyzer",
      "fields": ["username", "name"]
    }
  }
}
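To see why a query such as "shn" finds "Krishna Kumar", it can help to emulate what the 3-gram tokenizer produces. The following is a minimal plain-Python sketch, not Elasticsearch code: `ngram_tokens` and `search` are hypothetical helpers that only approximate an `ngram` tokenizer with `min_gram` = `max_gram` = 3 and `token_chars` of `letter` and `digit`, and a match is assumed to mean "at least one query token equals an indexed token".

```python
import re

def ngram_tokens(text, size=3):
    """Approximate an ngram tokenizer limited to letters and digits."""
    tokens = []
    # Characters other than letters/digits split the text into words.
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word) - size + 1):
            tokens.append(word[start:start + size])
    return tokens

# The three documents indexed in the question.
docs = {
    1: {"username": "alokmahor", "name": "Alok Singh Mahor"},
    2: {"username": "ram9012", "name": "Ram Singh"},
    3: {"username": "r2020", "name": "Krishna Kumar"},
}

def search(query):
    """Return ids of documents sharing at least one 3-gram with the query."""
    query_tokens = set(ngram_tokens(query))
    hits = []
    for doc_id, doc in docs.items():
        field_tokens = set()
        for value in doc.values():
            field_tokens.update(ngram_tokens(value))
        if query_tokens & field_tokens:
            hits.append(doc_id)
    return hits

print(ngram_tokens("Krishna Kumar"))
# ['Kri', 'ris', 'ish', 'shn', 'hna', 'Kum', 'uma', 'mar']
print(search("shn"))   # [3]
print(search("okma"))  # [1]
print(search("m90"))   # [2]
print(search("sin"))   # []  (tokens are case-sensitive, so only "Sin" exists)
```

Because no step lowercases the tokens, "Singh" is stored as "Sin", "ing", "ngh", which is why a lowercase query token cannot match it in this sketch.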

This solution is only partially working, and I am not sure it is really the correct approach, since I arrived at it by playing around with Elasticsearch features and copy-pasting example code. So please suggest the correct way or an improvement on this.

What is not working:

// "sin" does not match "Singh", but "Sin" matches and works.
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "sin",
      "analyzer": "my_analyzer",
      "fields": ["username", "name"]
    }
  }
}
Alok

Just curious: have you improved the index creation? Username searching works with this setup; I am wondering if you improved it further. – J. Doe Dec 28 '20 at 17:19

1 Answer

"So please suggest correct way"

How correct the solution is depends on your requirements. You can keep refining it by checking each possible use case one by one.

"improvement on this"

The problem you mention, where "Sin" matches but "sin" does not, occurs because the analyzer you defined does not make the search case-insensitive. To fix this, add the lowercase filter to your analyzer definition as below:

  "analyzer": {
    "my_analyzer": {
      "tokenizer": "my_tokenizer",
      "filter": [
        "lowercase"
      ]
    }
  }
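The effect of the lowercase filter can be illustrated with a small plain-Python sketch. This is only an approximation, not Elasticsearch itself: `ngram_tokens` and `matches` are hypothetical helpers, and the sketch assumes the same analyzer is applied both to the indexed field and to the query, as it is in the question.

```python
import re

def ngram_tokens(text, size=3, lowercase=False):
    """Approximate 3-gram tokenization, optionally followed by lowercasing."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word) - size + 1):
            gram = word[start:start + size]
            # The lowercase filter runs on each token after tokenization.
            tokens.append(gram.lower() if lowercase else gram)
    return tokens

def matches(query, field_value, lowercase):
    """True if the query and the field share at least one token."""
    query_tokens = set(ngram_tokens(query, lowercase=lowercase))
    field_tokens = set(ngram_tokens(field_value, lowercase=lowercase))
    return bool(query_tokens & field_tokens)

print(matches("sin", "Ram Singh", lowercase=False))  # False
print(matches("sin", "Ram Singh", lowercase=True))   # True
print(matches("Sin", "Ram Singh", lowercase=True))   # True
```

Since the tokens stored in the index change once the filter is added, the index needs to be recreated and the documents indexed again (as the DELETE and PUT requests in the question already do) before the new behavior shows up.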

This answer can help you understand more about case-insensitive search.

Nishant