I have a file of alternate spellings for the terms in my index, and I want to produce bigrams that include those alternate spellings. For example, my alternate spellings CSV file contains biriyani, biryani, briyani, and my field contains the text Chicken Biryani. I want to be able to produce the tokens chicken biryani, chicken biriyani, chicken briyani.

If I use a whitespace tokenizer with a synonym filter, the tokens chicken, biriyani, biryani, briyani are generated, which is expected. If I then apply a shingle filter, the tokens generated are chicken, chicken biryani, biryani, biryani biriyani, biriyani, biriyani briyani, briyani. This token stream contains shingles built from synonyms of the same word, which should not be there, and it does not contain chicken followed by the alternate spellings of biryani, such as chicken biriyani or chicken briyani.

If I place the shingle filter before the synonym filter, synonyms are only added for the unigram: chicken, chicken biryani, biriyani, biryani, briyani.

Is there a way to generate shingles that treat the synonyms as occupying the same position as the original token, so that in this case I get chicken biryani, chicken biriyani, chicken briyani?
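To make the desired output concrete, here is a minimal Python sketch of the behavior I am after, written outside of Elasticsearch. The names SYNONYM_GROUPS, variants and synonym_bigrams are made up for illustration, and this is only a description of the target token list, not a claim about how the synonym or shingle filters work internally.

```python
from itertools import product

# Hypothetical synonym groups, mirroring one line of the alternate-spellings file.
SYNONYM_GROUPS = [["biriyani", "biryani", "briyani"]]


def variants(token):
    """Return the token followed by any other spellings from its synonym group."""
    for group in SYNONYM_GROUPS:
        if token in group:
            return [token] + [word for word in group if word != token]
    return [token]


def synonym_bigrams(text):
    """Build bigrams from adjacent positions, expanding each position to all its spellings."""
    # Lowercase and split on whitespace; each position holds the token plus its synonyms.
    positions = [variants(token) for token in text.lower().split()]
    bigrams = []
    # Pair each position only with the next one, never with itself,
    # so synonyms of the same word are not shingled together.
    for left, right in zip(positions, positions[1:]):
        bigrams.extend(f"{a} {b}" for a, b in product(left, right))
    return bigrams


print(synonym_bigrams("Chicken Biryani"))
# ['chicken biryani', 'chicken biriyani', 'chicken briyani']
```

The key point is that the alternate spellings are treated as alternatives at one position, so they combine with the neighbouring token but never with each other.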

Sample settings for testing:

PUT test_bigram
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "biriyani, biryani, briyani"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "synonym"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          },
          "shingle_synonym": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "shingle",
              "synonym"
            ]
          },
          "synonym_shingle": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "synonym",
              "shingle"
            ]
          }
        }
      }
    }
  }
}

I am running Elasticsearch 5.6.
