elasticsearch synonyms & shingle conflict

Question

Let me jump straight to the code.

PUT /test_1
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": [
            "university of tokyo => university_of_tokyo, u_tokyo",
            "university => college, educational_institute, school"
          ],
          "tokenizer": "whitespace"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "shingle",
            "synonym"
          ]
        }
      }
    }
  }
}

output

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Token filter [shingle] cannot be used to parse synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Token filter [shingle] cannot be used to parse synonyms"
  },
  "status": 400
}

Basically, let's say I have the following index-time synonyms:

"university => university, college, educational_institute, school"
"tokyo => tokyo, japan_capitol"
"university of tokyo => university_of_tokyo, u_tokyo"

If I search for "college", I expect to match "university of tokyo".
But for that phrase the index contains only university_of_tokyo and u_tokyo, so the search fails.

I was expecting that with analyzer {"filter": ["shingle", "synonym"]} the following would happen:

university of tokyo -shingle-> university -synonyms-> college, institute

How do I obtain the desired behaviour?

score 0 · Answer 1 · answered Jul 09 '20 at 16:08

I was getting a similar error, though I was using synonym_graph.

Setting lenient to true in the synonym graph definition got rid of the error. I'm not sure whether there is a downside.

"graph_synonyms": {
  "type": "synonym_graph",
  "lenient": "true",
  "synonyms_path": "synonyms.txt"
}

Ali Asghar Taghizadeh · Answer 2 · 2020-09-30T12:21:08.803

According to this link, tokenizers should produce single tokens before a synonym filter.

To solve your problem, first change your second rule as follows, so that all of its terms are synonyms of each other:

university, college, educational_institute, school

Second, because of the underscores on the right-hand side of the first rule (university_of_tokyo), every occurrence of "university of tokyo" is indexed as university_of_tokyo, which carries no information about its individual tokens. To work around this, I would suggest a char filter with a rule like this:

university of tokyo => university_of_tokyo university of tokyo

and then in your synonyms rule:

university_of_tokyo, u_tokyo

This is also a way to handle the multi-term synonym problem in general.
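
Putting the two suggestions together, the index settings might look roughly like the sketch below. The index name test_2 and the names tokyo_mapping and my_synonyms are made up for illustration, and the sketch has not been checked against a specific Elasticsearch version, so treat it as a starting point only. The second request uses the _analyze API to inspect which tokens the analyzer emits.

```
# Hypothetical index and filter names; only the rules come from the answer above.
PUT /test_2
{
  "settings": {
    "analysis": {
      "char_filter": {
        "tokyo_mapping": {
          "type": "mapping",
          "mappings": [
            "university of tokyo => university_of_tokyo university of tokyo"
          ]
        }
      },
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [
            "university, college, educational_institute, school",
            "university_of_tokyo, u_tokyo"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "char_filter": ["tokyo_mapping"],
          "tokenizer": "whitespace",
          "filter": ["my_synonyms"]
        }
      }
    }
  }
}

# Check the tokens produced for the phrase from the question.
GET /test_2/_analyze
{
  "analyzer": "my_analyzer",
  "text": "university of tokyo"
}
```

If the char filter and the synonym rules behave as described, the _analyze response should list university_of_tokyo and u_tokyo alongside university, college, educational_institute and school, so a search for "college" can match the document. Note that the mapping char filter is case-sensitive and the whitespace tokenizer does not lowercase, so "University of Tokyo" would need its own rule or a lowercase step.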
