I have entries in my index like the following:

 ID   BuildingName  Postalcode Type
  1   ABCD             1234     1
  2   ABCD             7890     1

I need to remove duplicates in the 'BuildingName' field at search time (not at index time, since, as you can see, these are two different entries). In the end I only want to see one result, i.e. any one of the buildings with the searched name:

 ID   BuildingName  Postalcode Type
  1   ABCD             1234     1

I cannot use field collapsing/aggregation as described here (Remove duplicate documents from a search in Elasticsearch) because I need BuildingName to be n-gram analyzed, and field collapsing/aggregation works only on non-analyzed fields.

Is there any way to accomplish this? All help appreciated! Thanks!

UD1989

1 Answer

Add a sub-field to the BuildingName field that is either not_analyzed or analyzed with an analyzer such as keyword, which leaves the text essentially unchanged. You search on the main BuildingName field, which is nGram-ed, while the aggregation runs on the sub-field, whose value is not modified:

  • the mapping:
  "mappings": {
    "test": {
      "properties": {
        "BuildingName": {
          "type": "string",
          "analyzer": "my_ngram_analyzer",
          "fields": {
            "notAnalyzed": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
  • the query:
{
  "query": {
    "term": {
      "BuildingName": {
        "value": "ab"
      }
    }
  },
  "aggs": {
    "unique": {
      "terms": {
        "field": "BuildingName.notAnalyzed",
        "size": 10
      },
      "aggs": {
        "sample": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
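
Note that the de-duplicated documents come back under `aggregations` in the response, not in the top-level `hits`, which still lists every matching document (adding `"size": 0` at the top level of the request suppresses those hits). As a rough sketch of reading the result on the client side, assuming the aggregation names `unique` and `sample` from the query above: the `sample_response` dictionary below is hand-written for illustration, not real output, and `unique_buildings` is a hypothetical helper name.

```python
# Hypothetical, hand-written response in the shape Elasticsearch returns for a
# terms aggregation named "unique" with a top_hits sub-aggregation named "sample".
sample_response = {
    "hits": {
        "total": 2,
        "hits": [  # the plain hits still contain both documents
            {"_id": "1", "_source": {"ID": 1, "BuildingName": "ABCD", "Postalcode": 1234, "Type": 1}},
            {"_id": "2", "_source": {"ID": 2, "BuildingName": "ABCD", "Postalcode": 7890, "Type": 1}},
        ],
    },
    "aggregations": {
        "unique": {
            "buckets": [
                {
                    "key": "ABCD",   # one bucket per distinct BuildingName.notAnalyzed value
                    "doc_count": 2,  # number of matching documents sharing that name
                    "sample": {
                        "hits": {
                            "total": 2,
                            "hits": [
                                {"_id": "1", "_source": {"ID": 1, "BuildingName": "ABCD", "Postalcode": 1234, "Type": 1}}
                            ],
                        }
                    },
                }
            ]
        }
    },
}


def unique_buildings(response, agg_name="unique", hits_name="sample"):
    """Return one document (_source) per bucket of the terms aggregation."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    docs = []
    for bucket in buckets:
        top = bucket.get(hits_name, {}).get("hits", {}).get("hits", [])
        if top:  # top_hits with size 1 yields at most one document per bucket
            docs.append(top[0]["_source"])
    return docs


print(unique_buildings(sample_response))
```

Which of the duplicate documents ends up as the sample depends on the sort order of `top_hits` (by default, relevance score), so this only fits the "any of the buildings" requirement from the question.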
Andrei Stefan
  • Thank you for your answer. Doing what you suggested had no effect on the fields returned. Am I also supposed to duplicate the BuildingName value into the "notAnalyzed" sub-field? I assume no data is actually being stored in that sub-field now. – UD1989 Sep 18 '15 at 08:31
  • What do you mean? I don't understand. Can you update the original post with what you tried, what you got back and why that's a problem? – Andrei Stefan Sep 18 '15 at 08:33
  • I added the sub-field to the BuildingName field and specified it as 'not_analyzed' in the mapping. Then I posted the same query as you have written. The search results come back the same as before I did all this, i.e. with duplicate BuildingName values. – UD1989 Sep 18 '15 at 09:16
  • Please, post the updated mapping and the result you get back. – Andrei Stefan Sep 18 '15 at 09:19