This search for duplicate documents on a single field works. The index is test_4, the type is test_4, and the field is date.

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "date.keyword",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This search for duplicate documents on multiple fields does not work. The index is test_4, the type is test_4, and the fields are date and EventType. This attempt concatenates the values with + and escapes the single quotes as '"'"'.

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['"'"'date'"'"'].values + doc['"'"'EventType'"'"'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error.

curl: (52) Empty reply from server

This search for duplicate documents on multiple fields does not work either. The index is test_4, the type is test_4, and the fields are date and EventType. This attempt builds a list with addAll and leaves the single quotes unescaped.

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error.

curl: (52) Empty reply from server

This search for duplicate documents on multiple fields does not work either. The index is test_4, the type is test_4, and the fields are date and EventType. This attempt uses the same addAll script with the single quotes escaped as '"'"'.

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "def l = []; l.addAll(doc['"'"'date'"'"']); l.addAll(doc['"'"'EventType'"'"'].values); l",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error.

curl: (52) Empty reply from server

This search for duplicate documents on multiple fields does not work either. The index is test_4, the type is test_4, and the fields are date and EventType. This attempt concatenates the values with + and leaves the single quotes unescaped.

curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'

This is the error. The error reason is "Variable [date] is not defined".

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "compile error",
        "script_stack" : [
          "doc[date].values + doc[EventT ...",
          "    ^---- HERE"
        ],
        "script" : "doc[date].values + doc[EventType].values",
        "lang" : "painless"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test_4",
        "node" : "dhB-H0_yRROhoP6W-FhOyA",
        "reason" : {
          "type" : "script_exception",
          "reason" : "compile error",
          "script_stack" : [
            "doc[date].values + doc[EventT ...",
            "    ^---- HERE"
          ],
          "script" : "doc[date].values + doc[EventType].values",
          "lang" : "painless",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Variable [date] is not defined."
          }
        }
      }
    ]
  },
  "status" : 500
}

This is one example document.

{
  "_index" : "test_4",
  "_type" : "test_4",
  "_id" : "IMQcWGEBOC31Kjf9gyWS",
  "_score" : 18.249443,
  "_source" : {
    "date" : "18-02-02",
    "path" : "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/nifi-app_2018-02-02_11.0.log",
    "@timestamp" : "2018-02-02T20:01:59.159Z",
    "EventType" : "ERROR",
    "EventText" : "[Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "@version" : "1",
    "host" : "hostname",
    "time" : "11:31:36,978",
    "message" : "2018-02-02 11:31:36,978 ERROR [Timer-Driven Process Thread-7] o.a.n.p.a.storage.PutAzureBlobStorage PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288] failed to process due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from PutAzureBlobStorage[id=117f16f0-113c-1fcd-6a48-d9d99d3cd288]: java.io.IOException; rolling back session: {}",
    "type" : "test_4"
  }
},
databeata

1 Answer

The problem is that your shell, most likely bash, strips the single quotes (') from your query before curl runs, so Elasticsearch never receives them:

      "script_stack" : [
        "doc[date].values + doc[EventT ...",
        "    ^---- HERE"
      ],
      "script" : "doc[date].values + doc[EventType].values",

If you echo your command, you can see that the ' characters have been removed:

$ echo curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  ...
      "script": "doc['date'].values + doc['EventType'].values",
  ...
}'
curl -XGET http://ip:9200/test_4/test_4/_search?pretty=true -H Content-Type: application/json -d{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc[date].values + doc[EventType].values",
        "min_doc_count": 2
      },
      ...

Escape each ' inside the single-quoted body with the sequence '"'"'. It closes the current single-quoted string, opens a double-quoted string that contains just ', closes that, and then reopens a single-quoted string; bash joins the adjacent pieces into one argument. Altogether it looks like this:

$ echo curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d'{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc['"'"'date'"'"'].values + doc['"'"'EventType'"'"'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}'
curl -XGET http://ip:9200/test_4/test_4/_search?pretty=true -H Content-Type: application/json -d{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
      "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}

You can get more information about bash quoting in this SO question, for example.
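
One way to sidestep the escaping altogether, sketched here on the assumption that you are running bash or another POSIX shell: put the JSON in a quoted heredoc, where the shell leaves the text untouched, and hand it to curl through a variable. The variable names escaped and body are only illustrative, and the curl call is left as a comment because it needs a reachable cluster. The script is reused purely to show the quoting, not as a working aggregation.

```shell
# The escape sequence closes the single-quoted string, adds a
# double-quoted single quote, and reopens the single-quoted string.
escaped='doc['"'"'date'"'"'].values'
printf '%s\n' "$escaped"

# Inside a heredoc with a quoted delimiter the shell performs no quote
# removal or expansion, so the single quotes survive as written.
body=$(cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "script": "doc['date'].values + doc['EventType'].values",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
EOF
)
printf '%s\n' "$body"

# Send it (not run here, needs a live cluster):
# curl -XGET 'http://ip:9200/test_4/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d "$body"
```

Passing "$body" in double quotes keeps the JSON as a single argument, so no '"'"' sequences are needed anywhere in the request.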

As I understand it, you want to find duplicates with a terms aggregation over values from two fields. The script you are trying to use won't work because fielddata arrays are immutable. You might use this script instead:

"terms": {
  "script": "def l = []; l.addAll(doc['date']); l.addAll(doc['EventType'].values); l",
  "min_doc_count": 2
},

You may also consider using copy_to to copy the values from several fields into one and then running a regular terms aggregation on that single field (this should perform faster than a scripted aggregation).
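
A rough sketch of the copy_to idea, under several assumptions: the index name test_5 and the target field date_event are hypothetical, the keyword field types are guessed rather than taken from your mapping, and the typed mapping layout assumes a 6.x-style cluster like the one in the question. copy_to only takes effect at index time, so existing documents would have to be reindexed. The request bodies are built with quoted heredocs and only printed; the curl calls are left as comments.

```shell
# Hypothetical mapping: values of "date" and "EventType" are also copied
# into "date_event". Index name, field name and types are assumptions.
mapping=$(cat <<'EOF'
{
  "mappings": {
    "test_4": {
      "properties": {
        "date":       { "type": "keyword", "copy_to": "date_event" },
        "EventType":  { "type": "keyword", "copy_to": "date_event" },
        "date_event": { "type": "keyword" }
      }
    }
  }
}
EOF
)

# Regular terms aggregation on the combined field, no script involved.
query=$(cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "duplicateCount": {
      "terms": {
        "field": "date_event",
        "min_doc_count": 2
      },
      "aggs": {
        "duplicateDocuments": {
          "top_hits": {}
        }
      }
    }
  }
}
EOF
)

printf '%s\n' "$mapping" "$query"

# To try it against a real cluster (not run here):
# curl -XPUT 'http://ip:9200/test_5' -H 'Content-Type: application/json' -d "$mapping"
# curl -XGET 'http://ip:9200/test_5/test_4/_search?pretty=true' -H 'Content-Type: application/json' -d "$query"
```

Keep in mind that copy_to does not concatenate: date_event ends up multi-valued, with one term per copied value.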

Hope that helps!

Nikolay Vasiliev