
I'm using Painless to filter documents with Elasticsearch 5.5.

Problem

Using Painless, find the documents that have a `strings` field.

Expected Results

Only documents that have a `strings` field are returned.

Actual Results

All documents are returned.

Observation

All documents are returned as long as at least one document in the index has a `strings` field. This could be a caching issue of some sort.

TestCase

Fixtures

PUT /test_idx

POST /test_idx/t/1
{
      "strings": ["hello", "world"]
}

POST /test_idx/t/2
{
      "numbers": [1, 2, 3]
}

Query

GET /test_idx/_search
{
   "query": {
      "bool": {
         "filter": [
            {
               "script": {
                  "script": {
                     "lang": "painless",
                     "inline": "return doc.containsKey(params.keypath)",
                     "params": {"keypath": "strings"}
                  }
               }
            }
         ]
      }
   }
}

Actual Response

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": [
         {
            "_index": "test_idx",
            "_type": "t",
            "_id": "2",
            "_score": 0,
            "_source": {
               "numbers": [
                  1,
                  2,
                  3
               ]
            }
         },
         {
            "_index": "test_idx",
            "_type": "t",
            "_id": "1",
            "_score": 0,
            "_source": {
               "strings": [
                  "hello",
                  "world"
               ]
            }
         }
      ]
   }
}

Expected Response

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0,
      "hits": [
         {
            "_index": "test_idx",
            "_type": "t",
            "_id": "1",
            "_score": 0,
            "_source": {
               "strings": [
                  "hello",
                  "world"
               ]
            }
         }
      ]
   }
}
Justin Wrobel
Maryan

2 Answers


You might want to try the query below, although overusing Painless is strongly discouraged for performance reasons:

GET /test_idx/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "inline": "return doc[params.keypath].value != null",
              "params": {
                "keypath": "strings.keyword"
              }
            }
          }
        }
      ]
    }
  }
}
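A possible reason the original `containsKey` check matched every document, offered as an assumption rather than something confirmed in this thread: `doc.containsKey(...)` appears to test whether the field exists in the index mapping, not whether the current document holds a value for it. Once any document adds `strings` to the mapping, the check would be true everywhere. A hedged variant of the query above, sent as the body of `GET /test_idx/_search`, counts the doc values of the current document instead. It assumes the default `strings.keyword` sub-field created by dynamic mapping:

```json
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "inline": "return doc[params.keypath].size() > 0",
              "params": { "keypath": "strings.keyword" }
            }
          }
        }
      ]
    }
  }
}
```

With the two fixture documents, only document 1 should have any values for `strings.keyword`.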
Val
  • Thanks! This works, yet `null` could be a valid field value, which may lead to incorrect results. Yes, we're watching the performance; so far it's acceptable. – Maryan Nov 24 '18 at 05:16
  • When a field has a null value, it is not indexed at all. – Val Nov 24 '18 at 05:17
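To illustrate the point about null values, a third fixture could be indexed with an explicit `null`. This is a sketch only: the document id `3` is hypothetical, and the body below would be sent to `POST /test_idx/t/3`:

```json
{
  "strings": null
}
```

Because nothing is indexed for a null value, an `exists` query on `strings` should not match this document, and it should have no doc values for `strings.keyword` either (assuming no `null_value` is configured in the mapping).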

Why do you need Painless for this? It can be done easily with an `exists` query:

{
  "query": {
    "exists": {
      "field": "strings"
    }
  }
}
Nishant
  • We're using Painless to do more complex checks, e.g. that the field `strings` exactly equals ["hello", "world"]. Yes, `exists` works, but I'm still wondering why `containsKey` isn't working... – Maryan Nov 24 '18 at 03:35
  • If you have something complex, I would suggest changing the way you store the data. For example, instead of storing values as an `array`, consider a `nested` object. Try to avoid scripts, as relying on them more and more will degrade performance. – Nishant Nov 24 '18 at 04:52
  • We're benchmarking and will see whether it's acceptable. Also, I don't see why Painless can't be compiled into the same underlying AST that the JSON pseudo-AST gets compiled to. – Maryan Nov 24 '18 at 05:20
  • @Maryan the problem with scripting is that the script (painless = java) has to be executed against each document and can't leverage all the benefits that the inverted index provides. – Val Nov 24 '18 at 07:25
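The two suggestions in this thread can be combined: an `exists` clause handles the presence check through the index, and a script clause performs the more complex comparison. Below is a minimal sketch, sent as the body of `GET /test_idx/_search`. The `expected` parameter is a hypothetical name introduced for illustration. The comparison also assumes `strings.keyword` doc values, which come back sorted and de-duplicated rather than in source order, so `expected` must be given in sorted order:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "exists": { "field": "strings" } },
        {
          "script": {
            "script": {
              "lang": "painless",
              "inline": "def v = doc[params.keypath]; if (v.size() != params.expected.size()) { return false; } for (int i = 0; i < v.size(); i++) { if (v[i] != params.expected[i]) { return false; } } return true;",
              "params": {
                "keypath": "strings.keyword",
                "expected": ["hello", "world"]
              }
            }
          }
        }
      ]
    }
  }
}
```

Whether this performs acceptably depends on how many documents pass the `exists` clause, so it is worth benchmarking against the real data.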