2

I have documents in the elasticsearch and can't understand how to apply search script that should return documents if any attachment doesn't contain uuid or uuid is null. Version of elastic 5.2. Mapping of documents

"mappings": {
    "documentType": {
        "properties": {
            "attachment": {
                "properties": {
                    "uuid": {
                        "type": "text"
                    },
                    "path": {
                        "type": "text"
                    },
                    "size": {
                        "type": "long"
                    }
                }
            }}}

In the elasticsearch it looks like

{
        "_index": "documents",
        "_type": "documentType",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "21321321",
                "path": "../uploads/somepath",
                "size":1231
               },
               {
                "path": "../uploads/somepath",
                "size":1231
               },      
         ]},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "2",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "223645641321321",
                "path": "../uploads/somepath",
                "size":1231
               },
               {
                "uuid": "22341424321321",
                "path": "../uploads/somepath",
                "size":1231
               },        
         ]},
{
        "_index": "documents",
        "_type": "documentType",
        "_id": "3",
        "_score": 1.0,
        "_source": {
          "attachment": [
               {
                "uuid": "22789789341321321",
                "path": "../uploads/somepath",
                "size":1231
               }, 
               {
                "path": "../uploads/somepath",
                "size":1231
               },      
         ]}

As result I want to get attachments with _id 1 and 3. But as result I get error of the script I tried to apply next script:

{
    "query": {
        "bool": {
            "must": [
                {
                    "exists": {
                        "field": "attachment"
                    }
                },
                {
                    "script": {
                        "script": {
                            "inline": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                            "lang": "painless"
                        }
                    }
                }
            ]
        }
    }
}

Error is next:

 "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:77)",
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:36)",
                    "for (item in doc['attachment'].value) { ",
                    "                 ^---- HERE"
                ],
                "script": "for (item in doc['attachment'].value) { if (item['uuid'] == null) { return true}}",
                "lang": "painless"
            }
        ],

Is it possible to select documents in case even one attachment object doesn't contain uuid ?

Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68

2 Answers2

2

Iterating arrays of objects is not as trivial as one would expect. I've written extensively about it here and here.

Since your attachments are not defined as nested, ES will internally represent them as flattened lists of values (also called "doc values"). For instance attachment.uuid in doc#2 will become ["223645641321321", "22341424321321"], and attachments.size will turn into [1231, 1231].

This means that you can simply compare the .length of these flattened representations! I assume attachment.size will always be present and can be thus taken as the comparison baseline.

One more thing. To take advantage of these optimized doc values for textual fields, it'll require one small mapping change:

PUT documents/documentType/_mappings
{
  "properties": {
    "attachment": {
      "properties": {
        "uuid": {
          "type": "text",
          "fielddata": true     <---
        },
        "path": {
          "type": "text"
        },
        "size": {
          "type": "long"
        }
      }
    }
  }
}

When that's done and you've reindexed your docs — which can be done with this little Update by query trick:

POST documents/_update_by_query

You can then use the following script query:

POST documents/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "attachment"
          }
        },
        {
          "script": {
            "script": {
              "inline": "def size_field_length = doc['attachment.size'].length; def uuid_field_length =  doc['attachment.uuid'].length; return uuid_field_length < size_field_length",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}
Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68
  • thank you for your response. I will check it and looks for future I will make reindex for attachment that change type on nested. – Stepan Poperechnyi Apr 14 '21 at 09:43
  • You're welcome. Making the attachments nested is a good approach. Just keep in mind that scripting with `nested` fields is a chapter of its own. The links at the top of my answer should point you to the right direction, though! – Joe - GMapsBook.com Apr 14 '21 at 10:11
  • 1
    It works. I have reindexed and run your script and it works. Thank you a lot. – Stepan Poperechnyi Apr 14 '21 at 14:09
1

Just to supplement this answer. If mapping for uuid field was created automatically elastic search adds it in this way:

"uuid": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
}

then script could look like:

POST documents/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "exists": {
                        "field": "attachment"
                    }
                },
                {
                    "script": {
                        "script": {
                            "inline": "doc['attachment.size'].length > doc['attachment.uuid.keyword'].length",
                            "lang": "painless"
                        }
                    }
                }
            ]
        }
    }
}