3

I have a document with given structure:

{ "name" : "WF1", "myIndex" : [3, 4, 5] }

Lets say I have 4 of such records -

{ "name" : "WF1", "myIndex" : [3, 4, 5] }
{ "name" : "WF2", "myIndex" : [6, 7, 8] }
{ "name" : "WF3", "myIndex" : [9, 10, 11] }
{ "name" : "WF4", "myIndex" : [3, 6, 9] }

If I fire below "term" query:

GET myIndex/_search
{
  "query": {
    "terms": {
      "qualsIndex": [
        3, 6, 9, 20
      ]
    }
  }
}

It returns all 4 records. Whereas I only want to return a record that has 3,6, 9 i.e. only WF4. Basically, I want a result document that has a subset of input passed.

Note: I can tweak my document structure to achieve this. is it possible in OpenSearch?

Paulo
  • 8,690
  • 5
  • 20
  • 34
Mohammad Adnan
  • 6,527
  • 6
  • 29
  • 47
  • Does work number are always sorted ? Maybe you could "hack" a solution by turning the array into a string and do some matching on them considering them as words instead of array of numbers ? I am not aware of an out of the box solution in Elasticsearch nor in Opensearch – Paulo Apr 25 '22 at 19:12
  • ALL of your documents contain a subset of your query, (9 is a subset of 3,6,9,20), so what is it that you really want? Is it that you want all of the elements in the document to be in the query? – ThoughtfulHacking Apr 25 '22 at 21:21
  • @ThoughtfulHacking I meant if all of the element should be part of queried array so in document containing 3,4,5 only 3 is part of queried array and not 4 and 5 hence that shouldn't be returned. – Mohammad Adnan Apr 26 '22 at 20:15
  • @Paulo we can sort it or change them as words. I responded directly to your posted answer. – Mohammad Adnan Apr 26 '22 at 20:16

2 Answers2

2

Tldr;

You can achieve this with terms set query.

To understand

Example mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "myIndex": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}

Example query:

{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": [3, 6, 9, 20],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

In your case, required_matches should be index as number of items of myIndex array.

To reproduce

Here is a fully working example:

First of all the setup:

DELETE /72004393/

PUT /72004393/
{
  "mappings": {
    "properties": {
      "myIndex": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

POST _bulk
{"index":{"_index":"72004393"}}
{"name":"WF1","myIndex":["3","4","5"]}
{"index":{"_index":"72004393"}}
{"name":"WF2","myIndex":["6","7","8"]}
{"index":{"_index":"72004393"}}
{"name":"WF3","myIndex":["9","10","11"]}
{"index":{"_index":"72004393"}}
{"name":"WF4","myIndex":["3","6","9"]}

The query:

GET /72004393/_search
{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": [ "3", "6", "9", "20" ],
        "minimum_should_match_script": {
          "source": "3"
        }
      }
    }
  }
}

Gives:

{
  ...
  "hits" : {
  ...
    "max_score" : 2.859232,
    "hits" : [
      {
        "_index" : "72004393",
        "_id" : "LqOQZ4ABOgujegeQ2gfV",
        "_score" : 2.859232,
        "_source" : {
          "name" : "WF4",
          "myIndex" : ["3","6","9"]
        }
      }
    ]
  }
}

Another example:

PUT test
{
  "mappings": {
    "properties": {
      "myIndex": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
POST _bulk
{"index":{"_index":"test"}}
{"name":"WF1","myIndex":["3","4","5"], "required_matches": 3} 
{"index":{"_index":"test"}}
{"name":"WF2","myIndex":["6","7","8"], "required_matches": 3}
{"index":{"_index":"test"}}
{"name":"WF3","myIndex":["9","10","11"], "required_matches": 3}
{"index":{"_index":"test"}}
{"name":"WF4","myIndex":["3","6","9"], "required_matches": 3}
{"index":{"_index":"test"}}
{"name":"WF5","myIndex":["3","6","9", "15", "20"], "required_matches": 5}
{"index":{"_index":"test"}}
{"name":"WF6","myIndex":["3","6","9", "15"], "required_matches": 4}

Query:

GET test/_search
{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": ["3", "6", "9"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

The query above will match only WF4 because its required_match is 3 and WF5 AND WF6 required matches are 4 and 5.

If you update terms in the query to ["3", "6", "9", "15"] it will match both WF4 AND WF5 and if you update it to ["3", "6", "9", "15", "20"] it will match to WF4, WF5, WF6.

YD9
  • 525
  • 3
  • 4
  • Kudos to you @YD9. Learned something new ! – Paulo Apr 26 '22 at 20:46
  • @YD9 The problem of this approach is a number of items of myIndex may vary. It may be 3, it may 2 or it may be 4. So I can not hard code that it has to have at minimum 3 items. – Mohammad Adnan Apr 26 '22 at 21:33
  • 1
    @MohammadAdnan you should use `minimum_should_match_field ` and its value should be a numeric field's name that exist in your document and mapping. You know how many items `myIndex` array has and you should index it as separate field(`required_matches` for my example) with value myIndex.length(). It can be any value according to length of the array. And when querying, you should not send 3 as hard coded value. You should use `"minimum_should_match_field": "required_matches"` in the query so that number of required matches can change according to the number of `myIndex` array. – YD9 Apr 27 '22 at 05:42
  • I updated the answer for better understanding – YD9 Apr 27 '22 at 06:03
0

TLDR;

To the best of my knowledge there is no solution in both ElasticSearch and OpenSearch. But I think you can hack you way through it, using number as words

The Hack

Indexing the document with the field myIndex as a string of numbers. I can later search for those numbers, using the match query and the parameter such as minimum_should_match.

DELETE 72004393

POST _bulk
{"index":{"_index":"72004393"}}
{"name":"WF1","myIndex":"3 4 5"}
{"index":{"_index":"72004393"}}
{"name":"WF2","myIndex":"6 7 8"}
{"index":{"_index":"72004393"}}
{"name":"WF3","myIndex":"9 10 11"}
{"index":{"_index":"72004393"}}
{"name":"WF4","myIndex":"3 6 9"}


GET /72004393/_search
{
  "query": {
    "match": {
      "myIndex": {
        "query": "3 6 9 20",
        "minimum_should_match": 3
        }
    }
  }
}

Will give you something like that:

{
  ...
  "hits" : {
    ...
    "max_score" : 2.0794413,
    "hits" : [
      {
        "_index" : "72004393",
        "_id" : "xaMuYoABOgujegeQJgZr",
        "_score" : 2.0794413,
        "_source" : {
          "name" : "WF4",
          "myIndex" : "3 6 9"
        }
      }
    ]
  }
}

This is not perfect and may lead to some edge cases, but this is the closest I could get to a "solution".

Paulo
  • 8,690
  • 5
  • 20
  • 34
  • The problem is, in the query, I could be having a very large set, and I am looking for all document that contains the set of it. So if there is a document that has only 3 in the array then that should also be returned as a response. – Mohammad Adnan Apr 26 '22 at 20:14
  • @MohammadAdnan, I believe Terms_set is the way to go, see the other answer. I'll delete mine. – Paulo Apr 26 '22 at 20:36