Assume that we have this index in OpenSearch:

 {
    "settings": {
        "index.knn": True,
        "number_of_replicas": 0,
        "number_of_shards": 1,
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "tag": {"type": "text"},
            "e1": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
            "e2": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
            "e3": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
        }
    },
}
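
Since the three vector fields share the same definition, the mapping can also be generated rather than written out by hand. This is a minimal sketch in plain Python, assuming the settings shown above; `knn_field` and `build_index_body` are hypothetical helper names, not part of any OpenSearch client:

```python
import json


def knn_field(dimension=512, ef_construction=512, m=24):
    """Return one knn_vector field definition using the HNSW settings above."""
    return {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {
            "name": "hnsw",
            "space_type": "cosinesimil",
            "engine": "nmslib",
            "parameters": {"ef_construction": ef_construction, "m": m},
        },
    }


def build_index_body(vector_fields=("e1", "e2", "e3")):
    """Build the index body: two text fields plus one knn_vector per name."""
    properties = {"title": {"type": "text"}, "tag": {"type": "text"}}
    for name in vector_fields:
        # Each field gets its own dict so later edits do not leak between fields.
        properties[name] = knn_field()
    return {
        "settings": {
            "index.knn": True,
            "number_of_replicas": 0,
            "number_of_shards": 1,
        },
        "mappings": {"properties": properties},
    }


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

The resulting dict has the same content as the index definition above and could be passed as the body of an index-creation request.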

And we want to perform a search over all the fields (approximate knn for the vector fields). What would be the correct way to do this in OpenSearch?

I have this query that works but I'm not sure if it is the correct way of doing this and if it uses approximate knn:

{
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e1": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e2": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e3": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "match": {"title": "title"}
                        },
                        "weight": 0.1,
                    }
                },
                {
                    "function_score": {
                        "query": {"match": {"tag": "tag"}},
                        "weight": 0.1,
                    }
                },
            ]
        }
    },
    "_source": False,
}

In other words, I want to know how this, which is an Elasticsearch feature, can be done in OpenSearch.

Edit 1: I want to use this new Elasticsearch feature in OpenSearch. The question is how to do that, and also what exactly the query above does.

Alireza Fa

1 Answer

Searching multiple kNN fields in Elasticsearch is not yet supported. Here you can find the related development, issue #91187 and PR #92118, which was merged for version 8.7 but has not been released yet; the current version is 8.6.

In OpenSearch's k-NN documentation, no reference to this can currently be found. However, judging by this GitHub issue, it seems that searching multiple vector fields in a single search request is possible using either a boolean query or a dis_max query.

The solution proposed in that comment is a neural query using script_score, but it should also work with the knn query. For example:

{
    "query": {
        "bool": {
            "should": [{
                    "knn": {
                        "vector_field_1": {
                            "vector": [
                                -0.009013666,
                                -0.07266349,
                                ......,
                                -0.1163235
                            ],
                            "k": 100
                        }
                    }
                },
                {
                    "knn": {
                        "vector_field_2": {
                            "vector": [
                                -0.003729963,
                                0.14770366,
                                ......,
                                0.032361716
                            ],
                            "k": 100
                        }
                    }
                },
                {
                    "match": {
                        "general_text": "apple"
                    }
                }
            ]
        }
    }
}

Keep in mind that vector is the query vector (i.e. the query text encoded as a vector), which must have the same number of dimensions as the vector field you are searching against (512 in your example).
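
To make the dimension requirement concrete, here is a minimal sketch in plain Python that builds the same kind of bool/should request and rejects vectors of the wrong length before anything is sent. The helper names `knn_clause` and `multi_field_query` are hypothetical, and the field names and the value 512 are taken from your mapping:

```python
def knn_clause(field, vector, k, expected_dim=512):
    """Build one knn clause, checking the query vector length first."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"{field}: query vector has {len(vector)} dimensions, "
            f"expected {expected_dim}"
        )
    return {"knn": {field: {"vector": list(vector), "k": k}}}


def multi_field_query(vectors, text_matches, k=10, size=10, expected_dim=512):
    """Combine knn clauses and match clauses under a single bool/should."""
    should = [
        knn_clause(field, vector, k, expected_dim)
        for field, vector in vectors.items()
    ]
    should += [{"match": {field: text}} for field, text in text_matches.items()]
    return {"size": size, "query": {"bool": {"should": should}}}


# Placeholder query vector; in practice this comes from your encoder.
query_vector = [0.0] * 512
body = multi_field_query(
    vectors={"e1": query_vector, "e2": query_vector, "e3": query_vector},
    text_matches={"title": "title", "tag": "tag"},
)
```

With this check, a 4-element vector such as the `[0, 1, 2, 3]` placeholder in the question would raise a `ValueError` locally instead of being sent to the cluster.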

If your intent is to recalculate the relevance score of the returned documents using a function that you define, function_score or script_score can be used.

function_score still seems to be supported in OpenSearch, although the documentation is not exhaustive. Whether to use it depends on what you are looking for, but defining a weight of 1 certainly means that you are not affecting the score. You can set the explain parameter to true to see the explained output and understand how the score is combined:

GET /_search?explain=true
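
As a rough illustration of how the scores are combined, here is a simplified model in plain Python. It assumes that a bool/should query sums the scores of the clauses that matched, and that a function_score containing only a weight multiplies the wrapped query's score by that weight. The numbers are made up, the function names are hypothetical, and this says nothing about how the k-NN plugin computes each clause's score:

```python
import copy


def function_score(query_score, weight):
    """function_score with only a weight: wrapped query score times weight."""
    return query_score * weight


def bool_should(clause_scores):
    """bool/should: sum the scores of matching clauses (None means no match)."""
    return sum(score for score in clause_scores if score is not None)


def with_explain(body):
    """Return a copy of a search body with "explain" enabled in the body itself."""
    explained = copy.deepcopy(body)
    explained["explain"] = True
    return explained


# Made-up per-clause scores for a single document (illustrative only).
knn_scores = [0.9, 0.8, None]  # e1 and e2 returned the document, e3 did not
title_score = 2.0              # text score of the title match; tag did not match

weighted = [
    function_score(score, 1) if score is not None else None
    for score in knn_scores
]
weighted += [function_score(title_score, 0.1), None]

total = bool_should(weighted)
print(total)
```

In this model the weight of 1 leaves each kNN score unchanged, while the weight of 0.1 scales the text score down before everything is summed. That matches what the explain output should let you verify on real documents.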

Finally, if you are interested in vector search with OpenSearch, we recently wrote a blog post in which we provide a detailed description of the new neural search plugin introduced with version 2.4.0 through an end-to-end testing experience.

Seasers
  • Thanks for the response. I edited the question to show a valid query based on your first two points. The 8.7 feature you mentioned is exactly what I want to do, but in OpenSearch. I guess the feature is not there yet or not documented yet. But then what does the query in my question do, if not a multiple kNN field search? You mentioned that the use of `function_score` is incorrect. What would be the correct use? I can't find any documentation on how to use it in OpenSearch. Can `function_score` be combined with approximate k-NN search? How can kNN be combined with other features? – Alireza Fa Feb 15 '23 at 13:46
  • The answer has been updated, I hope it can be helpful. – Seasers Feb 21 '23 at 16:03
  • Thanks for the updated answer. Unfortunately, there is not much documentation about `script_score` or `function_score`. In my example query, I think `function_score` and `script_score` have the same behavior and I think both use approximate kNN, not exact kNN (like in this [link](https://opensearch.org/docs/latest/search-plugins/knn/knn-score-script/)). I tried using the explain option but it seems that it doesn't support the kNN plugin and it gives a value of 1 to the explanations for a kNN search. – Alireza Fa Feb 27 '23 at 14:23