
I am trying to do semantic search with Elasticsearch using tensorflow_hub, but I get RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error'). From search_phase_execution_exception I suppose the problem is corrupted data (based on this stack question). My document structure looks like this:

{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "dynamic": "true",
    "_source": {
      "enabled": "true"
    },
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "abstract": {
        "type": "text"
      },
      "abs_emb": {
        "type": "dense_vector",
        "dims": 512
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}

And I create the index using es.indices.create, then index the documents:

es.indices.create(index=index, body=my_document_structure)  # the mapping shown above
res = es.indices.delete(index=index, ignore=[404])
for i in range(100):
  doc = {
    'timestamp': datetime.datetime.utcnow(),
    'id': id[i],
    'title': title[0][i],
    'abstract': abstract[0][i],
    'abs_emb': tf_hub_KerasLayer([abstract[0][i]])[0]
  }
  res = es.index(index=index, body=doc)

For my semantic search I use this code:

query = "graphene"
query_vector = list(embed([query])[0])

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['abs_emb']) + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

response = es.search(
    index=index,
    body={
        "size": 5,
        "query": script_query,
        "_source": {"includes": ["title", "abstract"]}
    }
)

I know there are some similar questions on Stack Overflow and the Elasticsearch forum, but I couldn't find a solution that works for me. My guess is that the document structure is wrong, but I can't figure out what exactly. I used the search query code from this repo. The full error message is too long and doesn't seem to contain much information, so I share only the last part of it:

~/untitled/elastic/venv/lib/python3.9/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
320             logger.warning("Undecodable raw error response from server: %s", err)
321 
--> 322         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
323             status_code, error_message, additional_info
324         )

RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

And here is the error from the Elasticsearch server:

[2021-04-29T12:43:07,797][WARN ][o.e.c.r.a.DiskThresholdMonitor] [asmac.local]
high disk watermark [90%] exceeded on
[w7lUacguTZWH9xc_lyd0kg][asmac.local][/Users/username/elasticsearch-7.12.0/data/nodes/0]
free: 17.2gb[7.4%], shards will be relocated away from this node; currently
relocating away shards totalling [0] bytes; the node is expected to continue
to exceed the high disk watermark when these relocations are complete
Armen Sanoyan

3 Answers


I think you're hitting the following issue and you should update your query to this:

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, 'abs_emb') + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

Also make sure that query_vector contains floats and not doubles.
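
If the embedding comes from TensorFlow Hub, it typically arrives as a float32 tensor. A minimal sketch of forcing plain Python floats before sending the query (assuming embed is the same TF Hub layer used in the question):

import numpy as np

query = "graphene"
# Convert the tensor to a NumPy array, then each element to a plain
# Python float so the JSON body contains numbers, not numpy scalars.
query_vector = [float(x) for x in np.asarray(embed([query])[0])]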

Val
  • I have checked the type and it was numpy.float32, so that didn't seem to be the case. I also updated the code from doc['abs_emb'] to 'abs_emb', but I still get the same error. – Armen Sanoyan Apr 29 '21 at 09:08
  • Ok, I'm pretty sure you should find the error in the ES logs somewhere... Do you have multiple nodes? If not, can you maybe [increase the log level](https://elasticsearch-py.readthedocs.io/en/v7.12.0/index.html?highlight=logging#logging) of your Python client to dump the error in your client code logs (see the sketch after these comments)? – Val Apr 29 '21 at 09:10
  • I will check it. I hope this Google Colab can help to find the problem: https://colab.research.google.com/drive/1eRvDeO73I_Xiap2X2HZOqgkgUGMwzs4m?usp=sharing – Armen Sanoyan Apr 29 '21 at 09:12
  • 1
    no I have just one node and I increased the log level to info still no results. – Armen Sanoyan Apr 29 '21 at 10:05
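
For reference, a minimal sketch of the client-side logging suggested above (elasticsearch-py logs requests on the elasticsearch logger and full request/response bodies on elasticsearch.trace):

import logging

logging.basicConfig(level=logging.DEBUG)
# Per-request status lines from the client.
logging.getLogger("elasticsearch").setLevel(logging.DEBUG)
# Full request and response bodies, useful to see the real 400 payload.
logging.getLogger("elasticsearch.trace").setLevel(logging.DEBUG)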

In my case the error was "Caused by: java.lang.ClassCastException: class org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues".

My mistake was that I removed the ES index, the one that had the "type": "dense_vector" field, before starting to ingest content.

As a result, ES did not use the correct type for indexing the dense vectors: they were stored as useless lists of doubles. In this sense the ES index was 'corrupted': all 'script_score' queries returned 400.
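
A minimal sketch of the safe order, assuming the mapping from the question is stored in a dict called my_document_structure (name assumed): delete any stale index first, then create it with the explicit mapping, and only then ingest.

# Delete any leftover index first (ignore 404 if it doesn't exist yet)...
es.indices.delete(index=index, ignore=[404])
# ...then create it with the explicit dense_vector mapping, so documents
# indexed afterwards are not dynamically mapped as lists of doubles.
es.indices.create(index=index, body=my_document_structure)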

Vitaly

For me the issue was that I was using elastiknn_dense_float_vector instead of dense_vector, for which script_score queries are still an open issue. I am converting my vector index to use dense_vector instead: https://github.com/alexklibisz/elastiknn/issues/323
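
For illustration, a hedged sketch of what that conversion amounts to in the mapping (field name and dims taken from the question; the elastiknn-specific options are elided):

# Before (elastiknn plugin type, where the script_score query fails):
#   "abs_emb": {"type": "elastiknn_dense_float_vector", ...}
# After (built-in type that works with cosineSimilarity):
abs_emb_mapping = {
    "abs_emb": {
        "type": "dense_vector",
        "dims": 512
    }
}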

BEWARB