from datetime import datetime

from elasticsearch import Elasticsearch

now = datetime.now()

client = Elasticsearch([host1, host2], http_auth=(user, password), scheme="http", port=port)
response = client.search(index="complats*", body={"from": 0, "size": 10000, "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": "tags:\"prod\" AND severity:\"INFO\" AND service:\"abc-service\" AND msg:\"* is processed\"",
                            "fields": [],
                            "type": "best_fields",
                            "default_operator": "or",
                            "max_determinized_states": 10000,
                            "enable_position_increments": True,
                            "fuzziness": "AUTO",
                            "fuzzy_prefix_length": 0,
                            "fuzzy_max_expansions": 50,
                            "phrase_slop": 0,
                            "escape": False,
                            "auto_generate_synonyms_phrase_query": True,
                            "fuzzy_transpositions": True,
                            "boost": 1.0
                        }
                    },
                    {
                        "range": {
                            "@timestamp": {
                                "from": "now-{}s".format((now.minute + 1) * 60),
                                "to": "now",
                                "include_lower": True,
                                "include_upper": True,
                                "boost": 1.0
                            }
                        }
                    }
                ],
                "adjust_pure_negative": True,
                "boost": 1.0
            }
        }})
value = response['hits']['total']['value']
print(value)

The above query connects to Elasticsearch successfully but returns an incorrect value of 10000 every time. What could be wrong here? I've read somewhere that the elasticsearch module in Python has a bug where it maxes out at 10000. Has anyone else faced this problem? If so, how did you resolve it? Thanks in advance!

1 Answer


It isn't a Python library bug: the 10,000-result cap is a setting inherited from Lucene (the `index.max_result_window` index setting, which defaults to 10,000). Note also that since Elasticsearch 7, `hits.total.value` is itself capped at 10,000 unless you pass `track_total_hits=True` in the request, which is why your count plateaus at exactly that number. If you need more results you should use a search_after query for pagination, or a scroll query for a single heavy search; it depends on your use case. Have a look at my answer here for an example of implementing these queries in Python.
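As a minimal sketch of the idea (the `client` and index name `complats*` follow the question above; `next_search_body` is a hypothetical helper, not part of the elasticsearch-py API):

```python
# Sketch: get an exact hit count and paginate past the 10,000-document
# window using search_after. next_search_body is a pure helper that
# builds the follow-up request body, so it can be tried without a cluster.

def next_search_body(base_body, last_hit):
    """Return a copy of base_body that resumes after last_hit."""
    body = dict(base_body)
    body["search_after"] = last_hit["sort"]  # sort values of the last hit seen
    body.pop("from", None)  # search_after replaces from/size offsets
    return body

base = {
    "size": 1000,
    "track_total_hits": True,  # ES 7+: report the exact total, not a capped 10000
    "query": {"match_all": {}},
    "sort": [{"@timestamp": "asc"}, {"_id": "asc"}],  # unique tie-breaker keeps paging stable
}

# Against a live cluster it would be driven roughly like this:
# page = client.search(index="complats*", body=base)
# while page["hits"]["hits"]:
#     last_hit = page["hits"]["hits"][-1]
#     page = client.search(index="complats*", body=next_search_body(base, last_hit))
```

The sort clause must include a unique field (here `_id`) as a tie-breaker, otherwise documents sharing the same `@timestamp` can be skipped or repeated between pages.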

Lupanoide