Is there a way to find out via the Elasticsearch API how a query string query is actually parsed? You can work it out manually from the Lucene query syntax, but it would be really nice to see some representation of what the parser actually produces.
- Have you tried enabling the explain output by adding explain=true to your search request? – javanna Aug 23 '13 at 11:06
- @javanna Since purpose of explain=true is just to explain the rating of each individual result, that helps a little to guess what's happening. But I'd prefer something explicit, especially for complex cases. – Dr. Hans-Peter Störr Aug 26 '13 at 09:48
- Right, have a look at the [validate query api](http://www.elasticsearch.org/guide/reference/api/validate/) then, and use explain there too, should be better. – javanna Aug 26 '13 at 17:36
- @javanna Good idea, but unfortunately validate's explanation just repeats the query. – Dr. Hans-Peter Störr Aug 27 '13 at 07:19
1 Answer
As javanna mentioned in the comments, there's the `_validate` API. Here's what works on my local Elasticsearch (version 1.6):
    curl -XGET 'http://localhost:9201/pl/_validate/query?explain&pretty' -d'
    {
      "query": {
        "query_string": {
          "query": "a OR (b AND c) OR (d AND NOT(e or f))",
          "default_field": "t"
        }
      }
    }
    '
`pl` is the name of an index on my cluster. Different indices can have different analyzers, which is why query validation is executed in the scope of an index.
The result of the above curl command is the following:
    {
      "valid" : true,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "explanations" : [ {
        "index" : "pl",
        "valid" : true,
        "explanation" : "filtered(t:a (+t:b +t:c) (+t:d -(t:e t:or t:f)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@ce2d82f1)"
      } ]
    }
I made one `OR` lowercase on purpose, and as you can see in the explanation, it is interpreted as a token and not as an operator.
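If you only care about the explanation strings, they are easy to pull out of the response. Here is a minimal Python sketch using only the standard library; the helper name `explanations` is my own and not part of any Elasticsearch client, and the response text is the one shown above, trimmed to the relevant keys.

```python
import json

# Response body from the _validate call above, trimmed to the relevant keys.
RESPONSE = """
{
  "valid": true,
  "explanations": [
    {
      "index": "pl",
      "valid": true,
      "explanation": "filtered(t:a (+t:b +t:c) (+t:d -(t:e t:or t:f)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@ce2d82f1)"
    }
  ]
}
"""


def explanations(response_text):
    """Return (index, explanation) pairs from a _validate/query?explain response."""
    body = json.loads(response_text)
    return [(entry["index"], entry.get("explanation"))
            for entry in body.get("explanations", [])]


for index, text in explanations(RESPONSE):
    print(index, "=>", text)
```

Searching the printed string for `t:or` is a quick way to spot an operator that was parsed as a plain term.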
As for interpreting the explanation: the format is similar to the `+` and `-` operators of the `query_string` query:
- `(` and `)` characters start and end a `bool` query
- a `+` prefix means a clause that will be in `must`
- a `-` prefix means a clause that will be in `must_not`
- no prefix means that the clause will be in `should` (with `default_operator` equal to `OR`)
So the above is equivalent to the following:
    {
      "bool" : {
        "should" : [
          {
            "term" : { "t" : "a" }
          },
          {
            "bool": {
              "must": [
                {
                  "term" : { "t" : "b" }
                },
                {
                  "term" : { "t" : "c" }
                }
              ]
            }
          },
          {
            "bool": {
              "must": {
                "term" : { "t" : "d" }
              },
              "must_not": {
                "bool": {
                  "should": [
                    {
                      "term" : { "t" : "e" }
                    },
                    {
                      "term" : { "t" : "or" }
                    },
                    {
                      "term" : { "t" : "f" }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
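The prefix rules are mechanical enough to sketch in code. The Python below is a rough illustration, not anything Elasticsearch ships: the function names are mine, it assumes well-formed input with balanced parentheses, and it only understands plain `field:value` terms, `+`/`-` prefixes, and parentheses (no phrases, ranges, or boosts). The input is the part of the explanation inside `filtered(...)`.

```python
import json
import re


def parse_clauses(tokens, pos=0):
    """Parse clauses until a closing paren or end of input; return (bool query, next position)."""
    bool_query = {}
    while pos < len(tokens) and tokens[pos] != ")":
        occur = "should"                      # no prefix -> should
        if tokens[pos] in ("+", "-"):
            occur = "must" if tokens[pos] == "+" else "must_not"
            pos += 1
        if tokens[pos] == "(":                # parentheses open a nested bool query
            clause, pos = parse_clauses(tokens, pos + 1)
            pos += 1                          # skip the matching ")"
        else:                                 # plain field:value term
            field, _, value = tokens[pos].partition(":")
            clause = {"term": {field: value}}
            pos += 1
        bool_query.setdefault(occur, []).append(clause)
    return {"bool": bool_query}, pos


def explanation_to_bool(text):
    """Convert a Lucene-style explanation fragment into a bool query dict."""
    tokens = re.findall(r"[()+\-]|[^\s()]+", text)
    query, _ = parse_clauses(tokens)
    return query


query = explanation_to_bool("t:a (+t:b +t:c) (+t:d -(t:e t:or t:f))")
print(json.dumps(query, indent=2))
```

The output has the same structure as the JSON above, except that every `must` and `must_not` is emitted as a list even when it holds a single clause.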
I have used the `_validate` API quite heavily to debug complex `filtered` queries with many conditions. It is especially useful if you want to check how an analyzer tokenized input such as a URL, or whether some filter is cached.
There's also an awesome parameter, `rewrite`, that I was not aware of until now, which causes the explanation to be even more detailed, showing the actual Lucene query that will be executed.
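For scripted use, here is a small Python sketch that builds the same request with `rewrite` switched on. It only constructs the request and does not send it. The helper name `build_validate_request` is hypothetical, the host and index are the ones from my example above, and I am assuming the parameter is passed as `rewrite=true`; the `Content-Type` header is something newer Elasticsearch versions expect and was not needed on 1.6.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_validate_request(base_url, index, query, rewrite=False):
    """Build (but do not send) a _validate/query request with explain enabled."""
    params = {"explain": "true", "pretty": "true"}
    if rewrite:
        params["rewrite"] = "true"   # assumption: ask for the rewritten Lucene query
    url = f"{base_url}/{index}/_validate/query?{urlencode(params)}"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body, method="GET",
                   headers={"Content-Type": "application/json"})


request = build_validate_request(
    "http://localhost:9201",
    "pl",
    {"query_string": {"query": "a OR (b AND c)", "default_field": "t"}},
    rewrite=True,
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a cluster.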
