
I am a newbie to ELK. I want to search for docs based on the order of occurrence of words in a field. For example,

In doc1, my_field: "MY FOO WORD BAR EXAMPLE"
In doc2, my_field: "MY BAR WORD FOO EXAMPLE"

I would like to query in Kibana for docs where "FOO" is followed by "BAR" and not the opposite. So, doc1 should be returned in this case and not doc2. I tried the below query in the Kibana search bar, but it is not working; it doesn't even produce any search results.

my_field.raw:/.*FOO.*BAR.*/

I also tried with the analyzed field (just my_field), even though I came to know that should not work. And of course, that didn't produce any results either.

Please help me with this regex search. Why am I not getting any matching result for that query?

Krishna Chaitanya

3 Answers

GET /_search
{
    "query": {
        "regexp": {
            "user": {
                "value": "k.*y",
                "flags" : "ALL",
                "max_determinized_states": 10000,
                "rewrite": "constant_score"
            }
        }
    }
}

More details here.
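Adapting that example to the question (a sketch: the index name my_index is hypothetical, and my_field.raw is assumed to be the non-analyzed sub-field mentioned in the question), the regexp query could look like:

```json
GET /my_index/_search
{
    "query": {
        "regexp": {
            "my_field.raw": {
                "value": ".*FOO.*BAR.*",
                "flags": "ALL",
                "max_determinized_states": 10000
            }
        }
    }
}
```

Note that Elasticsearch regexp patterns are anchored to the whole field value, so the leading and trailing `.*` are needed to match "FOO ... BAR" anywhere inside the text.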

Jebaseelan Ravi

I'm not sure offhand why that regex query isn't working, but I believe Kibana uses Elasticsearch's query string query, documented here. So, for instance, you could do a phrase query (documented in the link) by putting your search in double quotes, and it would look for the word "foo" followed by "bar". This would also perform better, since you would run it against your analyzed field (my_field), where each word has been tokenized for fast lookups. So your search in Kibana would be:

my_field: "FOO BAR"
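Under the hood (assuming the Kibana search bar maps onto the query_string query as described above, and with a hypothetical index name my_index), that search would correspond to something like:

```json
GET /my_index/_search
{
    "query": {
        "query_string": {
            "default_field": "my_field",
            "query": "\"FOO BAR\""
        }
    }
}
```

Keep in mind a plain phrase query only matches the two words when they are adjacent; allowing words in between requires a slop value or a span query, as discussed in the comments below.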

Update:

Looks like this is an annoying quirk of Kibana (probably kept for backwards-compatibility reasons). Anyway, this isn't matching for you because you're searching against a non-analyzed field, and Kibana by default lowercases the search terms, so they won't match the non-analyzed uppercase "FOO". You can configure this in the Kibana advanced settings mentioned here, specifically by setting the "lowercase_expanded_terms" option to false.

RyanR
  • Thanks for the reply. It's not just that. I will also need all the docs where "FOO" and "BAR" are separated by other words.
    Example: **Match** doc1, my_field: "MY FOO WORD BAR EXAMPLE".
    **Not Match** doc2, my_field: "MY BAR WORD FOO EXAMPLE"
    – Krishna Chaitanya Nov 13 '16 at 03:36
  • So, I will require regex and not phrase matching. – Krishna Chaitanya Nov 13 '16 at 03:42
  • Okay, I figured out why this was happening for you (weird quirk of Kibana), updated the answer. – RyanR Nov 13 '16 at 06:46
  • Also, from a performance standpoint, using a span near query (which phrase matching uses under the hood) with a high slop value + in_order = true would achieve what your regex does, and you could run it against the analyzed field, which I *think* should perform better (because each token carries its position, so in theory it looks up both tokens and then checks that indexOf(bar) > indexOf(foo)). Similar answer here: http://stackoverflow.com/a/26637081/1135228 – RyanR Nov 13 '16 at 06:59
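The span near approach from that last comment can be sketched as follows (a sketch; the index name my_index is hypothetical, the terms are lowercase because the standard analyzer lowercases tokens in my_field, and the slop of 100 is an arbitrary "large enough" gap):

```json
GET /my_index/_search
{
    "query": {
        "span_near": {
            "clauses": [
                { "span_term": { "my_field": "foo" } },
                { "span_term": { "my_field": "bar" } }
            ],
            "slop": 100,
            "in_order": true
        }
    }
}
```

With in_order set to true, this matches "MY FOO WORD BAR EXAMPLE" but not "MY BAR WORD FOO EXAMPLE", which is exactly the asymmetry the question asks for.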

Kibana’s standard query language is based on Lucene query syntax.

And the default analyzer will tokenize the text into separate words: [MY, FOO, WORD, BAR, EXAMPLE]

Instead of using a regex match, you can try the following search string in Kibana:

my_field: FOO AND my_field: BAR

And if your "my_field" data looks like "MYFOOWORDBAREXAMPLE", which cannot be tokenized, you should use this query string:

my_field: *FOO*BAR*
carton.swing