My Elasticsearch mapping looks like this:

{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "node": {
      "properties": {
        "field1": {
          "type": "keyword"
        },
        "field2": {
          "type": "keyword"
        },
        "query": {
          "properties": {
            "regexp": {
              "properties": {
                "field1": {
                  "type": "keyword"
                },
                "field2": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}

The problem:

I build ES queries with elasticsearch_dsl Q(). This works fine in most cases, even when the query contains a complex regexp. But it fails completely when the regexp contains the '!' character: a search term with '!' in it returns no results at all.

For example:

1.) Q('regexp', field1 = "^[a-z]{3}.b.*") (works perfectly)

2.) Q('regexp', field1 = "^f04.*") (works perfectly)

3.) Q('regexp', field1 = "f00.*") (works perfectly)

4.) Q('regexp', field1 = "f04baz?") (works perfectly)

It fails in the case below:

5.) Q('regexp', field1 = "f04((?!z).)*") (Fails with no results at all)

I tried adding "analyzer":"keyword" alongside "type":"keyword" in the fields above, but then nothing works at all.

In the browser, I checked how analyzer:keyword handles the input from the failing case:

http://localhost:9210/search/_analyze?analyzer=keyword&text=f04((?!z).)*

The result looks fine:

{
  "tokens": [
    {
      "token": "f04((?!z).)*",
      "start_offset": 0,
      "end_offset": 12,
      "type": "word",
      "position": 0
    }
  ]
}

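I run my queries like this:

from elasticsearch_dsl import Search, Q

# _conn, _index and _type are my existing connection, index name and doc type
search_obj = Search(using=_conn, index=_index, doc_type=_type).query(Q('regexp', field1="f04baz?"))
count = search_obj.count()  # total number of matching documents
response = search_obj[0:count].execute()  # fetch all hits in one page
logger.debug("total nodes(hits): " + str(response.hits.total))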

Please help. It's a really annoying problem, as all the other regex characters work fine in all the queries; only ! fails.

Also, how do I check which analyzer is currently applied with the above settings in my mappings?

zubug55
  • Does ElasticSearch support lookarounds in its regex syntax? – Tim Biegeleisen Aug 12 '18 at 06:38
  • It's been posted from few days now, i am not getting any help on this. Thanks – zubug55 Aug 12 '18 at 08:10
  • @TimBiegeleisen I just tried ^(f04ba)[^z]+?$ instead of f04((?!z).)* to avoid results containing z, and it worked. Does that give you any hint as to why ! in the regexp query returns no results? – zubug55 Aug 12 '18 at 08:31
  • @wiktor-stribiżew can you help with this question plz? – zubug55 Aug 12 '18 at 09:08
  • @WiktorStribiżew . can you help with this question plz? – zubug55 Aug 12 '18 at 09:22
  • @WiktorStribiżew ; seems like you solved this -> https://stackoverflow.com/questions/38645755/negative-lookahead-regex-on-elasticsearch; how do i resolve the same thing in my case? – zubug55 Aug 12 '18 at 09:22
  • Ehm, try changing `"f04((?!z).)*"` to `"f04[^z]*"` as it seems the regex is just matching a string starting with `f04` and then 0 or more chars other than `z`. – Wiktor Stribiżew Aug 12 '18 at 11:41
  • Yes, but I want to confirm that lookarounds are not supported in Elasticsearch, right? I tried rewriting "f04((?!z).)*" as ".*f04.*&~(.*z.*)" by following this question -> https://stackoverflow.com/questions/38645755/negative-lookahead-regex-on-elasticsearch ; is this right? – zubug55 Aug 12 '18 at 12:35
  • @WiktorStribiżew Also, you said it seems to match only strings starting with f04, but it actually doesn't; I don't get any results. Maybe the reason is that lookarounds are not supported in ES? – zubug55 Aug 12 '18 at 12:40
  • In the Elasticsearch regex flavor (Lucene engine), lookarounds are not supported. BTW, `.*f04.*&~(.*z.*)` matches a string having `f04` but not having `z` anywhere inside the string. What do you actually need? – Wiktor Stribiżew Aug 12 '18 at 12:50
  • That's correct, i want exactly this->.*f04.*&~(.*z.*) – zubug55 Aug 12 '18 at 12:53
  • Your answer https://stackoverflow.com/questions/38645755/negative-lookahead-regex-on-elasticsearch helped me to come up with this solution. Thanks sir – zubug55 Aug 12 '18 at 12:54

1 Answer

The Lucene regex engine used by Elasticsearch does not support lookarounds of any kind. The ES regex documentation is rather ambiguous on this point: it says that matching everything with a pattern like .* is very slow, as is using lookaround regular expressions (which is not only ambiguous but also wrong, since lookarounds, when used wisely, can greatly speed up regex matching).

Since you want to match any string that contains f04 and does not contain z, you may actually use

[^z]*f04[^z]*

Details

  • [^z]* - any 0+ chars other than z
  • f04 - the f04 substring
  • [^z]* - any 0+ chars other than z.
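
As a rough local sanity check of the character-class pattern, here is a minimal sketch using Python's `re` module. Python's engine is not Lucene's, so this only approximates the behavior: `re.fullmatch` is used because Lucene regexps have to match the whole term. The helper name `matches` and the sample values are made up for illustration.

```python
import re

# The pattern from the answer. Lucene regexps must match the entire term,
# so re.fullmatch is the closest Python analogue (this is an approximation,
# not the Lucene engine itself).
PATTERN = re.compile(r"[^z]*f04[^z]*")


def matches(value: str) -> bool:
    """Return True if value contains 'f04' and has no 'z' anywhere."""
    return PATTERN.fullmatch(value) is not None


# Hypothetical sample values, not taken from any real index.
samples = ["f04bar", "xf04", "f04baz", "zf04", "f03bar"]
for sample in samples:
    print(sample, matches(sample))
```

Only the values that contain `f04` and no `z` pass, which is what the lookahead version was meant to express.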

In case you have a multicharacter string to "exclude" (say, z4 rather than z), you may use your approach using a complement operator:

.*f04.*&~(.*z4.*)

This means almost the same but does not support line breaks:

  • .* - any chars other than newline, as many as possible
  • f04 - f04
  • .* - any chars other than newline, as many as possible
  • & - AND
  • ~(.*z4.*) - any string that does not contain z4
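
A minimal sketch of how the complement pattern could be sent as a raw query body, assuming your Elasticsearch version accepts the `flags` parameter on the regexp query and that `ALL` enables the `&` and `~` operators (check the docs for your version). The helper names `regexp_query` and `emulate` are hypothetical; `emulate` only mirrors the intended meaning in plain Python and ignores the newline caveat above.

```python
import json


def regexp_query(field: str, pattern: str, flags: str = "ALL") -> dict:
    """Build a regexp query body as a plain dict (hypothetical helper)."""
    return {"query": {"regexp": {field: {"value": pattern, "flags": flags}}}}


def emulate(value: str) -> bool:
    """Plain-Python stand-in for the pattern's meaning: has f04, lacks z4."""
    return "f04" in value and "z4" not in value


body = regexp_query("field1", ".*f04.*&~(.*z4.*)")
print(json.dumps(body, indent=2))
```

With elasticsearch_dsl, passing the same inner dict as `Q('regexp', field1={...})` should serialize to an equivalent body, though that is worth confirming with `.to_dict()` on your installed version.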
Wiktor Stribiżew
  • Can you also state in the answer that ES doesn't support lookarounds in regex? It's not clearly mentioned anywhere in the ES DSL docs. – zubug55 Aug 13 '18 at 18:37