
I found that the Elasticsearch query doesn't accept a lot of characters because of the way the query string is parsed, and this is tripping me up.

From the documentation:

Reserved characters: If you need to use any of the characters which function as operators in your query itself (and not as operators), then you should escape them with a leading backslash. For instance, to search for (1+1)=2, you would need to write your query as (1+1)\=2.

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.
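
As a rough illustration of that rule (a sketch of my own, not code from the documentation): escape the backslash first, then the two-character operators && and ||, then the remaining single-character operators.

    import re

    def escape_reserved(text):
        # Hypothetical helper: escape the backslash first so the backslashes
        # added below are not doubled up again.
        text = text.replace("\\", "\\\\")
        # Two-character operators get a single leading backslash.
        text = re.sub(r"(&&|\|\|)", r"\\\1", text)
        # Remaining single-character reserved characters.
        return re.sub(r'([+\-=><!(){}\[\]^"~*?:/])', r"\\\1", text)

For the documentation's example, escape_reserved("(1+1)=2") returns \(1\+1\)\=2, which the parser reads the same way as (1+1)\=2 (the parentheses and plus are simply escaped as well).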

I already tried to avoid this error by pre-processing the text in Python, but that still didn't solve it. I tried the following:

    import unicodedata

    def strip_accents(s):
        # Drop the combining marks after decomposing accented characters.
        return ''.join(c for c in unicodedata.normalize('NFD', s)
                       if unicodedata.category(c) != 'Mn')

and then:

        eval_file_content = eval_file_content.replace('"', '\\"')
        eval_file_content = eval_file_content.replace("'", "\\'")
        eval_file_content = eval_file_content.replace("-", "\-")
        eval_file_content = eval_file_content.replace("+", "\+")
        eval_file_content = eval_file_content.replace("&&", "\&&")
        eval_file_content = eval_file_content.replace("||", "\||")
        eval_file_content = eval_file_content.replace("<", "\<")
        eval_file_content = eval_file_content.replace(">", "\>")
        eval_file_content = eval_file_content.replace("!", "\!")
        eval_file_content = eval_file_content.replace("(", "\(")
        eval_file_content = eval_file_content.replace(")", "\)")
        eval_file_content = eval_file_content.replace("{", "\{")
        eval_file_content = eval_file_content.replace("}", "\}")
        eval_file_content = eval_file_content.replace("[", "\[")
        eval_file_content = eval_file_content.replace("]", "\]")
        eval_file_content = eval_file_content.replace("^", "\^")
        eval_file_content = eval_file_content.replace("~", "\~")
        eval_file_content = eval_file_content.replace("*", "\*")
        eval_file_content = eval_file_content.replace("?", "\?")
        eval_file_content = eval_file_content.replace(":", "\:")
        eval_file_content = eval_file_content.replace("\\", "\\\\")
        eval_file_content = eval_file_content.replace("/", "\/")
        eval_file_content = strip_accents(eval_file_content)
        eval_file_content = eval_file_content.encode("ascii", errors="ignore").decode()

How can I solve this? Here's an example of "before and after" pre-processing:

https://pastebin.com/C94Gjhiw

For the query, I'm using the following method:

searchQuery = "http://localhost:9200/metis/ER/_search?q='" + eval_file_content + "'"
        res = requests.get(searchQuery).content
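
As a side note (a sketch, not a fix for the reserved-character problem itself): letting requests build the query string through its params argument URL-encodes the spaces and newlines in eval_file_content instead of concatenating them raw into the URL.

    import requests

    # Sketch: requests URL-encodes whatever is passed through `params`.
    res = requests.get(
        "http://localhost:9200/metis/ER/_search",
        params={"q": eval_file_content},
    ).content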

The request error:

b'{"error":{"root_cause":[{"type":"query_shard_exception","reason":"Failed to parse query [\'the pmNrOfIpTermsRej of VMGW increasing  \\n1.the VMGW1_LFGS8 of LFGM3 pmNrOfIpT'
  • One problem (maybe not your problem) is that you are processing the backslash *last* (almost). That causes any backslashes you've already added for other characters to be doubled up, becoming literal backslashes instead of performing the escape function. The characters you intended to escape are then unescaped. You should escape backslash first. And if you're not going to use a regular expression, at least use a loop! – kindall Nov 27 '17 at 16:47
  • In Python, all you need is `re.escape(s)`; there is no need to chain so many `replace`s. – Wiktor Stribiżew Nov 27 '17 at 16:52
  • Interesting, but I didn't quite catch the whole idea... Could you please show an example in Python code? If this works I will accept the answer... – denisb411 Nov 27 '17 at 16:53
  • @WiktorStribiżew thanks a lot Wiktor, this is a very good trick. Now I'm using a query with a JSON body and it's better (a sketch of this combination follows below). Please post this as an answer. – denisb411 Nov 27 '17 at 17:20
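
A minimal sketch of what the comments converge on, combining re.escape with a JSON request body (the query_string wrapper is an assumption on my part, not the asker's final code):

    import re
    import requests

    # Sketch: re.escape adds the leading backslashes in one call, and the
    # query is sent as a JSON body rather than concatenated into the URL.
    body = {"query": {"query_string": {"query": re.escape(eval_file_content)}}}
    res = requests.get("http://localhost:9200/metis/ER/_search", json=body).content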
