I have an index in which the data field is of type keyword. I store a string like this one:

[a-zA-Z0-9.]+\@[a-zA-Z0-9]+\.[a-zA-Z]+

But when I read it back from Elastic in Python, I get a string like this one (because it is stored escaped in Elastic):

\\[a\\-zA\\-Z0\\-9\\.\\]\\+\\\\\\@\\[a\\-zA\\-Z0\\-9\\]\\+\\\\\\.\\[a\\-zA\\-Z\\]\\+

How can I convert it back to its original form in Python?

I tried storing it in a binary type field and using encode/decode, but I basically get the same escaped string.

Luka Lopusina
  • Did you try decoding with encoding='unicode_escape'? https://stackoverflow.com/a/14820462/3841261 – Chad Kennedy Aug 28 '18 at 17:47
  • Maybe just replace all double backslashes with singles? – N Chauhan Aug 28 '18 at 17:49
  • @ChadKennedy I tried that, but it doesn't work for me for some reason. If I call pattern.encode("unicode_escape"), I end up with one more level of escaping on everything, and decode with the same parameter works only on a byte array, not on a string. Even if I convert to a byte array and then decode, I end up with the same string I started with. Really strange issue :) – Luka Lopusina Aug 28 '18 at 19:45
  • @NChauhan That doesn't work, because then I end up with \[a\-zA\-..., which is also not correct: there shouldn't be any backslash in those positions. I also can't remove all the double ones, because then the backslash would also be removed in the places where I do need one. – Luka Lopusina Aug 28 '18 at 19:47
  • s.replace(r'\\\\\\', 'myspecialtag').replace(r'\\', '').replace('myspecialtag', '\\') – Chad Kennedy Aug 28 '18 at 22:44
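
Following up on the comments above, here is a minimal sketch of a more general version of the placeholder idea. It rests on two assumptions that the question does not confirm: that the returned string literally contains the doubled backslashes shown, and that the remaining escaping looks like the output of re.escape in Python before 3.7 (a backslash in front of every non-alphanumeric character). The function name unescape_pattern is hypothetical.

```python
import re


def unescape_pattern(stored: str) -> str:
    """Undo two assumed layers of escaping on a stored regex pattern."""
    # Layer 1 (assumption): every backslash was doubled, so collapse pairs.
    collapsed = stored.replace("\\\\", "\\")
    # Layer 2 (assumption): each remaining backslash escapes the character
    # that follows it, so keep only that character.
    return re.sub(r"\\(.)", r"\1", collapsed, flags=re.DOTALL)


# The escaped string from the question, written as a raw literal.
stored = (
    r"\\[a\\-zA\\-Z0\\-9\\.\\]\\+\\\\\\@"
    r"\\[a\\-zA\\-Z0\\-9\\]\\+\\\\\\.\\[a\\-zA\\-Z\\]\\+"
)
print(unescape_pattern(stored))
```

If the doubled backslashes are only an artifact of how the value is displayed (for example, its repr), the first step should probably be skipped, so it is worth checking len() of the returned value before relying on this.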

1 Answer

I changed my approach: I convert the string to hex and store that hex value in Elastic, because it is a plain string that doesn't need to be escaped. When I read it back from Elastic, I just reverse the process like this:

import binascii

# Convert string to hex
def toHex(text):
    return binascii.hexlify(bytes(text, 'utf-8')).decode("utf-8")

# Convert hex to string
def toStr(text):
    return binascii.unhexlify(bytes(text, 'utf-8')).decode("utf-8").replace('\\\\', '\\')

This is not a direct answer to my question, but it works for me, so maybe you will find it useful too.
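
For illustration, the same hex round trip can also be sketched with the built-in bytes.hex() and bytes.fromhex() methods instead of binascii. The names to_hex and from_hex are hypothetical, and this variant deliberately leaves out the final replace() call, so it returns exactly what was stored; whether that fits depends on what the stored value looks like in your case.

```python
def to_hex(text: str) -> str:
    """Encode text as a hex string containing only 0-9 and a-f."""
    return text.encode("utf-8").hex()


def from_hex(hex_text: str) -> str:
    """Decode a hex string produced by to_hex back into the original text."""
    return bytes.fromhex(hex_text).decode("utf-8")


pattern = r"[a-zA-Z0-9.]+\@[a-zA-Z0-9]+\.[a-zA-Z]+"
stored = to_hex(pattern)           # the value that would be written to Elastic
assert from_hex(stored) == pattern  # the round trip returns the original
```

The trade-off is that the hex value is twice as long as the UTF-8 bytes and can no longer be searched or read as text inside the index.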

Luka Lopusina