
I'm given some input that I must parse and convert to a Dict. I don't control how the input is generated.

An example input is u'{u\'my_key\': u\'AB\\N\'}'. Notice that this is meant to represent a serialized dictionary.

Parsing this dictionary string fails with every method I have tried. json.loads fails because the string is not valid JSON: it uses single quotes and the nested u prefixes. ast.literal_eval fails with the error (unicode error) 'unicodeescape' codec can't decode bytes in position 3-4: malformed \N character escape.
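To make the two failures easy to reproduce, here is a minimal sketch. The helper name try_parse is only illustrative, and the exact error messages may differ between Python versions:

```python
import ast
import json

# The raw input from the question: a Python 2 repr-like string, not JSON.
raw = u'{u\'my_key\': u\'AB\\N\'}'


def try_parse(parser, text):
    """Return the parsed value, or the exception the parser raised."""
    try:
        return parser(text)
    except (ValueError, SyntaxError) as exc:
        return exc


json_result = try_parse(json.loads, raw)
ast_result = try_parse(ast.literal_eval, raw)

# Both results are exception objects rather than dictionaries.
print(type(json_result).__name__, json_result)
print(type(ast_result).__name__, ast_result)
```

json.loads rejects the text with a ValueError, while ast.literal_eval raises a SyntaxError because \N is read as the start of a named Unicode escape.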

I need to somehow sanitize the input so the \N won't be treated as an escape sequence when parsed with ast. A simple replace('\\', '\\\\') seems error-prone and probably has many edge cases.
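For reference, the naive replace could look like the sketch below, assuming Python 3.3 or later (where the u prefix is accepted again). The function name parse_repr_naive and the second input are made up for illustration; the second input shows one edge case where the approach breaks:

```python
import ast


def parse_repr_naive(text):
    """Double every backslash, then let ast.literal_eval parse the result."""
    return ast.literal_eval(text.replace('\\', '\\\\'))


raw = u'{u\'my_key\': u\'AB\\N\'}'
parsed = parse_repr_naive(raw)
print(parsed)

# Edge case: a value that already contains a correctly escaped quote.
# Doubling its backslash turns \' into \\' and ends the string too early.
tricky = u'{u\'k\': u\'it\\\'s\'}'
try:
    parse_repr_naive(tricky)
    tricky_ok = True
except (ValueError, SyntaxError):
    tricky_ok = False
print(tricky_ok)
```

The first input parses, but any input that mixes valid escapes with stray backslashes is corrupted, which is why a blanket replace is risky.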

Alternatively, I need a way to remove the u prefixes from the nested strings so that json.loads would work.

Thanks

Guy Grin
  • That's not JSON but a Python 2 dictionary representation. It is not intended to be a data exchange format. – Klaus D. Jan 20 '19 at 13:23
  • "Parsing this json fails using a variety of methods" - because it _isn't_ json, it's just the result of calling `unicode` on a dictionary. I know this isn't very helpful, but the solution here is to get whoever or whatever is sending you this garbage to fix their data. – snakecharmerb Jan 20 '19 at 13:24
  • Thank you both for trying to help. I'll edit the post to make it more clear – Guy Grin Jan 20 '19 at 13:32

1 Answer


Handling this kind of input is not easy. In fact the only solution I have been able to find is this one:

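import json
import re

input_data = u'{u\'my_key\': u\'AB\\N\'}'

# Strip only the u prefixes that sit directly before a quote (a plain
# replace('u', '') would also delete every u inside keys and values).
i = re.sub(r"\bu'", "'", input_data)
# Turn single quotes into JSON double quotes, then escape the backslashes.
i = i.replace('\'', '"').replace('\\', '\\\\')

data = json.loads(i)
print(type(data))
# <type 'dict'> (Python 2); Python 3 prints <class 'dict'>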

It may solve your specific example, but I don't recommend using it in your project: it still breaks as soon as a key or value contains a quote character.

As @snakecharmerb said, I would also suggest enforcing some kind of policy on the inputs and validating the JSON string before it is sent, using something like this for instance.
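If the sending side can be changed, the validation could be as simple as the sketch below. The serialize helper is hypothetical and only illustrates the idea of emitting real JSON with json.dumps and checking that it round-trips before it is sent:

```python
import json


def serialize(payload):
    """Hypothetical producer-side helper: emit real JSON, not repr()."""
    text = json.dumps(payload)
    # Validate before sending: the text must parse back to the same data.
    if json.loads(text) != payload:
        raise ValueError("payload does not survive a JSON round trip")
    return text


wire = serialize({u'my_key': u'AB\\N'})

# The receiving side can now use json.loads directly, with no sanitizing.
data = json.loads(wire)
print(data)
```

With this in place the backslash is escaped by json.dumps itself, so the receiver never has to guess which backslashes are intentional.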

lch