
I have a unicode object that should represent JSON, but it contains the unicode `u` prefix as part of the string value, e.g. `u'{u\'name\':u\'my_name\'}'`

My goal is to load this into a JSON object, but just using `json.loads` fails. I assume this is because of the `u` prefixes inside the string, which aren't valid JSON.

I then tried sanitizing the string using `replace("u\'", "'")`, `encode('ascii', 'ignore')` and other methods, without success.

What finally worked was `ast.literal_eval`, but I'm worried about using it. I found a few sources online claiming it's safe, but also others claiming it's bad practice and should be avoided.

Are there other methods I'm missing?

Guy Grin
    `ast.literal_eval` *is* safe. – dawg Jan 06 '19 at 16:53
  • @dawg thanks for the quick response. Is it preferable to try and replace the `u` values etc.? Does `ast.literal_eval` have any negative downsides to it? – Guy Grin Jan 06 '19 at 16:59
    *Does ast.literal_eval have any negative downsides to it?* No, not really. If it works for your data (other than the example here) - use it. – dawg Jan 06 '19 at 17:04
  • Thanks for the assist @dawg – Guy Grin Jan 06 '19 at 17:06
  • @GuyGrin, I have updated my answer with one more method. Please check it, as it allows you to use the **json** module. Note that **JSON** (JavaScript Object Notation) requires `"` (double quotes) around keys and strings. That is the reason for `json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)`. I have also suggested an **unsafe** way; just ignore it if you wish. You will need to handle the case where your data string contains `u` as part of the original data. – hygull Jan 06 '19 at 18:40
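hygull's actual method isn't shown on this page, so the following is only a minimal Python 3 sketch of the general "rewrite the quotes so `json` can read it" idea, using a hypothetical helper `naive_fix`. It illustrates the caveat in the last comment: plain text replacement works on the question's example, but silently corrupts a value that ends in `u` and breaks on a value containing an apostrophe.

```python
import json


def naive_fix(s):
    # Hypothetical helper: rewrite u'...' literals as "..." by plain text
    # replacement. It does not parse anything, which is why it is fragile.
    return s.replace("u'", '"').replace("'", '"')


# The example from the question converts cleanly.
print(json.loads(naive_fix("{u'name': u'my_name'}")))  # {'name': 'my_name'}

# A value ending in "u" is silently corrupted: u'menu' becomes "men".
print(json.loads(naive_fix("{u'item': u'menu'}")))  # {'item': 'men'}

# A value containing an apostrophe (repr'd with double quotes) fails outright.
try:
    json.loads(naive_fix("{u'name': u\"O'Brien\"}"))
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print("failed:", exc)
```

This is why a real parser such as `ast.literal_eval` is the more robust choice for repr-style input.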

1 Answer


The unicode string is the result of calling `unicode()` on a dictionary (Python 2):

>>> import ast, json
>>> d = {u'name': u'myname'}
>>> u = unicode(d)
>>> u
u"{u'name': u'myname'}"

If you control the code that's doing this, the best fix is to change it to call json.dumps instead.

>>> json.dumps(d)
'{"name": "myname"}'
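For reference, a minimal Python 3 sketch of the same producer-side fix. `unicode()` no longer exists in Python 3, so `str()` stands in for it here; its output drops the `u` prefixes but still uses single quotes, so it is still not JSON, while `json.dumps` output round-trips through `json.loads`.

```python
import json

d = {'name': 'myname'}

# str() produces a Python repr, not JSON: note the single quotes.
as_repr = str(d)
print(as_repr)  # {'name': 'myname'}

# json.dumps produces real JSON that json.loads can read back.
as_json = json.dumps(d)
print(as_json)  # {"name": "myname"}
assert json.loads(as_json) == d
```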

If you don't control the creation of this object, you'll need to use `ast.literal_eval` to create the dictionary, because the unicode string is not valid JSON (it uses single quotes and `u` prefixes).

>>> json.loads(u)
Traceback (most recent call last):
...
ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)


>>> ast.literal_eval(u)
{u'name': u'myname'}
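If the same code path might receive either real JSON or a repr like the one above, one option is to try `json.loads` first and fall back to `ast.literal_eval`. This is a Python 3 sketch with a hypothetical helper name, `load_dict_repr`, and it assumes the payload is always expected to be a dict. Python 3.3+ accepts `u'...'` literals, so `literal_eval` handles the Python 2 repr.

```python
import ast
import json


def load_dict_repr(text):
    # Hypothetical helper: accept real JSON first, then fall back to a Python
    # literal such as "{u'name': u'myname'}" (the output of unicode(dict)).
    try:
        value = json.loads(text)
    except ValueError:
        # Not JSON; parse it as a Python literal instead. This raises
        # ValueError or SyntaxError if the text is not a plain literal.
        value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("expected a dict, got %s" % type(value).__name__)
    return value


data = load_dict_repr(u"{u'name': u'myname'}")
print(data)              # {'name': 'myname'}
print(json.dumps(data))  # {"name": "myname"}
```

Once you have the dict, `json.dumps` gives you valid JSON to pass along.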

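The docs describe `ast.literal_eval` as safe:

"can be used for safely evaluating strings containing Python values from untrusted sources"

Newer Python 3 docs add a caveat: `literal_eval` never executes code, but very large or deeply nested input can exhaust memory or crash the interpreter, so it's sensible to cap the size of untrusted input.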
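A short Python 3 sketch of what "safe" means in practice: plain literals are evaluated, while anything that would call a function is rejected with an exception instead of being run. The helper `is_literal` and the sample strings are made up for illustration.

```python
import ast


def is_literal(text):
    # Hypothetical helper: True if ast.literal_eval accepts the text,
    # False if it refuses it. Nothing in the text is ever executed.
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True


print(is_literal("{u'name': u'myname', 'n': [1, 2.5, None]}"))  # True

# A function call is not a literal, so literal_eval raises ValueError
# rather than running os.system.
malicious = "__import__('os').system('echo pwned')"
print(is_literal(malicious))  # False
```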

You could use `eval` instead, but since you don't control the creation of the object, you can't be certain it hasn't been crafted by a malicious user to damage your system.

snakecharmerb