Unicode to dictionary (unicode contains apostrophe punctuation)

Question

I have read the following Unicode from a CSV file:

line = u"{u'There's Still Time': u'foo'}"

I would like to convert this to a dictionary so that I can access it like this:

line["There's Still Time"] 
Output: 'foo'

Please help.

Are you using Python 2 or 3? Python 3 supports Unicode by default; you should use it if possible. — Rob Rose, Aug 01 '18 at 21:52
@RobRose that isn't the issue at all. The issue is that the OP dumped the string representation of a dict object to a csv file and now has to deserialize that. The *real* solution is to use an appropriate serialization format from the beginning. If that isn't possible, they can use one of the approaches in the linked duplicate target. — juanpa.arrivillaga, Aug 01 '18 at 21:53
You should really choose a better serialization format. Don't just dump the string representations of objects to a file and call it serialization. — juanpa.arrivillaga, Aug 01 '18 at 21:54
@juanpa.arrivillaga please remove the duplicate mark, as I couldn't find a solution that handles the apostrophe within the line. `line.replace(" ' ", ' " ')` is not a valid solution. As for serializing properly from the beginning, what would you recommend? — Tonio Vassilaros, Aug 01 '18 at 22:05
The apostrophe makes it invalid syntax, you're not going to find a solution. The only way to make sense of that text is to make up a rule that apostrophes in the wrong place must be ignored. That's a very custom requirement. — Mark Ransom, Aug 01 '18 at 22:10
How was this CSV file created? It can't be parsed without heuristics that will be (a) nontrivial to write and test and (b) probably wrong in some cases, producing garbage. If the CSV file came from code that you wrote, or code that a coworker wrote, or code that a company you're paying or partnered with wrote, then fix that code and generate proper CSV files. It will be much easier, and better, than coming up with, and implementing, the heuristics for trying to repair the broken data. — abarnert, Aug 01 '18 at 22:33
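The comments above recommend a proper serialization format but don't name one. As one possible option (JSON is an assumption here, not something the commenters specified), here is a minimal Python 3 sketch that stores the dict as JSON inside a single CSV cell, so the apostrophe never needs special handling:

```python
import csv
import io
import json

# A dict whose key contains an apostrophe, as in the question.
data = {u"There's Still Time": u"foo"}

# Write: encode the dict as a JSON string and store it in one CSV cell.
# json handles the dict; the csv module handles quoting the cell.
buffer = io.StringIO()  # stands in for a real file opened with newline=''
writer = csv.writer(buffer)
writer.writerow([json.dumps(data)])

# Read it back: parse the CSV row, then decode the JSON cell.
buffer.seek(0)
row = next(csv.reader(buffer))
restored = json.loads(row[0])

print(restored["There's Still Time"])  # foo
```

Because each layer does its own escaping, no regex heuristics are needed on the way back in.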

score 2 · Answer 1 · answered Aug 01 '18 at 22:26

Given that there is an apostrophe within the string, you'll have to do some pre-processing before you can parse it into a dict. Assuming that all strings within the target dict are unicode (so every opening quote is preceded by `u`) and that every closing quote is immediately followed by a delimiter (in the pattern below: `.`, `}`, `:`, `,` or whitespace), you can find every apostrophe that fits neither case and escape it. Then you can use ast.literal_eval() to parse the result into a dict, something like:

import ast
import re

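# An apostrophe not preceded by "u" (so not an opening u'...' quote) and not
# followed by ".", "}", ":", "," or whitespace (so not a closing quote) is
# treated as part of the text and escaped.
APOSTROPHE_ESCAPE = re.compile(r"(?<!u)'(?![.}:,\s])")

line = u"{u'There's Still Time': u'foo'}"
your_dict = ast.literal_eval(APOSTROPHE_ESCAPE.sub(r"\'", line))

print(your_dict)  # Python 2: {u"There's Still Time": u'foo'}
                  # Python 3: {"There's Still Time": 'foo'}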
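To see what the pre-processing actually does, it can help to print the escaped string before parsing it, and to confirm that the result supports the lookup the question asks for. A minimal sketch reusing the answer's pattern; the variable name `escaped` is illustrative:

```python
import ast
import re

# Same pattern as in the answer.
APOSTROPHE_ESCAPE = re.compile(r"(?<!u)'(?![.}:,\s])")

line = u"{u'There's Still Time': u'foo'}"

# Only the apostrophe inside "There's" matches, so only it gets a backslash.
escaped = APOSTROPHE_ESCAPE.sub(r"\'", line)
print(escaped)  # {u'There\'s Still Time': u'foo'}

your_dict = ast.literal_eval(escaped)
print(your_dict["There's Still Time"])  # foo
```

In the replacement string `r"\'"`, `re.sub` leaves the unknown escape `\'` as a literal backslash plus quote, which is exactly the escape `ast.literal_eval` expects.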

Keep in mind, though, that even a simple typo like this:

line = u"{u'There'}s Still Time': u'foo'}"

will throw it off. Admittedly, that would be an illegal dictionary literal in the source as well, but keep these limitations in mind and adjust your pre-processing regex accordingly.
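One more limitation worth knowing: the lookbehind `(?<!u)` skips *any* apostrophe that follows a `u`, so a key like `You're here` is left unescaped and `ast.literal_eval()` raises a `SyntaxError`. Below is a hypothetical stricter variant (not from the answer above, and still a heuristic): it treats an apostrophe as a delimiter only when it follows `{u` or whitespace plus `u` (an opening quote) or precedes `:`, `,` or `}` (a closing quote), and it returns `None` instead of crashing when the line still can't be parsed. The function name `parse_dict_line` is illustrative. Text like `students', ...` would still be misread.

```python
import ast
import re

# Hypothetical stricter pattern: keep an apostrophe as a delimiter only if it
# opens a string ("{u'" or whitespace + "u'") or closes one (followed by
# ":", "," or "}"). Every other apostrophe is escaped as literal text.
STRICT_APOSTROPHE_ESCAPE = re.compile(r"(?<![{\s]u)'(?![}:,])")


def parse_dict_line(line):
    """Return the parsed dict, or None if the line still isn't a valid dict literal."""
    escaped = STRICT_APOSTROPHE_ESCAPE.sub(r"\'", line)
    try:
        result = ast.literal_eval(escaped)
    except (ValueError, SyntaxError):
        return None
    return result if isinstance(result, dict) else None


print(parse_dict_line(u"{u'There's Still Time': u'foo'}"))  # Python 3: {"There's Still Time": 'foo'}
print(parse_dict_line(u"{u'You're here': u'bar'}"))         # Python 3: {"You're here": 'bar'}
print(parse_dict_line(u"{u'There'}s Still Time': u'foo'}"))  # None
```

Returning `None` lets you log and skip broken rows instead of aborting the whole CSV read.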

