
I am not sure that I have got this right; I am trying to use the json library in Python.

I dump a nested dictionary to a JSON file on disk, and then I would like to load it back as it was before. However, when I load the file back, I don't get the same object that I had before.

from collections import defaultdict
from json import dump, load

mydictionary = defaultdict(dict)
# ...
with open("myfile.json", "w") as outfile:
    dump(mydictionary, outfile)  # save the dictionary to a JSON file
# ...
with open("myfile.json") as infile:
    restored_dict = load(infile)  # read it back as a plain dict
for key in restored_dict:
    print(key)

The dictionary structure:

{
    "product1": {
        "item1" : [
            "red",
            "soft",
            "430"
        ],
        "item2" : [
            "green",
            "soft",
            "112"
        ],
        "item3" : [
            "blue",
            "hard",
            "12"
        ]
    },
    "product2": {
        "item4" : [
            "black",
            "soft",
            "30"
        ],
        "item5" : [
            "indigo",
            "hard",
            "40"
        ],
        "item6" : [
            "green",
            "soft",
            "112"
        ]
    }
}   

When I print the object before and after, the output is not the same, and I cannot access the keys and values once I restore the dictionary. I get a long sequence of data with a "u" at the beginning of each key and item; the only way to print it correctly is to dump it again and print the output:

print(dumps(restored_dict, indent=4))

But I still cannot access the keys, values and items.

I see that each function comes in two variants, one with an s at the end (dump/dumps, load/loads), but I can't tell the difference. Some tutorials online say that the one with the s creates a string instead of a JSON object, while others say that one saves in binary and the other in plain text...

I am trying to save the dictionary and load it at a later time; I thought that JSON was the simplest way to achieve this, but for some reason I can't get it to work.

  • Please provide source-code as short as possible, but which is executable so we are able to reproduce the potential bug. – barrios Mar 13 '15 at 08:18
  • Fair enough; adding more details to the code example –  Mar 13 '15 at 08:21
  • There's nothing wrong with your dict, these `u`'s are just a python representation of values and can be ignored in your code (in most cases). – georg Mar 13 '15 at 08:24
  • @newbiez: we are more looking for examples of the data stored in the dictionaries. What strings are stored? What's in those strings? Do you understand what I mean by ASCII and UTF-8? If not, you'll need to read up on those concepts. I recommend [this article](http://www.joelonsoftware.com/articles/Unicode.html). – Martijn Pieters Mar 13 '15 at 08:27
  • My bad; I didn't understand that you were asking for the structure of the dictionary itself; changing the original question to add it. And I am not familiar with how UTF8 and ASCII works; I know briefly what they are, but I am not aware of the differences –  Mar 13 '15 at 08:30
  • @newbiez: that's all just ASCII data. You can use the JSON loaded info *just fine*. You may want to include an example of a loaded structure and what confuses you about it or problems you have with it in your question. – Martijn Pieters Mar 13 '15 at 08:58
  • I would suggest using the [pickle module](https://docs.python.org/2/library/pickle.html#pickle-python-object-serialization) as shown in [this answer](http://stackoverflow.com/questions/4529815/how-to-save-an-object-in-python/4529901#4529901). – martineau Mar 13 '15 at 09:00

1 Answer


JSON strings are Unicode text, so Python 2's json module gives you Unicode strings when it loads a document. The u prefix is only how Python displays such a string; it is not part of the data.

If your keys contain only ASCII characters, you can look those keys up just fine using byte strings (leaving off the u prefix):

>>> import json
>>> d = {'foo': 'bar'}
>>> new_d = json.loads(json.dumps(d))
>>> new_d
{u'foo': u'bar'}
>>> new_d['foo']
u'bar'
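
The same holds for a nested structure like the one in the question. The following is a minimal sketch, assuming a reduced, hypothetical subset of that data; it uses only `json.dumps` and `json.loads` and is written to run under both Python 2 and 3:

```python
import json
from collections import defaultdict

# Hypothetical, reduced version of the dictionary from the question.
original = defaultdict(dict)
original["product1"]["item1"] = ["red", "soft", "430"]
original["product2"]["item4"] = ["black", "soft", "30"]

# Round-trip through a JSON string.
restored = json.loads(json.dumps(original))

# The content compares equal; only the container type changes,
# because JSON has no notion of defaultdict.
content_matches = restored == original
restored_type = type(restored)

# Keys, values and items are accessible as on any other dict.
for product, items in sorted(restored.items()):
    for item, attributes in sorted(items.items()):
        print("%s %s: %s" % (product, item, ", ".join(attributes)))
```

Note that the restored object is a plain `dict`, so looking up a missing key raises `KeyError` instead of creating an empty entry.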

If your keys were UTF-8 encoded, you'll have to decode those to Unicode strings, or use Unicode string literals (prefixed by the u character again):

>>> utf8_key = u'å'.encode('utf8')  # manually encoded for demo purposes
>>> utf8_key
'\xc3\xa5'
>>> utf8_d = {utf8_key: 'bar'}
>>> utf8_d
{'\xc3\xa5': 'bar'}
>>> new_utf8_d = json.loads(json.dumps(utf8_d))
>>> new_utf8_d
{u'\xe5': u'bar'}
>>> new_utf8_d[u'å']
u'bar'

The string values are still Unicode strings; you could encode those back to UTF-8 if you needed bytes, but generally speaking it is better to handle text as Unicode as much as possible.
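
If bytes really are needed, for example to hand a value to an API that expects UTF-8, the encoding step could look like this sketch. The key name `name` and the variable names are illustrative assumptions, not part of the question's data:

```python
import json

# "\u00e5" is the JSON escape for the character å.
restored = json.loads('{"name": "\\u00e5"}')

text_value = restored["name"]              # Unicode text after loading
utf8_bytes = text_value.encode("utf-8")    # explicit conversion to bytes
round_tripped = utf8_bytes.decode("utf-8") # and back to text again
```

Encoding only at the boundary where bytes are required keeps the rest of the program working with text.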

Printing a Unicode string automatically encodes it with the codec of the current stdout target.
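
A small sketch of the difference between printing a string and printing its representation, assuming a hypothetical one-entry version of the question's data:

```python
import json

restored = json.loads('{"product1": {"item1": ["red", "soft", "430"]}}')

for key in restored:
    # print() shows the text itself, without any prefix.
    print(key)
    # repr() shows the Python literal; under Python 2 this carries the
    # u prefix, under Python 3 it does not.
    print(repr(key))

first_colour = restored["product1"]["item1"][0]
print(first_colour)
```

The prefix appears only when Python shows the representation of a string, such as when a whole dictionary is printed or echoed in the interactive interpreter.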

You may want to read up on Python and Unicode.

Alternatively, use the pickle library, which gives you a round-trip format for Python objects. Unlike JSON, however, its output is not human-readable.
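
A minimal sketch of such a pickle round trip, assuming a hypothetical file name `myfile.pkl` placed in a temporary directory and a reduced version of the question's data:

```python
import os
import pickle
import tempfile
from collections import defaultdict

original = defaultdict(dict)
original["product1"]["item1"] = ["red", "soft", "430"]

# Hypothetical file location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "myfile.pkl")

# Pickle files must be opened in binary mode.
with open(path, "wb") as outfile:
    pickle.dump(original, outfile)

with open(path, "rb") as infile:
    restored = pickle.load(infile)

# The restored object is still a defaultdict, so missing keys
# are created on first access, exactly as before saving.
restored["product3"]["item7"] = ["white", "hard", "7"]
```

Unlike JSON, pickle preserves the `defaultdict` type and its default factory. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.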

Martijn Pieters
  • I see, so when I output to a JSON stream, it converts it in unicode? What would be the correct sequence to get the exact object that I had before the encoding? Otherwise, how can I get the data from what I load back in Python? Tried some examples found online but I can't get the same object back. –  Mar 13 '15 at 08:18
  • @newbiez: You don't necessarily *have* to. You can encode everything back to UTF-8 bytes, but that's not really a good idea. – Martijn Pieters Mar 13 '15 at 08:20
  • I think the last part is wrong. There's no need to encode/decode anything in JSON. Just dump/load unicode strings normally, json handles everything for you. – georg Mar 13 '15 at 08:23
  • @georg: we *don't know what the OP has in their dictionary*. I am trying to cover all bases here. – Martijn Pieters Mar 13 '15 at 08:24
  • @georg: but if the OP started with UTF-8 bytestrings, then `json.dump()` would have *decoded* those to Unicode objects before producing the JSON output. They'll get `\uhhhh` sequences in the data. That can be confusing. – Martijn Pieters Mar 13 '15 at 08:25
  • Right, we don't know what the OP has there. Still, I object to `{u'å'.encode('utf8'): 'bar'}` - this is a confusing advice! – georg Mar 13 '15 at 08:29
  • @georg: that's not part of the advice. That's me showing the key is UTF-8 encoded at the start, resulting in a Unicode when passed through the `json` module. – Martijn Pieters Mar 13 '15 at 08:30