
I have a JSON file with some data, and I would like to occasionally update this file.

I read the file:

import json

with open('index.json', 'r') as f:
    idx = json.load(f)

Then I check for the presence of a key from the potentially new data, and if the key is not present I update the file:

with open('index.json', mode='a+') as f:
    json.dump(new_data, f, indent=4)

However, this procedure just creates a new JSON object (a Python dict) and appends it to the output file as a separate object, making the file invalid JSON.

Is there any simple way to append new data to a JSON file without overwriting the whole file, by updating the initial dict?

theta
  • Related: http://stackoverflow.com/questions/13949637/how-to-update-json-file-with-python – theta Mar 14 '13 at 17:06
  • Open the file in `'w'` mode, not append+write mode. – Martijn Pieters Mar 14 '13 at 17:09
  • Is there a practical reason not to just rewrite the whole file? This sounds like it could get ugly. Plus, the underlying file doesn't support "insert" operations, so if your update is near the beginning you will at least have to rewrite the rest of the file. – FatalError Mar 14 '13 at 17:09
  • @Martijn Pieters: that will overwrite the file (the initial data); I use append because I want to append data. – theta Mar 14 '13 at 17:11
  • @theta: That's not how it works; you updated the JSON structure by appending perhaps, but a file is not the same thing. You need to rewrite it. – Martijn Pieters Mar 14 '13 at 17:12
  • @FatalError: Well, the file is huge, and the potential update content is negligible in comparison – theta Mar 14 '13 at 17:12
  • That happens because you are opening the file for append. Open it once for read, close it, then open it for write with your new data. – hughdbrown Mar 14 '13 at 17:12
  • Also, if JSON isn't a strict requirement, consider `pickle` – Sudipta Chatterjee Mar 14 '13 at 17:12
  • OK, thanks all. I'll overwrite the data, just wasn't sure if there is some way of updating. – theta Mar 14 '13 at 17:13
  • @theta: Well, to be fair, you've already loaded the huge JSON object into memory, so you've already eaten that cost to some extent. – FatalError Mar 14 '13 at 17:14
  • Yeah, maybe I should consider XML later. – theta Mar 14 '13 at 17:15
  • @theta: If your JSON object is really huge, you could always consider splitting it up and using a document-oriented DB like CouchDB or MongoDB to manage them. – FatalError Mar 14 '13 at 17:24
  • Thanks @FatalError, but that seems too complicated. Currently the file is ~40MB; I visualize this data, and JSON offers compatibility. XML does too, but JSON is easier for me. – theta Mar 14 '13 at 17:28
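For reference, the read-modify-rewrite approach recommended in the comments above could be sketched like this. The `update_index` function name and the temp-file-plus-rename step are my own additions (not from the discussion); the rename makes the rewrite atomic on Python 3.3+, so a crash mid-write cannot leave a truncated `index.json` behind:

```python
import json
import os
import tempfile

def update_index(path, key, value):
    """Read the whole JSON file, update the dict, and rewrite it."""
    with open(path, "r") as f:
        idx = json.load(f)

    if key in idx:
        return False  # key already present, nothing to do

    idx[key] = value

    # Write to a temp file in the same directory, then atomically
    # replace the original so readers never see a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(idx, f, indent=4)
    os.replace(tmp_path, path)
    return True
```

This keeps the file valid, pretty-printed JSON, at the cost of rewriting all of it on every update.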

1 Answer


One way to do what you're after is to write one JSON object per line in the file. I'm using that approach and it works quite well.

A nice benefit is that you can read the file more efficiently (memory-wise), because you can process it one line at a time. If you need all the objects, there's no problem assembling them into a list in Python; if you don't, you operate much faster, and you can also append.

So to initially write all your objects, you'd do something like this:

with open(json_file_path, "w") as json_file:
    for data in data_iterable:
        json_file.write("{}\n".format(json.dumps(data)))

Then, to read efficiently (this consumes little memory, no matter the file size):

with open(json_file_path, "r") as json_file:
    for line in json_file:
        data = json.loads(line)
        process_data(data)  # placeholder for your own handling
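If you do need all the objects at once, assembling the list the answer mentions is a one-liner; `load_all` is a hypothetical helper name, not part of the answer:

```python
import json

def load_all(json_file_path):
    # Read a line-delimited JSON file back into a list of objects.
    with open(json_file_path, "r") as json_file:
        return [json.loads(line) for line in json_file]
```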

To update/append:

with open(json_file_path, "a") as json_file:
    json_file.write("{}\n".format(json.dumps(new_data)))

Hope this helps :)

kgr
  • Thanks @kgr, it seems like a nice trick. I assume indentation would be impossible this way. – theta Mar 14 '13 at 17:52
  • @theta - you're welcome. Yes, if you store one object per line it's not possible to have the JSON in the file pretty-printed (indented). You could put some marker after every JSON object, though, and use it to distinguish where one object ends and another starts (here `\n` is such a marker, which also plays a special role). It would be trickier to get right, but certainly possible. That way you could have indents in the file, because newlines within an object would be ignored. The whole idea is not to have one JSON object in the file, but multiple, so that you can add more whenever you like in an efficient way... – kgr Mar 14 '13 at 17:57
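The marker idea from the last comment could be sketched like this; `---` is a hypothetical delimiter chosen because `json.dumps` escapes newlines inside strings, so a line consisting solely of `---` cannot occur inside a serialized object:

```python
import json

MARKER = "\n---\n"  # assumed delimiter between pretty-printed records

def append_record(path, obj):
    # Each record is indented JSON, terminated by the marker.
    with open(path, "a") as f:
        f.write(json.dumps(obj, indent=4))
        f.write(MARKER)

def read_records(path):
    # Split on the marker and decode each chunk separately.
    with open(path, "r") as f:
        text = f.read()
    return [json.loads(chunk) for chunk in text.split(MARKER) if chunk.strip()]
```

Note that `read_records` as written reads the whole file, so this sketch trades away the line-at-a-time memory benefit in exchange for indented records.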