I am working with a JSON file which needs to be read, edited and saved. Its content is the following:

{
    "images_src": [
        "img-1.jpg",
        "img-2.jpg",
        "img-3.jpg"
    ],
    "scheme": "https",
    "host": "www.list-em.com"
}

I'm using (more or less) the following code to do so:

import json

file = open('file.json', "r+")
data = json.load(file)
data['images_list'] = []
file.truncate(0)
json.dump(data, file)
file.close()

This results in the file being saved like this (I'm only showing part of it for illustrative purposes, since the full output is very long, 189 lines):

0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 7b0d 0a20 2020 2022
7061 796c 6f61 6422 3a20 7b0d 0a20 2020
2020 2020 2022 6163 6365 7373 5f74 6f6b

After some research I found this solution for editing JSON files, and it works as expected. So my question is really about understanding the reason behind the behavior I was getting with my own method.

I thought that by calling file.truncate(0) I was truncating the file's content from the initial position, leaving an empty file ready for the new JSON data to be dumped into. In other words, I thought that using truncate(0) meant I wouldn't need seek(0), but apparently that's not how it works.
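For illustration, here is a minimal sketch of the two variants side by side, assuming a small sample file created by a hypothetical helper `make_sample` (not part of the original code). The first function repeats the pattern from the question; the second moves the position back with seek(0) before writing. The zero padding in the first case is what most platforms produce, so treat it as typical rather than guaranteed.

```python
import json
import os
import tempfile


def rewrite_without_seek(path):
    """Same pattern as in the question: truncate(0) without seek(0)."""
    with open(path, "r+") as f:
        data = json.load(f)      # reading leaves the position at the end
        data["images_list"] = []
        f.truncate(0)            # file is emptied, position is NOT moved
        json.dump(data, f)       # written at the old offset, leaving a gap


def rewrite_with_seek(path):
    """Move back to the start, overwrite, then cut off any leftovers."""
    with open(path, "r+") as f:
        data = json.load(f)
        data["images_list"] = []
        f.seek(0)                # go back to the beginning first
        json.dump(data, f)
        f.truncate()             # truncate at the current position


def make_sample():
    """Hypothetical helper: write a small JSON file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump({"scheme": "https", "host": "www.list-em.com"}, f)
    return path
```

Reading the first file back in binary mode should show a run of NUL bytes as long as the original content, followed by the new JSON, which is consistent with the leading zeros in the dump above.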

revliscano
  • The hexadecimal is an artifact of how you're viewing the file. (The file *is* broken, but not in a way that's writing hexadecimal to the file.) – user2357112 May 28 '20 at 21:19
  • `truncate` isn't like hitting `Ctrl-A Backspace` in a text editor - file position doesn't automatically move to stay in the bounds of the existing contents. – user2357112 May 28 '20 at 21:21
  • Huh, I see... I thought that the zero that's being passed as an argument did exactly that. – revliscano May 28 '20 at 21:23
  • Also, if something goes wrong with the call to `json.dump`, your original file is gone. Read (and close) the original file, update the data, write it to a *new* file, and replace the original with the new once the write has succeeded. – chepner May 28 '20 at 21:24
  • @chepner Oh, thanks for that tip! For some reason I thought that doing fewer I/O operations on the file was better. Thanks again. – revliscano May 28 '20 at 21:27
  • @chepner does Python include anything to do that automatically? It's such a common pattern I'd be surprised if it didn't. – Mark Ransom May 28 '20 at 21:28
  • I don't think so. I know at work we've written (probably more than once, sadly) a context manager to handle the temporary file. – chepner May 28 '20 at 23:59
  • @revliscano The only thing you are saving is a single close and open, both of which are relatively cheap. The cost of writing the entire data structure back to the prematurely truncated file is the same. – chepner May 29 '20 at 00:01
  • `'r+'` is good for *binary* files, where you may only need to write a single fixed-length record to a particular location. With text files, though, you rarely make such writes: you are usually replacing an entire line at a time, and unless the new line is *exactly* the same length as the old one, you need to rewrite everything after it, either to close the gap left by the old line or to make room for the new one. – chepner May 29 '20 at 00:02
  • Thanks for the useful insights, @chepner – revliscano May 29 '20 at 00:21
  • So in conclusion, what's best suited for text files? Opening the file in read mode (`'r'`), closing it, and then opening it again in write mode (`'w'`), right? @chepner – revliscano May 29 '20 at 00:31
  • Open a *new* file in write mode (the `tempfile` module is helpful for this). Assuming you are able to completely write the new data to the tempfile without errors, you can then rename that file over the original. This completes *atomically*: either it works and you have the new data under the old name, or it fails, and you still have the old data under the old name. There's no chance of losing the old data before the new data is in place. – chepner May 29 '20 at 11:17
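
The write-to-a-new-file-then-rename approach described in the comments could be sketched roughly as follows. This is only one possible shape for it, not a standard library facility: the function name `update_json_atomically` and its `update` callback are hypothetical, and the sketch assumes the temporary file can be created in the same directory as the original so that `os.replace` stays on one filesystem.

```python
import json
import os
import tempfile


def update_json_atomically(path, update):
    """Load JSON from `path`, apply `update(data)` in place, then swap the file.

    `update` is a hypothetical callback supplied by the caller.
    """
    # Read and close the original before touching anything on disk.
    with open(path, "r") as f:
        data = json.load(f)
    update(data)

    # Create the temp file next to the original (same filesystem).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp, indent=4)
        # Only replace the original once the new data is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure the original is untouched; clean up the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

It might be called as `update_json_atomically('file.json', lambda d: d.update(images_list=[]))`. If `json.dump` raises partway through, the original file is still intact under its old name.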
