15

Suppose I need to have a database file consisting of a list of dictionaries:

file:

[
  {"name":"Joe","data":[1,2,3,4,5]},
  {   ...                         },
           ...
]

I need to have a function that receives a list of dictionaries as shown above and appends it to the file. Is there any way to achieve that, say using json (or any other method), without loading the file?

EDIT1: Note: what I need is to append new dictionaries to an already existing file on disk.

jazzblue
  • What do you mean by "without loading it"? – user2357112 Aug 06 '13 at 18:15
  • 1
    Well, one way is to load the file into memory, append the new list to it, and dump the result back to disk. Is it possible to just write the new list to disk, appending it to the end of the file, without loading the file into memory? – jazzblue Aug 06 '13 at 18:43
  • This could be of use: http://stackoverflow.com/questions/12460943/merging-pre-sorted-files-without-reading-everything-into-memory Load the new dict to a new file, and then merge the two files perhaps? – Jeremy Kalas Aug 06 '13 at 18:52

4 Answers

31

You can use json to dump the dicts, one per line. Each line is then a single JSON dict. You lose the outer list, but you can add records with a simple append to the existing file.

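import json

def append_record(record):
    # 'a' mode creates the file if needed and always writes at the end
    with open('my_file', 'a') as f:
        json.dump(record, f)
        # text mode translates '\n' to the platform's line ending
        f.write('\n')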

# demonstrate a program writing multiple records
for i in range(10):
    my_dict = {'number':i}
    append_record(my_dict)

The list can be assembled later

with open('my_file') as f:
    my_list = [json.loads(line) for line in f]

The file looks like

{"number": 0}
{"number": 1}
{"number": 2}
{"number": 3}
{"number": 4}
{"number": 5}
{"number": 6}
{"number": 7}
{"number": 8}
{"number": 9}
tdelaney
  • Here it looks like you are not actually appending dictionaries to the existing file on the disc, but rather creating all the dictionaries in the code and writing them into a file. What I need is to append them to an existing file. I should probably note that in my original question. – jazzblue Aug 06 '13 at 19:17
  • No, it's appending to the file as you want. The for loop is just a demo of a program that appends records to the file several times. Run the demo twice and you get more records on the end. I'll edit for clarity. – tdelaney Aug 06 '13 at 19:24
  • 1
    Good solution if you don't want to use pretty json (which makes assembling part harder if you want to) – saeedgnu Aug 06 '13 at 19:28
  • @ilius - yeah, record files aren't supposed to be pretty! It's really a question of what you want to use as a record separator. If you don't pretty print, then json won't add any newlines, so a newline is a good separator (that's what I did here). If you want pretty printing, you could pick something like '\n---\n', but you'd have to scan for it and do the record blocking yourself. – tdelaney Aug 06 '13 at 19:35
9

If the file must remain valid JSON, it can be done as follows:

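import json

# `filepath` is the existing JSON file; `dictionary` is the dict to append.
with open(filepath, mode="r+") as file:
    file.seek(0, 2)               # jump to the end of the file
    position = file.tell() - 1    # offset of the final character, expected to be ']'
    file.seek(position)
    file.write(",{}]".format(json.dumps(dictionary)))  # overwrite ']' with ',<dict>]'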

This opens the file for both reading and writing. It seeks to the end of the file (zero bytes from the end) to get the end position relative to the beginning of the file, then moves back one byte, which in a JSON file is expected to hold the character ]. Finally, it writes the new dictionary over that last character, so the file remains valid JSON. The file is never read into memory. Note that this assumes the file ends exactly with ] (no trailing newline) and that the list is not empty; otherwise the result is not valid JSON. Tested with both ANSI and UTF-8 encoded files in Python 3.4.3, with small and huge (5 GB) dummy files.

A variation that uses the os module:

import os, json

with open (filepath, mode="r+") as file:
    file.seek(os.stat(filepath).st_size -1)
    file.write( ",{}]".format(json.dumps(dictionary)) )

It uses the file's size in bytes to seek directly to the position one byte before the end (as in the previous example).
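
For a list of dictionaries, the seek-and-overwrite idea above could be sketched as follows. This is an illustration under stated assumptions, not a tested production routine: the file is assumed to already contain a JSON array in an ASCII-compatible encoding such as UTF-8, and the names append_to_json_array and _last_non_space are hypothetical. The file is opened in binary mode so that byte offsets are exact, trailing whitespace after the closing ] is skipped, and an empty array is handled without emitting a stray comma.

```python
import json
import os

def _last_non_space(f, end):
    """Return (position, byte) of the last non-whitespace byte before `end`."""
    pos = end
    while pos > 0:
        pos -= 1
        f.seek(pos)
        byte = f.read(1)
        if not byte.isspace():
            return pos, byte
    return -1, b""

def append_to_json_array(path, new_items):
    """Append the dicts in `new_items` to the JSON array stored in `path`."""
    if not new_items:
        return
    # Serialize the new items without the surrounding brackets.
    payload = ",".join(json.dumps(item) for item in new_items).encode("utf-8")
    with open(path, "rb+") as f:
        size = f.seek(0, os.SEEK_END)
        close_pos, last = _last_non_space(f, size)
        if last != b"]":
            raise ValueError("file does not end with a JSON array")
        # If the byte before ']' is '[', the array is empty: no comma needed.
        _, prev = _last_non_space(f, close_pos)
        separator = b"" if prev == b"[" else b","
        f.seek(close_pos)
        f.write(separator + payload + b"]")
        f.truncate()  # drop any leftover trailing bytes
```

Only the last few bytes of the file are read, so the cost does not depend on how large the existing array is.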

2

If you are looking to not actually load the file, going about this with json is not really the right approach. You could use a memory mapped file… and never actually load the file to memory -- a memmap array can open the file and build an array "on-disk" without loading anything into memory.

Create a memory-mapped array of dicts:

>>> import numpy as np
>>> a = np.memmap('mydict.dat', dtype=object, mode='w+', shape=(4,))
>>> a[0] = {'name':"Joe", 'data':[1,2,3,4]}
>>> a[1] = {'name':"Guido", 'data':[1,3,3,5]}
>>> a[2] = {'name':"Fernando", 'data':[4,2,6,9]}
>>> a[3] = {'name':"Jill", 'data':[9,1,9,0]}
>>> a.flush()
>>> del a

Now read the array, without loading the file:

>>> a = np.memmap('mydict.dat', dtype=object, mode='r')

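The contents of the file are loaded into memory when the list is created, but that's not required -- you can work with the array on-disk without loading it. One caveat: with dtype=object the array holds references to Python objects rather than the objects' contents, so this should not be relied on to recover the dicts in a separate interpreter session.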

>>> a.tolist()
[{'data': [1, 2, 3, 4], 'name': 'Joe'}, {'data': [1, 3, 3, 5], 'name': 'Guido'}, {'data': [4, 2, 6, 9], 'name': 'Fernando'}, {'data': [9, 1, 9, 0], 'name': 'Jill'}]

Creating a memory-mapped array takes a negligible amount of time regardless of the size of the file (e.g. 100 GB), because the file's contents are not read when the map is created.

Mike McKerns
0

Using the same approach as user3500511...

Suppose we have two lists of dictionaries, dicts and dicts2. dicts is converted to a JSON-formatted string and saved to a new file, test.json. The file is then reopened, and dicts2 is serialized with its leading [ replaced by a comma and written over the closing ] of the existing array. After dicts2 is appended, the file is still a well-formed JSON array.

import json

dicts = [{ "name": "Stephen", "Number": 1 }
         ,{ "name": "Glinda", "Number": 2 }
         ,{ "name": "Elphaba", "Number": 3 }
         ,{ "name": "Nessa", "Number": 4 }]

dicts2= [{ "name": "Dorothy", "Number": 5 }
         ,{ "name": "Fiyero", "Number": 6 }]


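# Write the initial list as a JSON array.
with open("test.json", "w") as f:
    f.write(json.dumps(dicts))

# Reopen in binary mode: Python 3 only allows end-relative seeks on binary files.
with open("test.json", "rb+") as f2:
    f2.seek(-1, 2)  # position on the closing ']'
    # Swap the new list's leading '[' for ',' so it continues the existing array.
    f2.write(json.dumps(dicts2).replace('[', ',', 1).encode("utf-8"))

# Read the file back to confirm it is still valid JSON.
with open("test.json", "r") as f3:
    print(json.load(f3))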