0

What is the best way to store a large dictionary of strings in a file and load it partially in Python? Here, a dictionary of strings means that each key is a string and each value is a list of strings.

The dictionary would be stored by appending to the file: before writing, check whether the key is already present; if it is, do not update it, otherwise add it. The keys are then used for post-processing.

4 Answers

1

A dictionary is usually stored as JSON.

Here is a link:

Convert Python dictionary to JSON array

0

You could simply write the dictionary to a text file, and then create a new dictionary that only pulls certain keys and values from that text file.

But you're probably best off exploring the json module.

Here's a straightforward way to write a dict called "sample" to a file with the json module:

import json
with open('result.json', 'w') as fp:
    json.dump(sample, fp)

On the loading side, we'd need to know more about how you want to choose which keys to load from the JSON file.
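As one possible way to load only some keys, here is a minimal sketch that assumes the data is stored as JSON Lines (one record per line) instead of a single JSON object. The function names `append_if_new` and `load_keys` and the `"key"`/`"values"` field names are made up for this illustration. The file is read one line at a time, so the whole dictionary never has to sit in memory, although each lookup still scans the file.

```python
import json


def append_if_new(path, key, values):
    """Append key -> values as one JSON line, unless the key is already stored."""
    try:
        with open(path) as fp:
            for line in fp:
                if json.loads(line)["key"] == key:
                    return False  # key already present: leave it untouched
    except FileNotFoundError:
        pass  # first write: the file does not exist yet
    with open(path, "a") as fp:
        fp.write(json.dumps({"key": key, "values": values}) + "\n")
    return True


def load_keys(path, wanted):
    """Return a dict holding only the requested keys, reading line by line."""
    wanted = set(wanted)
    result = {}
    with open(path) as fp:
        for line in fp:
            record = json.loads(line)
            if record["key"] in wanted:
                result[record["key"]] = record["values"]
    return result
```

For example, `append_if_new("result.jsonl", "fruit", ["apple", "pear"])` adds a record the first time and does nothing on later calls with the same key, and `load_keys("result.jsonl", ["fruit"])` returns just that entry.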

  • The problem with that is that it loads everything into memory, which appears to be a problem due to the size of the dictionary. – Florian Weimer Oct 04 '18 at 12:16
0

The answers above are great, but I dislike using JSON, and I have had issues with pickle corrupting my data before, so I use NumPy's save and load instead.

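To save: np.save(filename, dict)

To load: dict = np.load(filename, allow_pickle=True).item()

(allow_pickle=True is required on NumPy 1.16.3 and later, because the dictionary is stored as a pickled object array.)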

It is really simple and works well. As for loading partially, you could split the dictionary into multiple smaller dictionaries and save them as individual files. It may not be a very concrete solution, but it could work.

To split the dictionary, you could do something like this:

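import numpy as np

# dict is the dictionary being saved, as in the snippets above (the name shadows the built-in)
chunk_size = 1000
temp_dict = {}
for i, k in enumerate(dict.keys(), start=1):
    temp_dict[k] = dict[k]
    # write out a chunk after every chunk_size keys, and once more for the leftovers
    if i % chunk_size == 0 or i == len(dict):
        np.save("records-" + str(i - len(temp_dict)) + "-" + str(i) + ".npy", temp_dict)
        temp_dict = {}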

Then, for loading, do something like this:

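import glob
import numpy as np

my_dict = {}
all_files = glob.glob("records-*.npy")
for f in all_files:
    # allow_pickle=True is needed because each file holds a pickled dictionary
    chunk = np.load(f, allow_pickle=True).item()
    my_dict.update(chunk)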
Imtinan Azhar
0

If this is for some sort of database-type use, save yourself the headache and use TinyDB. It saves to disk in JSON format and will give you the "partial" loading that you're looking for.

I only recommend TinyDB because it seems closest to what you're trying to achieve. If it isn't to your liking, try searching for other databases; there are tons of them out there!
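As an illustration of one of those other databases (not TinyDB itself), here is a minimal sketch using the sqlite3 module from Python's standard library. The table name `entries`, the column names, and the helper functions are all hypothetical. It assumes each list of strings can be stored as a JSON text column, and it relies on INSERT OR IGNORE to leave existing keys untouched.

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(keyword TEXT PRIMARY KEY, strings TEXT NOT NULL)"
    )
    return conn


def add_if_missing(conn, keyword, strings):
    """Insert the entry only if the keyword is not stored yet."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO entries (keyword, strings) VALUES (?, ?)",
        (keyword, json.dumps(strings)),
    )
    conn.commit()
    return cur.rowcount == 1  # 1 if a row was inserted, 0 if it was ignored


def get_strings(conn, keyword):
    """Fetch the list for a single keyword without loading anything else."""
    row = conn.execute(
        "SELECT strings FROM entries WHERE keyword = ?", (keyword,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

With this layout, only the requested row is read from disk, and the primary key on `keyword` keeps the existence check cheap even when the file is large.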

Jab