
I have the following lists of keys and values, which I get by parsing a text file:

keys = ["scientific name", "common names", "colors"]
values = ["somename1", ["name11", "name12"], ["color11", "color12"]]

keys = ["scientific name", "common names", "colors"]
values = ["somename2", ["name21", "name22"], ["color21", "color22"]]

and so on. In a for loop, I build a dictionary from each set of keys and values and dump it to a JSON file, one dictionary per iteration:

for loop starts
    d = dict(zip(keys, values))
    with open("file.json", 'a') as j:
        json.dump(d, j)

If I open the saved json file I see the contents as

{"scientific name": "somename1", "common names": ["name11", "name12"], "colors": ["color11", "color12"]}{"scientific name": "somename2", "common names": ["name21", "name22"], "colors": ["color21", "color22"]}

Is this the right way to do it?

The purpose is to query the common name or colors for a given scientific name. So then I do

with open("file.json", "r") as j:
    data = json.load(j)

I get the error json.decoder.JSONDecodeError: Extra data. I think this is because I am not dumping the dictionaries to JSON correctly in the for loop. I probably have to insert some square brackets programmatically; just calling json.dump(d, j) won't suffice.

ontherocks
  • Don't `a`ppend to the file. Make it one JSON document with a root array, then you can read in the current list, add to it and write the whole thing back out. – jonrsharpe Jun 10 '19 at 14:10
  • @jonrsharpe Are you saying that instead of appending, I create some list out of the dictionaries in the for loop and then write that list at the end to a json file...bit of a json novice here? – ontherocks Jun 10 '19 at 14:18
  • @ontherocks That's what I'd do at least in this situation. – Stam Kaly Jun 10 '19 at 14:20
  • Related: https://stackoverflow.com/questions/11639886/how-to-read-a-json-file-containing-multiple-root-elements – Finomnis Jun 10 '19 at 14:23
  • @StamKaly Should the list be like `dictList = [dict1, dict2, dict3....]` and then `json.dump(dictList, j)`? – ontherocks Jun 10 '19 at 14:24
  • Yeah, JSON can handle that just fine. – Stam Kaly Jun 10 '19 at 14:28

1 Answer

A JSON document may have only one root element. The root can be an array ([]), an object ({}), or any other JSON value such as a string, a number, true, false or null.

In your file, however, you get multiple root elements:

{...}{...}

This isn't valid JSON. The error Extra data refers to the second {...}, which starts at the point where a valid JSON document would have to end.

You can write multiple dicts to a JSON string, but you need to wrap them in an array:

[{...},{...}]
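
To make the difference concrete, here is a minimal sketch that parses both shapes from strings with json.loads. The shortened records and the variable names (concatenated, wrapped, parsed_ok, error_message) are made up for illustration and are not part of the original code.

```python
import json

# Two objects back to back, which is what repeated json.dump calls
# on a file opened in append mode produce.
concatenated = '{"scientific name": "somename1"}{"scientific name": "somename2"}'

try:
    json.loads(concatenated)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    error_message = err.msg  # the short message, without line/column details

# The same two objects inside one root array form a single valid document.
wrapped = '[{"scientific name": "somename1"}, {"scientific name": "somename2"}]'
records = json.loads(wrapped)

print(parsed_ok, len(records))
```

The first call fails because the parser finds more text after the first complete value; the second succeeds because the array is the only root element.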


Now for how I would fix your code. First, I rewrote what you posted into something runnable, because your code was closer to pseudo-code and didn't run directly.

import json

inputs = [(["scientific name", "common names", "colors"],
           ["somename1", ["name11", "name12"], ["color11", "color12"]]),
          (["scientific name", "common names", "colors"],
           ["somename2", ["name21", "name22"], ["color21", "color22"]])]

for keys, values in inputs:
    d = dict(zip(keys, values))
    with open("file.json", 'a') as j:
        json.dump(d, j)

with open("file.json", 'r') as j:
    print(json.load(j))

As you correctly realized, this code fails with

json.decoder.JSONDecodeError: Extra data: line 1 column 105 (char 104)

The way I would write it is:

import json

inputs = [(["scientific name", "common names", "colors"],
           ["somename1", ["name11", "name12"], ["color11", "color12"]]),
          (["scientific name", "common names", "colors"],
           ["somename2", ["name21", "name22"], ["color21", "color22"]])]

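# Collect every dict in one list so the file gets a single root array.
jsonData = []
for keys, values in inputs:
    d = dict(zip(keys, values))
    jsonData.append(d)

# Write the whole list in one go; 'w' replaces any previous contents.
with open("file.json", 'w') as j:
    json.dump(jsonData, j)

with open("file.json", 'r') as j:
    print(json.load(j))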

Also, with Python's json library it is important to write the entire JSON file in one go, meaning you open it with 'w' instead of 'a'.
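
If records have to be added to an existing file later, one option is the read, append, rewrite cycle suggested in the comments on the question. The sketch below assumes the file holds a root array of dicts as written above; add_record and find_by_scientific_name are hypothetical helper names, and the demo writes to a temporary directory rather than to your real file.json.

```python
import json
import os
import tempfile


def add_record(path, record):
    """Load the root array if the file exists, append one record, rewrite the file."""
    if os.path.exists(path):
        with open(path, "r") as f:
            records = json.load(f)
    else:
        records = []
    records.append(record)
    with open(path, "w") as f:  # 'w' replaces the old contents with the full list
        json.dump(records, f)
    return records


def find_by_scientific_name(path, name):
    """Return the first record whose "scientific name" matches, or None."""
    with open(path, "r") as f:
        records = json.load(f)
    for record in records:
        if record["scientific name"] == name:
            return record
    return None


# Demo in a temporary directory so no existing file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "file.json")
add_record(demo_path, {"scientific name": "somename1",
                       "common names": ["name11", "name12"],
                       "colors": ["color11", "color12"]})
add_record(demo_path, {"scientific name": "somename2",
                       "common names": ["name21", "name22"],
                       "colors": ["color21", "color22"]})

match = find_by_scientific_name(demo_path, "somename2")
print(match["colors"])
```

Rewriting the whole file on every addition is simple but gets slow for large files, so collecting everything in one list and dumping once, as above, is preferable when all the data is available up front.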

Finomnis
  • I populate the common names and colors list on the fly in the loop, and then after I create the dict object I clear the list so that those lists are re-populated with new values in the next for loop iteration. But then this also clears the value in the dict object (it behaves like pointers). I would have to create a shallow or deep copy instead of `jsonData.append(d)`, but I am not getting the syntax right. Would you have an idea? – ontherocks Jun 10 '19 at 14:59
  • I think this is a new and unrelated question, and should usually not be discussed in the comment section of a previous problem. Nonetheless, I think this should answer it: https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list – Finomnis Jun 10 '19 at 15:05
  • Also, I don't think clearing the list is the right thing to do, just create a new list, then the old one lives on – Finomnis Jun 10 '19 at 15:07
  • And I'd appreciate it if you accepted the answer, if you found it helpful :) – Finomnis Jun 10 '19 at 15:08
  • If I leave the old lists, the number of those lists would then be in the hundreds of thousands. I have such big data. I may run out of memory. – ontherocks Jun 10 '19 at 15:17
  • It really doesn't make any difference; the JSON array still holds all of them anyway, whether you copy them or not. So I think your point is invalid. Also, hundreds of thousands is not a lot. If it can't fit in memory, you can't fit it in a list, anyway. – Finomnis Jun 10 '19 at 15:18
  • If you are at the point where it doesn't fit in memory, I think JSON is the wrong file type. Consider an SQL database. – Finomnis Jun 10 '19 at 15:20
  • Or CSV, those are iterable and can be updated line by line, without having the entire file in memory. JSON can't. – Finomnis Jun 10 '19 at 15:24
  • Another solution I just found is the [.jsonl](http://jsonlines.org/) file format. There is even a [python library](https://jsonlines.readthedocs.io/en/latest/) for it. This might be exactly what you were looking for. – Finomnis Jun 12 '19 at 13:25
  • I will take a look at this. Thanks – ontherocks Jun 12 '19 at 14:02
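
As a rough illustration of the JSON Lines idea from the last comments, here is a sketch that uses only the standard json module rather than the linked library. It assumes one JSON object per line; the sample data, the reused names and colors lists, and the find_colors helper are made up for this example. Because each record is serialized to text before the lists are cleared, reusing and clearing the lists should no longer affect what has already been written.

```python
import json
import os
import tempfile

# Temporary location for the demo; a real script would use its own path.
demo_path = os.path.join(tempfile.mkdtemp(), "file.jsonl")

parsed = [("somename1", ["name11", "name12"], ["color11", "color12"]),
          ("somename2", ["name21", "name22"], ["color21", "color22"])]

names, colors = [], []  # reused and cleared on every iteration
with open(demo_path, "a") as f:  # appending is fine here: one document per line
    for scientific, common, cols in parsed:
        names.extend(common)
        colors.extend(cols)
        record = {"scientific name": scientific,
                  "common names": names,
                  "colors": colors}
        f.write(json.dumps(record) + "\n")  # text is produced before the clear below
        names.clear()
        colors.clear()


def find_colors(path, scientific_name):
    """Scan the file line by line, so only one record is in memory at a time."""
    with open(path, "r") as f:
        for line in f:
            record = json.loads(line)
            if record["scientific name"] == scientific_name:
                return record["colors"]
    return None


found = find_colors(demo_path, "somename2")
print(found)
```

Note that such a file is not a single valid JSON document, so json.load on the whole file would still fail; it has to be read one line at a time as in find_colors.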