I am having a hard time reading JSON files whose structure differs from what I am used to. The content of each file is wrapped in brackets: [{content}].

This is what I normally do:

import os
import pandas as pd

data_dir = 'data/filesDump'
filenames = os.listdir(data_dir)
filenames = [os.path.join(data_dir, f) for f in filenames if f.endswith('.json')]

train_df = pd.concat([pd.read_json(file, encoding='UTF-8') for file in filenames],
                     ignore_index=True)

I get this error:

ValueError: Expected object or value

The only thing different about the thousands of JSON files I got is that the content is inside brackets []. So I suspect this is giving pd.read_json a problem? Does anyone know how to load this format?

Sample (I may have made a mistake in brackets but that's just to give an idea):

[{"id":"value","title":"value","body":"text","categories":[{"id":value,"name":"name","keys":[{"id":value,"hits":["word1","word2"]},{"id":value,"hits":["word1","word2"]}],"date":value}]

Monduiz
  • Normally [] denotes a list in JSON. Can you share a sample JSON file? – crowgers Jan 08 '19 at 16:03
  • I suspect that the issue is coming from how you are concatenating the JSON files to be parsed by pandas. This may be of interest: https://stackoverflow.com/questions/27046593/parsing-comma-separated-json-from-a-file – crowgers Jan 08 '19 at 16:09

2 Answers

Not all JSON files can be converted to a DataFrame; a specific format is required.

You should first convert your JSON files to Python structures with the standard json module, then you can modify the structure to fit the DataFrame constructor requirements.

For example, if your JSON has an extra pair of brackets around the dictionary that the DataFrame constructor expects, meaning the data is wrapped in a list as suggested by @Atreus, you can remove it by taking only the first element of the list:

import json
import pandas as pd

# The outer list holds a single column-oriented dictionary; take element 0.
struct = json.loads('[{"A":{"0":1,"1":2,"2":3},"B":{"0":4,"1":5,"2":6}}]')
print(pd.DataFrame(struct[0]))

outputs :

   A  B
0  1  4
1  2  5
2  3  6
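
If the list instead holds one object per record, as the sample in the question seems to, taking only the first element would drop the rest. A minimal sketch of that case, assuming a made-up two-record sample (the field values are hypothetical, not from the question's files): the whole list can be passed to the DataFrame constructor, which produces one row per object and leaves nested values such as "categories" as plain Python lists.

```python
import json
import pandas as pd

# Hypothetical sample shaped like the question's data: a list of records.
text = (
    '[{"id": 1, "title": "a", "categories": [{"id": 10, "name": "x"}]},'
    ' {"id": 2, "title": "b", "categories": []}]'
)
records = json.loads(text)

# A list of dicts maps to one row per dict; nested values are kept as objects.
df = pd.DataFrame(records)
print(df)
```

Whether the nested "categories" column then needs further flattening depends on what the analysis requires.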
manu190466

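So it turns out that I do need the json module, as manu suggests, but with two changes: json.load on an open file rather than json.loads on a string, and the 'utf-8-sig' encoding, which skips a byte order mark at the start of the file: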

with open(file, encoding='utf-8-sig') as f:
    data = json.load(f)
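
Applied to the whole directory, the same call might look like the sketch below. The helper name load_records is made up for illustration, and the code assumes every file holds either a list of objects or a single object.

```python
import json
import os

def load_records(data_dir):
    """Collect the records from every .json file in data_dir into one list."""
    records = []
    for name in sorted(os.listdir(data_dir)):
        if not name.endswith('.json'):
            continue
        path = os.path.join(data_dir, name)
        # 'utf-8-sig' drops a leading byte order mark if one is present
        # and otherwise decodes like plain UTF-8.
        with open(path, encoding='utf-8-sig') as f:
            content = json.load(f)
        # Assumption: a file holds a list of objects, [{...}, {...}].
        # A bare object is wrapped so that both shapes are handled.
        records.extend(content if isinstance(content, list) else [content])
    return records
```

The combined list can then be handed to pandas in one step, for example pd.DataFrame(load_records('data/filesDump')), instead of concatenating one DataFrame per file.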
Monduiz