How to convert a nested JSON to a flat DataFrame

Question

I'm trying to convert this JSON file into a flat DataFrame that includes all the columns.

However, I keep receiving this error:

Traceback (most recent call last):
  File "./readjsonfile.py", line 19, in <module>
    print(data[fields])
  File "../anaconda/envs/myenv/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "../anaconda/envs/myenv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "../anaconda/envs/myenv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1316, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['utterance', 'turns.frames.actions.act'] not in index"

I followed this link to learn how to do that. Here is my code:

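import json

import pandas as pd

# Load the JSON data from the file
with open('./dialogues_001.json') as f:
    jsondata = json.load(f)

fields = ['dialogue_id', 'services', 'turns.frames.actions.act', 'turns.utterance']
data = pd.json_normalize(jsondata)

print(data[fields])  # raises the KeyError shown above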
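A minimal sketch of why the KeyError happens, using a small hypothetical record shaped after the field names in the question (the real dialogues_001.json presumably has more keys). Without `record_path`, `pd.json_normalize` flattens nested dicts into dotted column names but leaves lists untouched, so a `turns` list of dicts stays a single column. Checking the columns before indexing shows which requested fields are missing.

```python
import pandas as pd

# Hypothetical sample data shaped after the field names in the question;
# the real dialogues_001.json may contain more keys.
jsondata = [
    {
        "dialogue_id": "1_00000",
        "services": ["Restaurants_1"],
        "turns": [
            {
                "speaker": "USER",
                "utterance": "I want to book a table.",
                "frames": [{"service": "Restaurants_1",
                            "actions": [{"act": "INFORM_INTENT", "slot": "intent",
                                         "values": ["ReserveRestaurant"]}]}],
            }
        ],
    }
]

# Without record_path, json_normalize flattens nested dicts only; "turns" is a
# list of dicts, so it stays one object column and no "turns.*" columns appear.
data = pd.json_normalize(jsondata)
print(data.columns.tolist())  # ['dialogue_id', 'services', 'turns']

fields = ["dialogue_id", "services", "turns.frames.actions.act", "turns.utterance"]

# List the requested columns that do not exist, instead of letting
# data[fields] raise a KeyError.
missing = [c for c in fields if c not in data.columns]
print(missing)  # ['turns.frames.actions.act', 'turns.utterance']
```

This matches what `print(data.columns)` reports in the comments below: only the three top-level keys become columns.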

@Barmar Could you please share your thoughts on what the reason could be? — sariii, Jun 02 '22 at 15:12
Never mind, a list does work. I think your list of column names is just wrong. Let me check the JSON. — Barmar, Jun 02 '22 at 15:19
Have you tried `print(data.columns)` to see what all the column names are? — Barmar, Jun 02 '22 at 15:20
`utterance` should be `turns.utterance`. I'm not sure what's wrong with `turns.frames.actions.act`. — Barmar, Jun 02 '22 at 15:21
Your original use of a list was correct. I deleted all the comments about changing to a tuple. — Barmar, Jun 02 '22 at 15:50
@Barmar when I run `print(data.columns)` I get `['dialogue_id', 'services', 'turns']` — sariii, Jun 02 '22 at 16:19
But isn't that expected? These are the top-level column names. — sariii, Jun 02 '22 at 16:21
I don't understand why you're getting only 3 columns when you use `pd.json_normalize()`. There are lots of other columns in the JSON. — Barmar, Jun 02 '22 at 16:21
Yes, in the examples I saw, all the columns were listed after using `json_normalize`. The JSON format in https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e is a little different from ours, though. I'm not sure whether that could be the reason. — sariii, Jun 02 '22 at 16:23
I think the format could be the reason, since even with the other approach (without `json_normalize`) I am still not able to flatten the columns. — sariii, Jun 02 '22 at 16:28
I think the problem is with nested arrays. See https://stackoverflow.com/questions/57438540/use-json-normalize-to-normalize-json-with-nested-arrays — Barmar, Jun 02 '22 at 16:42
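Following up on the nested-arrays pointer, here is a minimal sketch using the `record_path` and `meta` parameters of `pd.json_normalize`. It assumes the file is a list of dialogues nested as dialogue → turns → frames → actions, which is inferred from the field names in the question; the sample data is hypothetical. `record_path` produces one row per action, `meta` copies the outer fields onto each row, and `record_prefix` gives the action columns the same dotted names the question asks for.

```python
import pandas as pd

# Hypothetical sample data; the structure (turns -> frames -> actions) is an
# assumption based on the column names used in the question.
jsondata = [
    {
        "dialogue_id": "1_00000",
        "services": ["Restaurants_1"],
        "turns": [
            {
                "speaker": "USER",
                "utterance": "I want to book a table for 2.",
                "frames": [{"service": "Restaurants_1", "actions": [
                    {"act": "INFORM_INTENT", "slot": "intent", "values": ["ReserveRestaurant"]},
                    {"act": "INFORM", "slot": "party_size", "values": ["2"]},
                ]}],
            },
            {
                "speaker": "SYSTEM",
                "utterance": "In which city?",
                "frames": [{"service": "Restaurants_1", "actions": [
                    {"act": "REQUEST", "slot": "city", "values": []},
                ]}],
            },
        ],
    }
]

flat = pd.json_normalize(
    jsondata,
    record_path=["turns", "frames", "actions"],  # one row per action
    # Outer fields repeated on every action row; ["turns", "utterance"]
    # pulls the utterance of the turn each action belongs to.
    meta=["dialogue_id", "services", ["turns", "utterance"]],
    record_prefix="turns.frames.actions.",  # -> "turns.frames.actions.act", ...
)

fields = ["dialogue_id", "services", "turns.frames.actions.act", "turns.utterance"]
print(flat[fields])
```

With the real file, pass the list returned by `json.load(f)` in place of the sample. If one row per turn is enough, `record_path="turns"` with `meta=["dialogue_id", "services"]` gives columns such as `utterance`, with `frames` left as a list column.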

0 Answers