I'm trying to convert this JSON file into a flat DataFrame that includes all the columns.

However, I keep receiving this error:

Traceback (most recent call last):
  File "./readjsonfile.py", line 19, in <module>
    print(data[fields])
  File "../anaconda/envs/myenv/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "../anaconda/envs/myenv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "../anaconda/envs/myenv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1316, in _validate_read_indexer
    raise KeyError(f"{not_found} not in index")
KeyError: "['utterance', 'turns.frames.actions.act'] not in index"

I followed this link to learn how to do it, and this is my code:

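import json
import pandas as pd

# Load the raw JSON; the context manager closes the file afterwards
with open('./dialogues_001.json') as f:
    jsondata = json.load(f)

fields = ['dialogue_id', 'services', 'turns.frames.actions.act', 'turns.utterance']
# Flatten the parsed JSON into a DataFrame
data = pd.json_normalize(jsondata)

# Raises the KeyError above when a name in fields is not a column of data
print(data[fields])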
sariii
  • Please post the full stack trace of the error. – Scott Hunter Jun 02 '22 at 14:34
  • @Barmar Could you please share your idea on what could be the reason? – sariii Jun 02 '22 at 15:12
  • Never mind, a list does work. I think your list of column names is just wrong. Let me check the JSON. – Barmar Jun 02 '22 at 15:19
  • Have you tried `print(data.columns)` to see what all the column names are? – Barmar Jun 02 '22 at 15:20
  • `utterance` should be `turns.utterance`. I'm not sure what's wrong with `turns.frames.actions.act`. – Barmar Jun 02 '22 at 15:21
  • Your original use of a list was correct. I deleted all the comments about changing to a tuple. – Barmar Jun 02 '22 at 15:50
  • @Barmar When I run `print(data.columns)` I get `['dialogue_id', 'services', 'turns']` – sariii Jun 02 '22 at 16:19
  • But that is expected, right? These are the top-level column names. – sariii Jun 02 '22 at 16:21
  • I don't understand why you're getting only 3 columns when you use `pd.json_normalize()`. There are lots of other columns in the JSON. – Barmar Jun 02 '22 at 16:21
  • Yes, in the examples I saw, all the columns were listed after using `json_normalize`. The format of the JSON there is a little different from what we have here, though: https://medium.com/swlh/converting-nested-json-structures-to-pandas-dataframes-e8106c59976e . I'm not sure whether that could be the reason. – sariii Jun 02 '22 at 16:23
  • I think the format could be the reason, since when I try the other way (without `json_normalize`) I still can't flatten the columns. – sariii Jun 02 '22 at 16:28
  • I think the problem is with nested arrays. See https://stackoverflow.com/questions/57438540/use-json-normalize-to-normalize-json-with-nested-arrays – Barmar Jun 02 '22 at 16:42
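
Following up on the nested-arrays pointer above, here is a minimal sketch of how the `record_path` and `meta` arguments of `pd.json_normalize` might be used. The sample data is hypothetical. It assumes the file is a list of dialogues, each with a `turns` list whose items hold an `utterance` and a `frames` list, and each frame holds an `actions` list of dicts with an `act` key. Adjust the paths if the real file is laid out differently.

```python
import pandas as pd

# Hypothetical sample that mimics the ASSUMED layout of dialogues_001.json
sample = [
    {
        "dialogue_id": "1_00000",
        "services": ["Restaurants_1"],
        "turns": [
            {
                "speaker": "USER",
                "utterance": "I want to eat out.",
                "frames": [
                    {"actions": [{"act": "INFORM_INTENT"}, {"act": "INFORM"}]}
                ],
            },
            {
                "speaker": "SYSTEM",
                "utterance": "Which city?",
                "frames": [{"actions": [{"act": "REQUEST"}]}],
            },
        ],
    }
]

# Without record_path, lists are not expanded: 'turns' stays a single column
top = pd.json_normalize(sample)
print(list(top.columns))

# Walk down turns -> frames -> actions, producing one row per action.
# meta carries values from the outer levels along with each row;
# record_prefix renames the innermost fields to their dotted path.
flat = pd.json_normalize(
    sample,
    record_path=["turns", "frames", "actions"],
    meta=["dialogue_id", ["turns", "utterance"]],
    record_prefix="turns.frames.actions.",
)

fields = ["dialogue_id", "turns.utterance", "turns.frames.actions.act"]
print(flat[fields])
```

Each row here is one action, so an utterance repeats once for every action in its turn. `services` is left out of `meta` in this sketch because it is itself a list; if it is needed, it could be joined back on `dialogue_id` afterwards.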
