Converted Nested Dict to Dataframe

Question

I got the data which has nested dict inside.

I am trying to use json_normalize to convert into dataframe, but I got an error.

d = ['{"appMetadata": {"index": "cfs_ccs_eddi_35725", "host": "iaasn00009634", "job": "splunk_scraper"}, "timestampEpochSecond": 1545172308711, "metricTags": {"source": "/local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log"}, "metricName": "splunk.logs.tstats.count.per.min", "metricValue": 5, "metricType": "count"}']

df = pd.DataFrame.from_dict({(i,j): d[i][j] 
                             for i in d.keys() 
                             for j in d[i].keys()},
                            orient='index')

The error is:

AttributeError: 'list' object has no attribute 'keys'

d is a list not a dict, the dict is a string inside your list — d_kennetz, Dec 18 '18 at 22:38
Have you tried this @david https://stackoverflow.com/questions/53752256/bytes-list-of-dictionary-convert-to-dataframe? ;) — ayorgo, Dec 18 '18 at 22:44
Hi Ayorgo, thank you so much for the help, I tried, it didn't because the data type now changed to list, not sure how to apply your help before. the error I got is TypeError: list indices must be integers or slices, not str — david, Dec 18 '18 at 22:46
@david, no worries. It was a list of bytes the last time, now it seems to be the list of strings. You can apply the very same thing omitting the first step of `list_of_string = list(map(lambda d: d.decode('utf-8'), data))` — ayorgo, Dec 18 '18 at 22:50
to get it out of a list to a dict just do: `for x in d: loaded_json = json.loads(x)` — d_kennetz, Dec 18 '18 at 22:50
Hi @ ayogo, I got the error shows AttributeError: 'str' object has no attribute 'decode' — david, Dec 18 '18 at 22:54
Also when I apply list_of_dicts = list(map(literal_eval, d)) df = pd.DataFrame(list_of_dicts), the new error is TypeError: Expected list, got dict — david, Dec 18 '18 at 23:04
What I meant @david is that you need to copy the code from the previous answer but not the first line with `decode`. — ayorgo, Dec 18 '18 at 23:06
Hi @ ayorgo, you are right, when I first run the code it works, however, when I go back to rerun to check and it shows error TypeError: list indices must be integers or slices, not str — david, Dec 18 '18 at 23:21
Basically you change the first line from the other answer to `list_of_string = d` and carry on with the rest. — ayorgo, Dec 18 '18 at 23:27
Thank you for the help Ayoro, it has this error TypeError: list indices must be integers or slices, not str — david, Dec 18 '18 at 23:49
Hi Ayorgo, I found the error, it is because my data from URL some of them are dict, some of them are list. Your code works perfect for the list, but got error when the data type is dict. — david, Dec 19 '18 at 02:30
You say you are trying to use `json_normalize` but i don't see it anywhere in your question... — gosuto, Dec 19 '18 at 07:29

score 0 · Answer 1 · answered Dec 18 '18 at 22:39

0

convert your d list into dict

d = dict(itertools.zip_longest(*[iter(l)] * 2, fillvalue=""))

answered Dec 18 '18 at 22:39

Ayoub Benayache

1,046
12
28

score 0 · Answer 2 · answered Sep 19 '19 at 08:20

You have a `list` with a `str` inside, not a `dict` or `json`:

Note that the dict part of the data shown, is surrounded by quotes ['{bunch_of_stuff}']
That can be fixed with ast.literal_eval

Data:

d = ['{"appMetadata": {"index": "cfs_ccs_eddi_35725", "host": "iaasn00009634", "job": "splunk_scraper"}, "timestampEpochSecond": 1545172308711, "metricTags": {"source": "/local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log"}, "metricName": "splunk.logs.tstats.count.per.min", "metricValue": 5, "metricType": "count"}']

Code:

from ast import literal_eval

d = literal_eval(d[0])

print(d)

Output of fixed `d`:

{'appMetadata': {'index': 'cfs_ccs_eddi_35725',
  'host': 'iaasn00009634',
  'job': 'splunk_scraper'},
 'timestampEpochSecond': 1545172308711,
 'metricTags': {'source': '/local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log'},
 'metricName': 'splunk.logs.tstats.count.per.min',
 'metricValue': 5,
 'metricType': 'count'}

`pandas.io.json.json_normalize`:

from pandas.io.json import json_normalize
import pandas as pd

df = json_normalize(d)

print(df)

Output of `df`:

 timestampEpochSecond                        metricName  metricValue metricType   appMetadata.index appMetadata.host appMetadata.job                                                     metricTags.source
        1545172308711  splunk.logs.tstats.count.per.min            5      count  cfs_ccs_eddi_35725    iaasn00009634  splunk_scraper  /local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log

Converted Nested Dict to Dataframe

2 Answers2

You have a list with a str inside, not a dict or json:

Data:

Code:

Output of fixed d:

pandas.io.json.json_normalize:

Output of df:

You have a `list` with a `str` inside, not a `dict` or `json`:

Output of fixed `d`:

`pandas.io.json.json_normalize`:

Output of `df`: