0

I got the data which has nested dict inside.

I am trying to use json_normalize to convert into dataframe, but I got an error.

d = ['{"appMetadata": {"index": "cfs_ccs_eddi_35725", "host": "iaasn00009634", "job": "splunk_scraper"}, "timestampEpochSecond": 1545172308711, "metricTags": {"source": "/local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log"}, "metricName": "splunk.logs.tstats.count.per.min", "metricValue": 5, "metricType": "count"}']

df = pd.DataFrame.from_dict({(i,j): d[i][j] 
                             for i in d.keys() 
                             for j in d[i].keys()},
                            orient='index')

The error is:

AttributeError: 'list' object has no attribute 'keys'

jpp
  • 159,742
  • 34
  • 281
  • 339
david
  • 45
  • 8
  • 1
    d is a list not a dict, the dict is a string inside your list – d_kennetz Dec 18 '18 at 22:38
  • The string can be accessed using `d[0]` – Sheldore Dec 18 '18 at 22:39
  • Have you tried this @david https://stackoverflow.com/questions/53752256/bytes-list-of-dictionary-convert-to-dataframe? ;) – ayorgo Dec 18 '18 at 22:44
  • hi W-B, the data I use request.get from url, it is a list – david Dec 18 '18 at 22:44
  • Hi Ayorgo, thank you so much for the help, I tried, it didn't because the data type now changed to list, not sure how to apply your help before. the error I got is TypeError: list indices must be integers or slices, not str – david Dec 18 '18 at 22:46
  • @david, no worries. It was a list of bytes the last time, now it seems to be the list of strings. You can apply the very same thing omitting the first step of `list_of_string = list(map(lambda d: d.decode('utf-8'), data))` – ayorgo Dec 18 '18 at 22:50
  • to get it out of a list to a dict just do: `for x in d: loaded_json = json.loads(x)` – d_kennetz Dec 18 '18 at 22:50
  • Hi @ ayogo, I got the error shows AttributeError: 'str' object has no attribute 'decode' – david Dec 18 '18 at 22:54
  • Also when I apply list_of_dicts = list(map(literal_eval, d)) df = pd.DataFrame(list_of_dicts), the new error is TypeError: Expected list, got dict – david Dec 18 '18 at 23:04
  • What I meant @david is that you need to copy the code from the previous answer but not the first line with `decode`. – ayorgo Dec 18 '18 at 23:06
  • Hi @ ayorgo, you are right, when I first run the code it works, however, when I go back to rerun to check and it shows error TypeError: list indices must be integers or slices, not str – david Dec 18 '18 at 23:21
  • Basically you change the first line from the other answer to `list_of_string = d` and carry on with the rest. – ayorgo Dec 18 '18 at 23:27
  • Thank you for the help Ayoro, it has this error TypeError: list indices must be integers or slices, not str – david Dec 18 '18 at 23:49
  • Hi Ayorgo, I found the error, it is because my data from URL some of them are dict, some of them are list. Your code works perfect for the list, but got error when the data type is dict. – david Dec 19 '18 at 02:30
  • You say you are trying to use `json_normalize` but i don't see it anywhere in your question... – gosuto Dec 19 '18 at 07:29

2 Answers2

0

convert your d list into dict

d = dict(itertools.zip_longest(*[iter(l)] * 2, fillvalue=""))
Ayoub Benayache
  • 1,046
  • 12
  • 28
0

You have a list with a str inside, not a dict or json:

  • Note that the dict part of the data shown, is surrounded by quotes ['{bunch_of_stuff}']
  • That can be fixed with ast.literal_eval

Data:

d = ['{"appMetadata": {"index": "cfs_ccs_eddi_35725", "host": "iaasn00009634", "job": "splunk_scraper"}, "timestampEpochSecond": 1545172308711, "metricTags": {"source": "/local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log"}, "metricName": "splunk.logs.tstats.count.per.min", "metricValue": 5, "metricType": "count"}']

Code:

from ast import literal_eval

d = literal_eval(d[0])

print(d)

Output of fixed d:

{'appMetadata': {'index': 'cfs_ccs_eddi_35725',
  'host': 'iaasn00009634',
  'job': 'splunk_scraper'},
 'timestampEpochSecond': 1545172308711,
 'metricTags': {'source': '/local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log'},
 'metricName': 'splunk.logs.tstats.count.per.min',
 'metricValue': 5,
 'metricType': 'count'}

pandas.io.json.json_normalize:

from pandas.io.json import json_normalize
import pandas as pd

df = json_normalize(d)

print(df)

Output of df:

 timestampEpochSecond                        metricName  metricValue metricType   appMetadata.index appMetadata.host appMetadata.job                                                     metricTags.source
        1545172308711  splunk.logs.tstats.count.per.min            5      count  cfs_ccs_eddi_35725    iaasn00009634  splunk_scraper  /local/apps/eddi/presentment/logs/eddi-http-presentment-requests.log
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158