I have a DataFrame of TEDx talks, one talk per row, with a 'ratings' column that holds JSON-formatted strings like the ones below. (The column records how the audience described each talk.)

[{"id": 7, "name": "Funny", "count": 19645}, {"id": 1, "name": "Beautiful", "count": 4573}, {"id": 9, "name": "Ingenious", "count": 6073}, ..........]

[{"id": 7, "name": "Funny", "count": 544}, {"id": 3, "name": "Courageous", "count": 139}, {"id": 2, "name": "Confusing", "count": 62}, {"id": 1, "name": "Beautiful", "count": 58}, ........]

The order of the descriptive words is not the same for each talk. Each word has an id (the same across all talks) and a count specific to each talk. I want to extract three new integer columns, funny, inspiring and confusing, each holding the count of that word for the respective talk.

Among other things, I tried this:

   df['ratings'] = df['ratings'].map(lambda x: dict(eval(x)))

In return I get this error:

File "C:/Users/Paul/Google Drive/WEEK4/ted-talks/w4e1.py", line 30, in <module>
    df['ratings'] = df['ratings'].map(lambda x: dict(eval(x)))

ValueError: dictionary update sequence element #0 has length 3; 2 is required

I have tried several different approaches, but I haven't even been able to read the values out of the JSON-formatted column properly. Any suggestions?

Heavy Load

1 Answer

You can parse each string representation into a list of dicts with ast.literal_eval, which is a safer choice than eval, and flatten the results with a list comprehension:

import ast

import pandas as pd

# Sample data: each cell holds the string representation of a list of dicts
df = pd.DataFrame({'ratings': [
    '[{"id": 7, "name": "Funny", "count": 19645}, {"id": 1, "name": "Beautiful", "count": 4573}, {"id": 9, "name": "Ingenious", "count": 6073}]',
    '[{"id": 7, "name": "Funny", "count": 544}, {"id": 3, "name": "Courageous", "count": 139}, {"id": 2, "name": "Confusing", "count": 62}, {"id": 1, "name": "Beautiful", "count": 58}]',
]})
print(df)
                                             ratings
0  [{"id": 7, "name": "Funny", "count": 19645}, {...
1  [{"id": 7, "name": "Funny", "count": 544}, {"i...

# Parse every string and flatten all rating dicts into one row per rating
df1 = pd.DataFrame([y for x in df['ratings'] for y in ast.literal_eval(x)])
print(df1)
   id        name  count
0   7       Funny  19645
1   1   Beautiful   4573
2   9   Ingenious   6073
3   7       Funny    544
4   3  Courageous    139
5   2   Confusing     62
6   1   Beautiful     58
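
The flattened frame above loses track of which talk each rating belongs to. If the goal is one count column per word, as the question asks, a minimal sketch could look like the following. It also shows why dict(eval(x)) raised the ValueError: dict() expects a sequence of key/value pairs, but each element here is a dict with three keys, so the name-to-count mapping has to be built explicitly. The helper name counts_by_name, the wanted list and the fill value of 0 for a word missing from a talk are assumptions for illustration, not part of the original answer.

```python
import ast

import pandas as pd

df = pd.DataFrame({'ratings': [
    '[{"id": 7, "name": "Funny", "count": 19645}, {"id": 1, "name": "Beautiful", "count": 4573}, {"id": 9, "name": "Ingenious", "count": 6073}]',
    '[{"id": 7, "name": "Funny", "count": 544}, {"id": 3, "name": "Courageous", "count": 139}, {"id": 2, "name": "Confusing", "count": 62}, {"id": 1, "name": "Beautiful", "count": 58}]',
]})

# Words to extract (taken from the question)
wanted = ['Funny', 'Inspiring', 'Confusing']


def counts_by_name(raw):
    # Hypothetical helper: parse one cell and map each rating name to its count
    return {item['name']: item['count'] for item in ast.literal_eval(raw)}


parsed = df['ratings'].map(counts_by_name)

for name in wanted:
    # Assumption: a word that does not appear for a talk is stored as 0
    df[name.lower()] = parsed.map(lambda d, n=name: d.get(n, 0))

print(df[['funny', 'inspiring', 'confusing']])
```

Because the lookup is by name, the varying order of the words within each talk does not matter.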
jezrael