My data is a list of dictionaries, and each dictionary contains a nested list of subjects, like this:

scores = [{"Student":"Adam","Subjects":[{"Name":"Math","Score":85},{"Name":"Science","Score":90}]},
     {"Student":"Bec","Subjects":[{"Name":"Math","Score":70},{"Name":"English","Score":100}]}]

If I call pd.DataFrame directly on this list I get:

[image: screenshot of the resulting data frame]

What should I do to get a data frame that looks like this?

Student   Subject.Name   Subject.Score
Adam      Math           85
Adam      Science        90
Bec       Math           70
Bec       English        100

Thanks very much

Yingdong Zhai

1 Answer

Use json_normalize with rename:

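import pandas as pd

# Expand each 'Subjects' entry into its own row and carry 'Student' along as metadata
df = (pd.json_normalize(scores, 'Subjects', 'Student')
        .rename(columns={'Name':'Subject.Name','Score':'Subject.Score'}))
print(df)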
  Subject.Name  Subject.Score Student
0         Math             85    Adam
1      Science             90    Adam
2         Math             70     Bec
3      English            100     Bec
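
The output above lists Student last, because json_normalize appends metadata columns after the record columns. A minimal sketch of one way to get the column order from the question, assuming the same scores data: the record_prefix parameter of json_normalize names the subject columns without a separate rename, and plain column selection reorders them.

```python
import pandas as pd

scores = [
    {"Student": "Adam", "Subjects": [{"Name": "Math", "Score": 85}, {"Name": "Science", "Score": 90}]},
    {"Student": "Bec", "Subjects": [{"Name": "Math", "Score": 70}, {"Name": "English", "Score": 100}]},
]

# record_prefix adds "Subject." to the columns built from each Subjects entry
df = pd.json_normalize(scores, record_path="Subjects", meta=["Student"], record_prefix="Subject.")

# Metadata columns are appended last, so select the columns in the wanted order
df = df[["Student", "Subject.Name", "Subject.Score"]]
print(df)
```

Selecting columns by name returns a new data frame, so the same pattern works for any order you need.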

Or use a list comprehension with dict unpacking and the DataFrame constructor:

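# Merge each student's top-level fields with that student's prefixed subject fields.
# Note: x.pop('Subjects') removes the key from the dicts in scores (it mutates the input).
df = (pd.DataFrame([{**x, **{f'Subject.{k}': v for k, v in y.items()}}
                     for x in scores for y in x.pop('Subjects')]))
print(df)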
  Student Subject.Name  Subject.Score
0    Adam         Math             85
1    Adam      Science             90
2     Bec         Math             70
3     Bec      English            100
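
Because x.pop('Subjects') modifies scores in place, running that snippet a second time on the same list raises a KeyError. A possible variant, sketched here as an illustration rather than as part of the original answer, filters the key out instead of popping it, so the input stays intact. The names student, subject and rows are chosen for this example only.

```python
import pandas as pd

scores = [
    {"Student": "Adam", "Subjects": [{"Name": "Math", "Score": 85}, {"Name": "Science", "Score": 90}]},
    {"Student": "Bec", "Subjects": [{"Name": "Math", "Score": 70}, {"Name": "English", "Score": 100}]},
]

rows = [
    {
        # Copy every top-level field except the nested list
        **{k: v for k, v in student.items() if k != "Subjects"},
        # Add the subject fields with a "Subject." prefix
        **{f"Subject.{k}": v for k, v in subject.items()},
    }
    for student in scores
    for subject in student["Subjects"]
]

df = pd.DataFrame(rows)
print(df)
```

Any extra top-level field in the input dictionaries is copied into every row as well, since only the Subjects key is excluded.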
jezrael
  • Thanks, json_normalize works well for me. It took me a while to figure out that the group-by item has to go last, but apart from that it works like a charm. – Yingdong Zhai Jul 21 '22 at 05:27
  • @YingdongZhai - `Took me a while to figure out that I need to put the group-by item` - do you need `Student` as the first column? Then use [this](https://stackoverflow.com/a/13148611/2901002) solution. – jezrael Jul 21 '22 at 05:38
  • Hi @jezrael, what if I have another field like gender? Is it possible to normalise it as well? For example, if the list changes to: scores = [{"Student":"Adam","Gender":"M","Subjects":[{"Name":"Math","Score":85},{"Name":"Science","Score":90}]}, {"Student":"Bec","Gender":"F","Subjects":[{"Name":"Math","Score":70},{"Name":"English","Score":100}]}], is it possible to show a Gender column in the output df? Thanks – Yingdong Zhai Jul 21 '22 at 05:52
  • @YingdongZhai - Then use the second solution. – jezrael Jul 21 '22 at 05:53
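
For the Gender follow-up, json_normalize can probably also handle it, since its meta argument accepts a list of field names. The following is a sketch under that assumption, using the data from the comment with the `"Gender":"M"` typo fixed; it is not taken from the original answer.

```python
import pandas as pd

scores = [
    {"Student": "Adam", "Gender": "M",
     "Subjects": [{"Name": "Math", "Score": 85}, {"Name": "Science", "Score": 90}]},
    {"Student": "Bec", "Gender": "F",
     "Subjects": [{"Name": "Math", "Score": 70}, {"Name": "English", "Score": 100}]},
]

# Each name listed in meta is repeated on every row produced from that student's Subjects
df = pd.json_normalize(
    scores,
    record_path="Subjects",
    meta=["Student", "Gender"],
    record_prefix="Subject.",
)

# Put the per-student columns first
df = df[["Student", "Gender", "Subject.Name", "Subject.Score"]]
print(df)
```

Every field named in meta must be present in each record, otherwise json_normalize raises a KeyError unless errors='ignore' is passed.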