3

I want to convert my dataframe to a json string. If I simply do df.to_json(orient='records'), then it's converted well. However, I want to make several transformations in the json string.

This is my dataframe df:

df = 
GROUP   HOUR    AVG_MINUTES   AVG_GRADE
AAA     7       67            5.5
AAA     8       58            6.5
AAA     9       55            4.5
BBB     7       15            5.1
BBB     8       18            5.4
CCC     9       34            5.5

The json string should look as follows:

[
{
"GROUP":"AAA",
"AVG_MINUTES":[[7,67],[8,58],[9,55]],
"AVG_GRADE":[[7,5.5],[8,6.5],[9,4.5]]
},
{
"GROUP":"BBB",
"AVG_MINUTES":[[7,15],[8,18],[9,34]],
"AVG_GRADE":[[7,5.1],[8,5.4],[9,5.5]]
}
]

I want to get HOUR values inside each pair in AVG_MINUTES and AVG_GRADE. Is it possible to do? Or should I do it manually? (it would be a bad news, since the dataframe is quite big)

Scott Boston
  • 147,308
  • 15
  • 139
  • 187
Dinosaurius
  • 8,306
  • 19
  • 64
  • 113

1 Answers1

6

You can first create lists in columns AVG_MINUTES and AVG_GRADE. Then groupby and aggregate tolist() and last use DataFrame.to_dict with parameter orient='records':

df.AVG_MINUTES = df[['HOUR','AVG_MINUTES']].values.tolist()
df.AVG_GRADE = df[['HOUR','AVG_GRADE']].values.tolist()

print (df)
  GROUP  HOUR AVG_MINUTES   AVG_GRADE
0   AAA     7     [7, 67]  [7.0, 5.5]
1   AAA     8     [8, 58]  [8.0, 6.5]
2   AAA     9     [9, 55]  [9.0, 4.5]
3   BBB     7     [7, 15]  [7.0, 5.1]
4   BBB     8     [8, 18]  [8.0, 5.4]
5   CCC     9     [9, 34]  [9.0, 5.5]

df = df.groupby('GROUP')['AVG_MINUTES','AVG_GRADE']
       .agg(lambda x : x.tolist())
       .reset_index()
       .to_dict(orient='records')
print (df)
[
{'GROUP': 'AAA', 
 'AVG_GRADE': [[7.0, 5.5], [8.0, 6.5], [9.0, 4.5]], 
 'AVG_MINUTES': [[7, 67], [8, 58], [9, 55]]}, 
{'GROUP': 'BBB', 
 'AVG_GRADE': [[7.0, 5.1], [8.0, 5.4]],
 'AVG_MINUTES': [[7, 15], [8, 18]]}, 
{'GROUP': 'CCC',
 'AVG_GRADE': [[9.0, 5.5]],
 'AVG_MINUTES': [[9, 34]]}
]

If use DataFrame.to_json output is similar - HOUR in output is int created by zip where output is list of tuples what is converted to list of lists by map:

df.AVG_MINUTES = list(map(list, zip(df.HOUR, df.AVG_MINUTES)))
df.AVG_GRADE = list(map(list, zip(df.HOUR, df.AVG_GRADE)))

print (df)
  GROUP  HOUR AVG_MINUTES AVG_GRADE
0   AAA     7     [7, 67]  [7, 5.5]
1   AAA     8     [8, 58]  [8, 6.5]
2   AAA     9     [9, 55]  [9, 4.5]
3   BBB     7     [7, 15]  [7, 5.1]
4   BBB     8     [8, 18]  [8, 5.4]
5   CCC     9     [9, 34]  [9, 5.5]

df = df.groupby('GROUP')['AVG_MINUTES','AVG_GRADE']
       .agg(lambda x : x.tolist())
       .reset_index()
       .to_json(orient='records')
print (df)
[{"GROUP":"AAA",
"AVG_MINUTES":[[7,67],[8,58],[9,55]],
"AVG_GRADE":[[7,5.5],[8,6.5],[9,4.5]]},
{"GROUP":"BBB",
"AVG_MINUTES":[[7,15],[8,18]],
"AVG_GRADE":[[7,5.1],[8,5.4]]},
{"GROUP":"CCC",
"AVG_MINUTES":[[9,34]],
"AVG_GRADE":[[9,5.5]]}]
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hmm, I tested your first solution, and I get `[{'AVG_MINUTES': [67,58,55]` instead of `'AVG_MINUTES':[[7,67],[8,58],[9,55]]`. The same happens to `AVG_GRADE`. – Dinosaurius Jan 30 '17 at 13:00
  • Do you use `df.AVG_MINUTES = df[['HOUR','AVG_MINUTES']].values.tolist() df.AVG_GRADE = df[['HOUR','AVG_GRADE']].values.tolist()` first ? – jezrael Jan 30 '17 at 13:01
  • I add to answer output of `df` after converting to `lists` – jezrael Jan 30 '17 at 13:03
  • Super, it was not easy. I am glad can help you. – jezrael Jan 30 '17 at 13:06
  • Hmmm, I think problem of first solution is `values` convert to `numpy array` and if some `floats` in second columns all values are converted to `float`. So if need `hours` as `int`, is necessary use another second solution. – jezrael Jan 30 '17 at 13:09
  • Thank you very much for your help. – Dinosaurius Jan 30 '17 at 13:18