
I have a DataFrame with a column of JSON arrays:

id | group | data
---+-------+------
 0 |   100 | [{'a':0,'b':0},{'a':0,'b':1},...]
 1 |   100 | [{'a':1,'b':0},{'a':1,'b':1},...]
 2 |   100 | [{'a':2,'b':0},{'a':2,'b':1},...]
 3 |   101 | [{'a':0,'b':0},{'a':0,'b':1},...]
 4 |   101 | [{'a':1,'b':0},{'a':1,'b':1},...]
 5 |   100 | [{'a':2,'b':0},{'a':2,'b':1},...]

and I want to combine the JSON data for each group:

id | group | data
---+-------+------
 0 |   100 | [{'a':0,'b':0},{'a':0,'b':1},...,{'a':1,'b':0},{'a':1,'b':1},...]
 1 |   101 | [{'a':0,'b':0},{'a':0,'b':1},...,{'a':1,'b':0},{'a':1,'b':1},...]
 2 |   102 | [{'a':0,'b':0},{'a':0,'b':1},...,{'a':1,'b':0},{'a':1,'b':1},...]

Unfortunately, I am having trouble finding an efficient way to accomplish this.

I think I should be able to start with mydata.groupby(['group']) to produce the grouped data, but I am not sure where to go from there.

Ellis Valentiner
  • possible duplicate of [pandas groupby and join lists](http://stackoverflow.com/questions/23794082/pandas-groupby-and-join-lists) – Carsten Feb 26 '15 at 21:06

1 Answer

import itertools

# Concatenate the lists in each group into a single list
mydata = mydata.groupby('group')['data'].agg(lambda s: list(itertools.chain.from_iterable(s)))

Not very pretty, but it should work.
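For illustration, here is a minimal sketch of the same idea on a small made-up frame. It assumes the data column already holds Python lists of dicts rather than JSON strings; the sample values and the name combined are hypothetical. It uses apply instead of agg and calls reset_index() so that group comes back as a regular column.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
mydata = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "group": [100, 100, 101, 101],
    "data": [
        [{"a": 0, "b": 0}, {"a": 0, "b": 1}],
        [{"a": 1, "b": 0}],
        [{"a": 0, "b": 0}],
        [{"a": 1, "b": 0}, {"a": 1, "b": 1}],
    ],
})

# Flatten each group's lists into one list, then turn the
# 'group' index back into an ordinary column.
combined = (
    mydata.groupby("group")["data"]
    .apply(lambda s: list(chain.from_iterable(s)))
    .reset_index()
)
```

If the column holds JSON strings instead, each value would need to be parsed first (for example with json.loads) before flattening.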

maxbellec