
This question is similar to one asked here, but with a tuple index. Grouping a column of lists works fine for a single index:

import pandas as pd

mydata = [{'idx': 'A', 'list_str': ['hi', 'babe']},
          {'idx': 'A', 'list_str': ['take', 'a', 'walk']},
          {'idx': 'A', 'list_str': []},
          {'idx': 'B', 'list_str': ['on', 'the', 'wild', 'side']}]


df = pd.DataFrame(mydata)
grouped = df.groupby('idx') 
print(grouped.agg({'list_str': sum}))

With the expected output:

idx     list_str
A       [hi, babe, take, a, walk]
B       [on, the, wild, side]

However, this no longer works once a second grouping column is added:

mydata = [{'idx': 'A', 'idx2': 'B', 'list_str': ['hi', 'babe']},
          {'idx': 'A', 'idx2': 'B', 'list_str': ['take', 'a', 'walk']},
          {'idx': 'A', 'idx2': 'B', 'list_str': []},
          {'idx': 'B', 'idx2': 'C', 'list_str': ['on', 'the', 'wild', 'side']}]

df = pd.DataFrame(mydata)
grouped = df.groupby(('idx', 'idx2'))
print(grouped.agg({'list_str': sum}))

This raises ValueError: Function does not reduce.

What is the proper way to do this?

nbubis

1 Answer


To group by multiple columns use a list:

grouped = df.groupby(['idx', 'idx2'])
print(grouped.agg({'list_str': sum}))

Possibly you thought you were doing:

df['new_index'] = df.apply(lambda row: (row['idx'],row['idx2']), axis=1)
df.set_index('new_index',inplace=True)

grouped = df.groupby(df.index)
print(grouped.agg({'list_str': sum}))
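
If agg still raises "Function does not reduce" on your pandas version, one option is to skip agg and concatenate the lists per group yourself. The following is only a minimal sketch under that assumption, not a claim about what any particular pandas release accepts; the names `combined` and `result` are illustrative and not part of the original code.

```python
from itertools import chain

import pandas as pd

mydata = [{'idx': 'A', 'idx2': 'B', 'list_str': ['hi', 'babe']},
          {'idx': 'A', 'idx2': 'B', 'list_str': ['take', 'a', 'walk']},
          {'idx': 'A', 'idx2': 'B', 'list_str': []},
          {'idx': 'B', 'idx2': 'C', 'list_str': ['on', 'the', 'wild', 'side']}]

df = pd.DataFrame(mydata)

# Iterating over a groupby yields (key, sub-frame) pairs; when grouping by a
# list of two columns, each key is a tuple such as ('A', 'B').
# chain.from_iterable flattens the lists of one group into a single list.
combined = {key: list(chain.from_iterable(group['list_str']))
            for key, group in df.groupby(['idx', 'idx2'])}

# Tuple keys become a two-level index; the values stay plain Python lists.
result = pd.Series(combined, name='list_str')
result.index.names = ['idx', 'idx2']
print(result)
```

Because the concatenation happens in plain Python, it does not depend on whether agg treats a list as a valid reduced value. The cost is a Python-level loop over the groups, which may be slow for a large number of groups.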
jack6e
  • Have you tried the code? It actually gives the exact same error. – nbubis Jun 23 '17 at 14:50
  • I did try it, using exactly the input you provided. Did you use a list `[]` instead of a tuple `()` in the groupby call? Or is your input data mis-constructed in that it has two separate indices instead of a single index of tuples? – jack6e Jun 23 '17 at 15:43
  • Using the same 'mydata' variable, `df = pd.DataFrame(mydata); grouped = df.groupby(['idx', 'idx2']); print(grouped.agg({'list_str': sum}))` gives the same error with pandas version 0.19.2. – nbubis Jun 25 '17 at 11:14