This question is similar to one asked here, but with a tuple index. Grouping a column of lists works fine for a single index:
import pandas as pd

mydata = [{'idx': 'A', 'list_str': ['hi', 'babe']},
          {'idx': 'A', 'list_str': ['take', 'a', 'walk']},
          {'idx': 'A', 'list_str': []},
          {'idx': 'B', 'list_str': ['on', 'the', 'wild', 'side']}]
df = pd.DataFrame(mydata)
grouped = df.groupby('idx')
print(grouped.agg({'list_str': lambda x: tuple(x)}))
With the expected output:
idx list_str
A [hi, babe, take, a, walk]
B [on, the, wild, side]
However, grouping by a second column as well no longer works:
mydata = [{'idx': 'A', 'idx2': 'B', 'list_str': ['hi', 'babe']},
          {'idx': 'A', 'idx2': 'B', 'list_str': ['take', 'a', 'walk']},
          {'idx': 'A', 'idx2': 'B', 'list_str': []},
          {'idx': 'B', 'idx2': 'C', 'list_str': ['on', 'the', 'wild', 'side']}]
df = pd.DataFrame(mydata)
grouped = df.groupby(('idx', 'idx2'))
print(grouped.agg({'list_str': sum}))
This raises ValueError: Function does not reduce.
What is the proper way to do this?
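One possible approach, offered as a sketch rather than a definitive answer: pass the grouping columns as a list instead of a tuple, select the list_str column, and concatenate each group's lists with itertools.chain.from_iterable through apply. The helper name concat_lists is made up for illustration, and the sketch assumes a reasonably recent pandas version.

```python
from itertools import chain

import pandas as pd

mydata = [{'idx': 'A', 'idx2': 'B', 'list_str': ['hi', 'babe']},
          {'idx': 'A', 'idx2': 'B', 'list_str': ['take', 'a', 'walk']},
          {'idx': 'A', 'idx2': 'B', 'list_str': []},
          {'idx': 'B', 'idx2': 'C', 'list_str': ['on', 'the', 'wild', 'side']}]
df = pd.DataFrame(mydata)


def concat_lists(series):
    """Hypothetical helper: flatten a group's lists into one list."""
    return list(chain.from_iterable(series))


# Group by both columns (a list of names, not a tuple) and
# concatenate the lists within each group.
result = df.groupby(['idx', 'idx2'])['list_str'].apply(concat_lists)
print(result)
```

The result is a Series indexed by (idx, idx2) with one concatenated list per group. chain.from_iterable also avoids the repeated list copying that concatenating lists with + performs.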