I'm using a pandas DataFrame in which one column contains numpy arrays. When trying to sum that column via aggregation I get an error stating 'Must produce aggregated value'.
For example:
import pandas as pd
import numpy as np
DF = pd.DataFrame([[1, np.array([10, 20, 30])],
                   [1, np.array([40, 50, 60])],
                   [2, np.array([20, 30, 40])]],
                  columns=['category', 'arraydata'])
This works the way I would expect it to:
DF.groupby('category').agg(sum)
output:
           arraydata
category
1         [50 70 90]
2         [20 30 40]
However, since my real data frame has multiple numeric columns, arraydata is not chosen as the default column to aggregate on, and I have to select it manually. Here is one approach I tried:
g=DF.groupby('category')
g.agg({'arraydata':sum})
Here is another:
g=DF.groupby('category')
g['arraydata'].agg(sum)
Both give the same output:
Exception: must produce aggregated value
However, if the column holds plain numeric values rather than arrays, the same calls work fine. I can work around this, but it's confusing, and I'm wondering whether this is a bug or whether I'm doing something wrong. Using arrays as cell values feels like a bit of an edge case, and I wasn't sure they were supported at all. Any ideas?
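For reference, here is a minimal sketch of the kind of workaround that is possible. It avoids agg entirely by iterating over the groups and summing the arrays with numpy. It assumes that all arrays within a group have the same length, and the name summed is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the example above.
DF = pd.DataFrame([[1, np.array([10, 20, 30])],
                   [1, np.array([40, 50, 60])],
                   [2, np.array([20, 30, 40])]],
                  columns=['category', 'arraydata'])

# Assumption: every array within a group has the same length, so the
# arrays can be stacked into a 2-D block and summed element-wise.
summed = {
    key: np.sum(np.stack(group['arraydata'].tolist()), axis=0)
    for key, group in DF.groupby('category')
}

for key, total in summed.items():
    print(key, total)
```

This gives a plain dict mapping each category to its element-wise sum, which sidesteps the question of what agg expects a reducing function to return.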
Thanks