Is there a way to do a pandas groupby aggregate on a dataframe and returning a certain string from a column? I have a dataframe like so:
lst = [[ 100, 'buicks', .001, np.nan, np.nan], [101, 'chevy', .002, np.nan, np.nan],
[102, 'olds', .003, .006, np.nan], [100, 'buick', np.nan, .004, np.nan],
[103, 'nissan', np.nan, np.nan, .1], [103,'nissans', np.nan, .14, np.nan]]
df = pd.DataFrame(lst, columns=['car_id', 'name', 'aa', 'bb', 'cc'])
car_id name aa bb cc
0 100 buicks 0.001 NaN NaN
1 101 chevy 0.002 NaN NaN
2 102 olds 0.003 0.006 NaN
3 100 buick NaN 0.004 NaN
4 103 nissan NaN NaN 0.1
5 103 nissans NaN 0.140 NaN
Need this:
0 100 buicks 0.001 0.004 NaN
1 101 chevy 0.002 NaN NaN
2 102 olds 0.003 0.006 NaN
4 103 nissans NaN 0.140 0.1
What I am trying to do is groupby the car_id column and then sum columns aa, bb, cc. However, the name column values may not be the same but I need to keep one of them. I don't care which one. I was looking at: Pandas sum by groupby, but exclude certain columns and ended up with something like this:
df.groupby('car_id').agg({'aa': np.sum, 'bb': np.sum, 'cc':np.sum})
But this is dropping the name column. I assume that I can add the name column to the above statement and there is an operation I can put in there to return the string.
Thanks