Here's the data:
import pandas as pd

df = pd.DataFrame({
    'date':    [1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'request': [2, 2, 2, 3, 3, 2, 3, 3, 3, 3],
    'users':   [1, 3, 7, 1, 7, 3, 4, 9, 7, 9],
    'count':   [1, 1, 2, 3, 1, 3, 1, 2, 1, 1]
})
df
   count  date  request  users
0      1     1        2      1
1      1     1        2      3
2      2     2        2      7
3      3     2        3      1
4      1     2        3      7
5      3     3        2      3
6      1     3        3      4
7      2     3        3      9
8      1     4        3      7
9      1     5        3      9
The idea is to group by date and request, and convert every other column to a list of grouped values. I thought this would be as simple as calling dfgp.agg, but it is not.
This is what I want to do:
   date  request   count   users
0     1        2  [1, 1]  [1, 3]
1     2        2     [2]     [7]
2     2        3  [3, 1]  [1, 7]
3     3        2     [3]     [3]
4     3        3  [1, 2]  [4, 9]
5     4        3     [1]     [7]
6     5        3     [1]     [9]
This is how I have done it:
grouped_df = df.groupby(['date', 'request'])
df_new = pd.DataFrame({
    'count': grouped_df['count'].apply(list),
    'users': grouped_df['users'].apply(list)
}).reset_index()
It works, but I believe there has to be a better way... one that works on all columns in the grouped object. For example, I should be able to group by just date and the solution should still work. My solution relies on hardcoding the columns, which I dislike doing, so it would fail in this instance.
This is something that has been bothering me. It feels like there should be an obvious solution, but I cannot find it. Is there a better way?
Calling all my Pandas MVPs...
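
For reference, here is a minimal sketch of what a column-agnostic version might look like. It assumes a reasonably recent pandas in which the grouped object's agg accepts the built-in list as the aggregation function; that has not been checked against the pandas version that produced the output above. The helper name group_to_lists is hypothetical and only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date':    [1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    'request': [2, 2, 2, 3, 3, 2, 3, 3, 3, 3],
    'users':   [1, 3, 7, 1, 7, 3, 4, 9, 7, 9],
    'count':   [1, 1, 2, 3, 1, 3, 1, 2, 1, 1],
})


def group_to_lists(frame, keys):
    """Hypothetical helper: gather every non-key column into one list per group.

    Assumes GroupBy.agg accepts the built-in `list` (true in recent pandas).
    """
    return frame.groupby(keys).agg(list).reset_index()


# Group by both keys: 'users' and 'count' become lists.
by_date_request = group_to_lists(df, ['date', 'request'])

# Group by just 'date': 'request' is now listed as well, with no column
# names hardcoded anywhere.
by_date = group_to_lists(df, ['date'])

print(by_date_request)
print(by_date)
```

Since the set of aggregated columns is whatever is left after removing the keys, changing the grouping keys should not require touching the rest of the code.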