I have a pandas dataframe like the following:
df = pd.DataFrame({'Person_ID': [1,1,1,1,1,1,2,2,2,3,3,3,3],
'Item_ID': [1,1,2,4,4,4,2,3,3,1,2,2,2],
'Value': [1,4,6,5,8,7,3,2,9,8,4,1,2]})
I would like to group this by both person_id and item_id, and then get the average value of each first row. For example: Person_ID 1 is 'associated' with 'Item_ID's 1,2 and 4. The values of the first entries for this Person_ID and these groups are 1,6 and 5 respectively. The average of this would be 4.
Person_ID Item_ID Value
0 1 1 1
1 1 1 4
2 1 2 6
3 1 4 5
4 1 4 8
5 1 4 7
6 2 2 3
7 2 3 2
8 2 3 9
9 3 1 8
10 3 2 4
11 3 2 1
12 3 2 2
Desired outcome:
Person_ID Average_value_first_entries
1 4
2 2.5
3 6
I noticed this question was similar, but a more complex version of this question: Pandas dataframe get first row of each group In this case, instead of grouping by one 'id', I would like to group by two 'id's and take the average.
I tried the following:
df.groupby(['Person_ID', 'Item_ID']).first()['Value']
However, this returns each 'first' entry, but not the averages.
Person_ID Item_ID
1 1 1
2 6
4 5
2 2 3
3 2
3 1 8
2 4
Name: Value, dtype: int64
Any help would be appreciated. I would be especially grateful for 'computationally efficient' (time) solutions, since the real dataset contains 1.2 million rows.