0

Let's assume I have a pandas dataframe like this:

import pandas as pd
d = {'entrance': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'e', 'e', 'e', 'e'], 
      'hour': [3, 4, 4, 4, 2, 2, 2, 2, 0, 0, 0, 1, 12, 12, 12, 12, 7, 7, 8, 8],
      'in': [3, 2, 0, 4, 3, 2, 1, 0, 2, 1, 5, 0, 1, 1, 2, 3, 10, 10, 1, 5],
      'out': [0, 1, 0, 0, 1, 2, 1, 3, 1, 1, 0, 2, 0, 2, 3, 0, 0, 0, 25, 0]}
df = pd.DataFrame(data=d)
df

    entrance    hour    in  out
    a           3       3   0
    a           4       2   1
    a           4       0   0
    a           4       4   0
    b           2       3   1
    b           2       2   2
    b           2       1   1
    b           2       0   3
    c           0       2   1
    c           0       1   1
    c           0       5   0
    c           1       0   2
    d           12      1   0
    d           12      1   2
    d           12      2   3
    d           12      3   0
    e           7       10  0
    e           7       10  0
    e           8       1   25
    e           8       5   0

I would like to sum the values in the columns in and out grouped by hour. I know I can achieve this with:

df_grouped = df.groupby(['hour']).agg({'in':'sum', 'out': 'sum'}).reset_index()

However, the values in the columns in and out should only be grouped by the first hour record of each entrance group. So for entrance group 'a', the values in and out should be summed for the first record of the corresponding hours, which is in this case hour '3'. The desired output should be:

hour   in    out
3      9     1
2      6     7
0      8     4
12     7     5
7      27    25

What is the best way to do this?

sampeterson
  • 459
  • 4
  • 16
  • 1
    Need `df_grouped = df.groupby(['entrance']).agg({'hour':'first', 'in':'sum', 'out': 'sum'}).reset_index(drop=True)` ? – jezrael Oct 03 '22 at 12:44

0 Answers0