
I have the following dataframe:

        dep        jour  incid_hosp  incid_rea  incid_dc  incid_rad
0        01  2020-03-19           1          0         0          0
1        02  2020-03-19          38          8        10         15
2        03  2020-03-19           2          0         0          6
3        04  2020-03-19           1          0         0          1
4        05  2020-03-19           4          0         0          1
...     ...         ...         ...        ...       ...        ...
36052   971  2021-03-10           5          0         2          3
36053   972  2021-03-10           3          0         0          1
36054   973  2021-03-10           1          0         0          5
36055   974  2021-03-10          14          2         1          9
36056   976  2021-03-10           8          0         0         13

I want to sum the values in the 'incid_hosp' column for each date. The data is broken down by region within France, but I only care about the national aggregate. What would be the best way to do this?

I tried the following:

cur_date = datetime.today().strftime('%Y-%m-%d')
first_date = '2020-03-19'
date_range = pd.date_range(start=first_date, end=cur_date)

new_fra = pd.DataFrame(index=date_range)
new_fra.reset_index(inplace=True)

for i in date_range:
    new_fra.loc[i] = df_fra[df_fra.jour == i].sum(df_fra['incid_hosp'])

1 Answer


First, convert the 'jour' column to datetime dtype with pd.to_datetime() (skip this step if the column is already datetime):

df['jour']=pd.to_datetime(df['jour'])
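As a minimal sketch of that conversion, assuming a toy frame whose 'jour' values start out as strings (the rows below are made up for illustration and are not the real dataset):

```python
import pandas as pd

# Toy data for illustration only; 'jour' starts out as plain strings.
df = pd.DataFrame({
    "dep": ["01", "02", "01", "02"],
    "jour": ["2020-03-19", "2020-03-19", "2020-03-20", "2020-03-20"],
    "incid_hosp": [1, 38, 2, 40],
})

# Parse the strings into proper datetime values.
df["jour"] = pd.to_datetime(df["jour"])

# Check that the column is now a datetime dtype.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["jour"])
```

With a datetime column, later steps such as date-range filtering or resampling work on real dates rather than on strings.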

Then use:

df.groupby('jour')['incid_hosp'].sum()

or

df.groupby('jour').agg({'incid_hosp':'sum'})
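To see how the two forms differ, here is a small self-contained sketch. It assumes made-up rows in the same shape as the question's data; the names daily, daily_df and daily_flat are only illustrative:

```python
import pandas as pd

# Made-up rows in the same shape as the question's data.
df = pd.DataFrame({
    "dep": ["01", "02", "03", "01", "02", "03"],
    "jour": pd.to_datetime(["2020-03-19"] * 3 + ["2020-03-20"] * 3),
    "incid_hosp": [1, 38, 2, 3, 40, 5],
})

# Series indexed by date: one total per day across all 'dep' values.
daily = df.groupby("jour")["incid_hosp"].sum()

# Same totals, returned as a one-column DataFrame.
daily_df = df.groupby("jour").agg({"incid_hosp": "sum"})

# Optional: turn the date index back into a regular column.
daily_flat = daily.reset_index()
```

The first form returns a Series and the second a DataFrame; both are indexed by 'jour', so reset_index() is only needed if the date should be an ordinary column again.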
Anurag Dabas