Separating a dataframe by date and calculating averages Numpy Python

Question

The date_list and monthly_values arrays correspond element by element, so '2019-09-01 00:00:00' = 15, '2019-10-01 00:00:00' = 39.6, and so on. The year_changes list below holds the indexes where a new year begins. I am trying to write a function that gives the average of all the monthly values within each year. For example, four months of 2019 are present (from 2019-09-01 00:00:00 up to, but not including, 2020-01-01 00:00:00), so it should sum 15., 39.6, 0.2, and 34.3 and divide by 4, the number of 2019 months, giving the expected output of 22.28. How could I code this?

import datetime
import numpy as np
import pandas as pd
from pandas import DataFrame

date_list = ['2019-09-01 00:00:00', '2019-10-01 00:00:00', '2019-11-01 00:00:00',
 '2019-12-01 00:00:00', '2020-01-01 00:00:00', '2020-02-01 00:00:00', 
 '2020-03-01 00:00:00', '2020-04-01 00:00:00', '2020-05-01 00:00:00', 
 '2020-06-01 00:00:00', '2020-07-01 00:00:00', '2020-08-01 00:00:00',
 '2020-09-01 00:00:00','2020-10-01 00:00:00', '2020-11-01 00:00:00', 
 '2020-12-01 00:00:00','2021-01-01 00:00:00','2021-02-01 00:00:00', '2021-03-01 00:00:00', 
 '2021-04-01 00:00:00','2021-05-01 00:00:00', '2021-06-01 00:00:00', 
 '2021-07-01 00:00:00']
monthly_values = np.array([ 15., 39.6, 0.2, 34.3, 19.6, 26.8, 15.7, 26., 12.6, 15.5, 18.6, 2.3, 6.5,
   2.5, 12.2, 11.6, 93.9, 25.5, 26.5, -16.5, -1.4, -1.8, 5.])

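data = DataFrame(date_list, columns=['Data'])
# Named dates rather than datetime so the imported datetime module is not shadowed
dates = pd.to_datetime(data['Data'])

# Row positions where the year differs from the previous row
year_changes = np.where(dates.dt.year.diff().gt(0))[0].tolist()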

Expected Output Yearly Values:

2019 Average: 22.28
2020 Average: 14.16
2021 Average: 21.03

score 1 · Accepted Answer · answered Jul 04 '21 at 20:18

You can create a DataFrame from date_list and monthly_values:

data = pd.DataFrame({"Date": date_list, "Values": monthly_values})
data["Date"] = pd.to_datetime(data["Date"])
print(data)

Prints:

         Date  Values
0  2019-09-01    15.0
1  2019-10-01    39.6
2  2019-11-01     0.2
3  2019-12-01    34.3
4  2020-01-01    19.6
5  2020-02-01    26.8
6  2020-03-01    15.7
7  2020-04-01    26.0
8  2020-05-01    12.6
9  2020-06-01    15.5
10 2020-07-01    18.6
11 2020-08-01     2.3
12 2020-09-01     6.5
13 2020-10-01     2.5
14 2020-11-01    12.2
15 2020-12-01    11.6
16 2021-01-01    93.9
17 2021-02-01    25.5
18 2021-03-01    26.5
19 2021-04-01   -16.5
20 2021-05-01    -1.4
21 2021-06-01    -1.8
22 2021-07-01     5.0

Then use .groupby with .dt.year as the grouper:

# Select only the Values column so the datetime column is not averaged as well
print(data.groupby(data["Date"].dt.year)[["Values"]].mean())

Prints:

         Values
Date           
2019  22.275000
2020  14.158333
2021  18.742857
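
Since the question already computes year_changes, a NumPy-only variant could split monthly_values at those positions and average each piece. This is a minimal sketch, not part of the original answer, and it assumes the rows are sorted by date. The names dates, yearly_means, and years are chosen here for illustration, and pd.date_range is used only as a shorthand for the 23 month-start timestamps in date_list.

```python
import numpy as np
import pandas as pd

# Stand-in for date_list: the same 23 month-start timestamps, generated for brevity.
dates = pd.Series(pd.date_range("2019-09-01", periods=23, freq="MS"))
monthly_values = np.array([15., 39.6, 0.2, 34.3, 19.6, 26.8, 15.7, 26., 12.6,
                           15.5, 18.6, 2.3, 6.5, 2.5, 12.2, 11.6, 93.9, 25.5,
                           26.5, -16.5, -1.4, -1.8, 5.])

# Row positions where the year differs from the previous row.
year_changes = np.flatnonzero(dates.dt.year.diff().gt(0).to_numpy())

# Split the values at those positions and average each chunk (one chunk per year).
yearly_means = [chunk.mean() for chunk in np.split(monthly_values, year_changes)]

# The first row plus each change position marks the start of a year.
years = dates.dt.year.iloc[np.r_[0, year_changes]].tolist()

for year, mean in zip(years, yearly_means):
    print(f"{year} Average: {mean:.2f}")
```

For the sample data the three means are about 22.275, 14.158, and 18.743, matching the groupby result. The 2021 value differs from the 21.03 listed in the question, which the seven 2021 values do not reproduce.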

Hello I have another question that is related to this issue if you could take a look at it I would appreciate it: https://stackoverflow.com/questions/68251671/separating-a-dataframe-by-date-and-calculating-mathmetical-models-numpy-python — tony selcuk, Jul 05 '21 at 06:40
