-1

I have a pandas dataframe which has the folowing columns - Day, Month, Year, City, Temperature.

I would like to have a new column that has the average (mean) temperature in same date (day\month) of all previous years.

Can someone please assist?

Thanks :-)

N S
  • 13
  • 2
  • 1
    Use `groupby` and `mean`. **You're welcome :)** – Corralien Jan 29 '22 at 13:21
  • Thanks for your answer, but as far as I understand it is not the solution for me. I would like that in each record, there will be one more colums that will have the mean of the temperatures in same date in the history. ie. in 15/2010 there will be the temperature of this date (exist) and also the average of all 15/2 prior to this year and not after. As far as I know - group by will not help – N S Jan 29 '22 at 13:26
  • No. I have 26 years of daily temperatures. Want to have a new column that has the average temperature in same date in previous years for using in forecasting with this new data. I want to have it for all my 3M records – N S Jan 29 '22 at 13:34
  • 1
    To be serious, StackOverflow is not a coding service platform. We can help you not to do the job for you... [Take a tour](https://stackoverflow.com/tour), [How to ask](https://stackoverflow.com/help/how-to-ask) and provide a [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) especially for [Pandas question](https://stackoverflow.com/q/20109391/15239951) – Corralien Jan 29 '22 at 13:34
  • Rolling looks like the answer - thanks. – N S Jan 29 '22 at 13:41
  • 1
    No, it's not. the right solution is to use `expanding`. I will an answer but please next time provide input data, expected output and the code you have try so far. – Corralien Jan 29 '22 at 13:43

1 Answers1

1

Try:

dti = pd.date_range('2000-1-1', '2021-12-1', freq='D')
temp = np.random.randint(10, 20, len(dti))
df = pd.DataFrame({'Day': dti.day, 'Month': dti.month, 'Year': dti.year,
                   'City': 'Nice', 'Temperature': temp})

out = df.set_index('Year').groupby(['City', 'Month', 'Day']) \
        .expanding()['Temperature'].mean().reset_index()

Output:

>>> out
      Day  Month  Year  City  Temperature
0       1      1  2000  Nice    12.000000
1       1      1  2001  Nice    12.000000
2       1      1  2002  Nice    11.333333
3       1      1  2003  Nice    12.250000
4       1      1  2004  Nice    11.800000
...   ...    ...   ...   ...          ...
8001   31     12  2016  Nice    15.647059
8002   31     12  2017  Nice    15.555556
8003   31     12  2018  Nice    15.631579
8004   31     12  2019  Nice    15.750000
8005   31     12  2020  Nice    15.666667

[8006 rows x 5 columns]

Focus on 1st January of the dataset:

>>> df[df['Day'].eq(1) & df['Month'].eq(1)]
      Day  Month  Year  City  Temperature   # Mean 
0       1      1  2000  Nice           12   # 12
366     1      1  2001  Nice           12   # 12
731     1      1  2002  Nice           10   # 11.33
1096    1      1  2003  Nice           15   # 12.25
1461    1      1  2004  Nice           10   # 11.80
1827    1      1  2005  Nice           12   # and so on
2192    1      1  2006  Nice           17
2557    1      1  2007  Nice           16
2922    1      1  2008  Nice           19
3288    1      1  2009  Nice           12
3653    1      1  2010  Nice           10
4018    1      1  2011  Nice           16
4383    1      1  2012  Nice           13
4749    1      1  2013  Nice           15
5114    1      1  2014  Nice           14
5479    1      1  2015  Nice           13
5844    1      1  2016  Nice           15
6210    1      1  2017  Nice           13
6575    1      1  2018  Nice           15
6940    1      1  2019  Nice           18
7305    1      1  2020  Nice           11
7671    1      1  2021  Nice           14
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • I am grateful for your answer - It helped me alot! would be happy to understand how not to include the current year in the average calculation (as I need only the previous years) – N S Jan 29 '22 at 20:41
  • It's pretty simple, just shift the values forward: Use `.mean().shift().reset_index()` instead of `.mean().reset_index()`. I'm sure you understand why :) – Corralien Jan 29 '22 at 21:06