I have data for all days for multiple years (over a 10 year period from 2004 to 2014). I want to find the max and min for this period for all days in the 10 years eg. What is the max for Jan 1 for all years between 2004 and 2014. How would I do this using resample in pandas? Refer to this image
Asked
Active
Viewed 715 times
2 Answers
0
Use:
rng = pd.date_range('2004-01-01', '2014-12-31')
df = pd.DataFrame({'Date': rng, 'Max': range(len(rng))})
print (df)
Date Max
0 2004-01-01 0
1 2004-01-02 1
2 2004-01-03 2
3 2004-01-04 3
4 2004-01-05 4
... ...
4013 2014-12-27 4013
4014 2014-12-28 4014
4015 2014-12-29 4015
4016 2014-12-30 4016
4017 2014-12-31 4017
[4018 rows x 2 columns]
First convert column Date
to datetimes by to_datetime
, then create custom format MM-DD
by Series.dt.strftime
and last aggregate max
:
df['Date'] = pd.to_datetime(df['Date'])
#if necessary sorting
#df = df.sort_values('Date')
md = df['Date'].dt.strftime('%b-%d')
df = df.groupby(md, sort=False)['Max'].max().reset_index()
print (df)
Date Max
0 Jan-01 3653
1 Jan-02 3654
2 Jan-03 3655
3 Jan-04 3656
4 Jan-05 3657
.. ... ...
361 Dec-27 4013
362 Dec-28 4014
363 Dec-29 4015
364 Dec-30 4016
365 Dec-31 4017
[366 rows x 2 columns]

jezrael
- 822,522
- 95
- 1,334
- 1,252
-
Thanks you @jezrael, I will not post images to SO (new here). The above code doesn't seem to work for me. – Rajat Apr 26 '20 at 05:59
-
I had to change the line to "df['md'] = df['Date'].dt.strftime('%b-%d')" i.e. add md as a column and this worked for me. I don't know why the earlier solution wasn't working but this seems to be working perfectly, thank you. Also, what would be the best way to learn data manipulation using pandas and for dates/datetimes? – Rajat Apr 27 '20 at 17:15
0
I suggest this if you want to have the original dates associated to the max and min values (based on question):
import pandas as pd
import numpy as np
np.random.seed(13)
df = pd.DataFrame({"date":pd.date_range("2004-01-01", freq="D", periods=5000),
"value": np.random.randint(0,100,5000)})
df["day"] = df.date.dt.day
df["month"] = df.date.dt.month
df = df.set_index("date")
idx = df.groupby(['month', 'day'])['value'].transform(max) == df['value']
max_df = df[idx].sort_values(["month", "day"])
idx = df.groupby(['month', 'day'])['value'].transform(min) == df['value']
min_df = df[idx].sort_values(["month", "day"])
And the result is for instance for max_df
:
value day month
date
2010-01-01 88 1 1
2008-01-02 88 2 1
2011-01-03 94 3 1
2009-01-04 98 4 1
2004-01-05 98 5 1
You can have multiple rows for same day and month if there are several maxima.

Raphaele Adjerad
- 1,117
- 6
- 12