I have a dataframe (mydata) that contains more than 10000 samples over four years (from 2016-01-23 to 2019-10-12), including a "date" column as one of the features. I want to partition the dataframe (mydata) into 4 dataframes based on the seasons (spring, summer, fall and winter). Here is my data frame:

 mydata:
     f1,   f2,    f3,      date
s1   23,  2.5,  0.3,  2016-04-03
s2   03,  0.5,  1.3,  2017-08-01
s3   14,   4,   2.3,  2016-10-02
....
sn   09,  4.3,   32,  2019-03-03

So my desired output should be four data frames (spring, summer, ...). For example, the data logged during spring in each of the four years should be placed in the first dataframe (Spring), and so on:

 Spring=[the data which are logged during the spring (from March 1 to May 31) during the four years]
 Summer=[the data which are logged during the summer (from June 1 to August 31) during the four years]
 .....

I could handle it manually with something like this, e.g., for one season, but I want a more efficient way:

 # Collect the spring rows (March 1 to May 31, inclusive) of each year
 season1=pd.concat([
     mydata[(mydata['date']>='2016-03-01') & (mydata['date']<='2016-05-31')],
     mydata[(mydata['date']>='2017-03-01') & (mydata['date']<='2017-05-31')],
     mydata[(mydata['date']>='2018-03-01') & (mydata['date']<='2018-05-31')],
     mydata[(mydata['date']>='2019-03-01') & (mydata['date']<='2019-05-31')],
 ])
Spedo
  • Does this answer your question? [Determine season given timestamp in Python using datetime](https://stackoverflow.com/questions/16139306/determine-season-given-timestamp-in-python-using-datetime) – LoicM Jul 28 '20 at 14:07
  • I checked it, but not really. – Spedo Jul 28 '20 at 14:17

1 Answer

Quickly creating something like your data set

import numpy as np
import pandas as pd

variables = np.random.randn(1300,3)
time = pd.date_range("2016-01-01", periods=1300, name='date')
df = pd.DataFrame(variables, columns=['f1','f2','f3'], index=time).reset_index()

looks like:

           date        f1        f2        f3
0    2016-01-01 -0.234615  0.671180  0.423316
1    2016-01-02 -0.900134 -0.021248 -0.608107
2    2016-01-03 -1.558302 -0.063307  0.578215
3    2016-01-04  0.474513  1.787985  0.929357
4    2016-01-05 -0.734408 -0.965413 -1.521657
        ...       ...       ...       ...
1295 2019-07-19  0.774643 -1.108196 -1.043404
1296 2019-07-20  0.645087 -2.107540 -1.054049
1297 2019-07-21 -1.126800  1.265989  0.298515
1298 2019-07-22 -0.501056  1.137609  1.344562
1299 2019-07-23 -0.409044  0.362831  0.988417

[1300 rows x 4 columns]

Then you can take a subset of df (e.g. all rows from any January) with

df.loc[(df.date.dt.month==1)]
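
Note that the `.dt` accessor only works on a datetime column. If `date` in the original `mydata` was read in as text (an assumption; the question does not show the dtype), it would need converting first. A minimal sketch, with a made-up four-row stand-in for `mydata`:

```python
import pandas as pd

# Hypothetical stand-in for mydata, with 'date' stored as plain strings
mydata = pd.DataFrame({
    "f1": [23, 3, 14, 9],
    "date": ["2016-04-03", "2017-08-01", "2016-10-02", "2019-03-03"],
})

# Convert the strings to datetime64 so that .dt.month becomes available
mydata["date"] = pd.to_datetime(mydata["date"])

months = mydata["date"].dt.month.tolist()
print(months)
```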

To get several months, combine the conditions with | (e.g. all rows from any January or February)

df.loc[(df.date.dt.month==1) | (df.date.dt.month==2)]
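
With more than two months the chain of `|` gets long; `Series.isin` is a compact alternative that should select the same rows. A small sketch on an illustrative one-year frame (the data here is made up, not taken from the question):

```python
import numpy as np
import pandas as pd

# Small illustrative frame: one row per day of 2016
df = pd.DataFrame({"date": pd.date_range("2016-01-01", "2016-12-31")})
df["f1"] = np.arange(len(df))

# Rows from January or February, written with chained conditions ...
chained = df.loc[(df.date.dt.month == 1) | (df.date.dt.month == 2)]
# ... and the same selection written with isin
with_isin = df.loc[df.date.dt.month.isin([1, 2])]

print(len(chained), len(with_isin))
```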

Finally, to get some more flexibility, define a function for 3 arbitrary months:

def getMonths(data, m1, m2, m3):
    # Keep the rows whose 'date' falls in one of the three given month numbers
    return data.loc[data.date.dt.month.isin([m1, m2, m3])]

For example:

Spring = getMonths(df,3,4,5)
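
To get all four seasons in one pass, one option is to map each month number to a season label and group on that label. The sketch below assumes meteorological seasons (spring is March to May, as in the question; the other boundaries follow the same pattern), and the names `SEASON_OF_MONTH` and `seasons` are made up for this example:

```python
import numpy as np
import pandas as pd

# Recreate a frame shaped like the one used in this answer
rng = np.random.default_rng(0)
time = pd.date_range("2016-01-01", periods=1300, name="date")
df = pd.DataFrame(rng.standard_normal((1300, 3)),
                  columns=["f1", "f2", "f3"], index=time).reset_index()

# Assumed month -> season mapping (meteorological seasons)
SEASON_OF_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Label every row with its season, then split into one DataFrame per label
labels = df["date"].dt.month.map(SEASON_OF_MONTH).rename("season")
seasons = {name: part for name, part in df.groupby(labels)}

Spring = seasons["Spring"]
Summer = seasons["Summer"]
Fall = seasons["Fall"]
Winter = seasons["Winter"]
```

Each frame then holds the rows of that season across all years, so no per-year date ranges need to be written out.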
nicrie