22

I have a time series that spans a few years, in the following format:

              timestamp open    high    low    close    volume
0   2009-01-02 05:00:00 900.00  906.75  898.00  904.75  15673.0
1   2009-01-02 05:30:00 904.75  907.75  903.75  905.50  4600.0
2   2009-01-02 06:00:00 905.50  907.25  904.50  904.50  3472.0
3   2009-01-02 06:30:00 904.50  905.00  903.25  904.75  6074.0
4   2009-01-02 07:00:00 904.75  905.50  897.00  898.25  12538.0

What would be the simplest way to split that dataframe into multiple dataframes of 1 week or 1 month worth of data?

As an example, a dataframe containing 1 year of data would be split in 52 dataframes containing a week of data and returned as a list of 52 dataframes.

The data can be reconstructed with the code below:

import pandas as pd
from pandas import Timestamp
dikt={'close': {0: 904.75, 1: 905.5, 2: 904.5, 3: 904.75, 4: 898.25}, 'low': {0: 898.0, 1: 903.75, 2: 904.5, 3: 903.25, 4: 897.0}, 'open': {0: 900.0, 1: 904.75, 2: 905.5, 3: 904.5, 4: 904.75}, 'high': {0: 906.75, 1: 907.75, 2: 907.25, 3: 905.0, 4: 905.5}, 'volume': {0: 15673.0, 1: 4600.0, 2: 3472.0, 3: 6074.0, 4: 12538.0}, 'timestamp': {0: Timestamp('2009-01-02 05:00:00'), 1: Timestamp('2009-01-02 05:30:00'), 2: Timestamp('2009-01-02 06:00:00'), 3: Timestamp('2009-01-02 06:30:00'), 4: Timestamp('2009-01-02 07:00:00')}}
df = pd.DataFrame(dikt, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
tdy
  • 36,675
  • 19
  • 86
  • 83
Radar
  • 935
  • 3
  • 9
  • 23
  • When you say "split" how do you want the splits delivered? Do you want them in a dictionary? In another dataframe? You should take the time to specify what it is you expect to see. – piRSquared Jan 12 '17 at 23:46
  • @ piRSquared as an example a dataframe containing 1 year of data would be split in 52 dataframes of a week and returned as a list of 52 dataframes. – Radar Jan 13 '17 at 00:07

6 Answers6

31

The pd.TimeGrouper is deprecated and will be removed, you can use pd.Grouper instead.

weeks = [g for n, g in df.groupby(pd.Grouper(key='timestamp',freq='W'))]
months = [g for n, g in df.groupby(pd.Grouper(key='timestamp',freq='M'))]

This way you can also avoid setting the timestamp as index.


Also, if your timestamp is part of a multi index, you can refer to it using using the level parameter (e.g. pd.Grouper(level='timestamp', freq='W')). Than @jtromans for the heads up.

toto_tico
  • 17,977
  • 9
  • 97
  • 116
  • 2
    If you are using a multi index for your timestamp, you can refer to the appropriate level pd.Grouper(level='TM_ID',freq='M') where in my case TM_ID is the appropriate datetime representation. – jtromans Oct 26 '18 at 14:28
15

use groupby with pd.TimeGrouper and list comprehensions

weeks = [g for n, g in df.set_index('timestamp').groupby(pd.TimeGrouper('W'))]
months = [g for n, g in df.set_index('timestamp').groupby(pd.TimeGrouper('M'))]

You can reset the index if you need

weeks = [g.reset_index()
         for n, g in df.set_index('timestamp').groupby(pd.TimeGrouper('W'))]
months = [g.reset_index()
          for n, g in df.set_index('timestamp').groupby(pd.TimeGrouper('M'))]

in a dict

weeks = {n: g.reset_index()
         for n, g in df.set_index('timestamp').groupby(pd.TimeGrouper('W'))}
months = {n: g.reset_index()
          for n, g in df.set_index('timestamp').groupby(pd.TimeGrouper('M'))}
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Fantastic answer, spot on! Thanks a lot. I knew there was a way to make it very concise. I've been trying to understand better the groupby method, but 1 of the difficulties I encounter is that it is not possible to inspect what's inside the groupBy object itself... Why is it so? Or is there a way to actually do that? – Radar Jan 13 '17 at 00:40
  • The groupby object is a device to help with these grouping activities. It holds information about where to split the dataframe, the unique names for each group, and hand methods. It isn't really a data structure on its own (I'm sure you could interpret it as one). By iterating through it the way I did, the groupby class tells it to yield the name of the group and the slice of the dataframe for each name. That's why I asked how you wanted it. – piRSquared Jan 13 '17 at 00:44
  • Where can the documentation about the 'hand methods' for the groupby object be found? The documentation page doesn't list them: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html – Radar Jan 13 '17 at 00:50
  • [this is a better reference](http://pandas.pydata.org/pandas-docs/stable/groupby.html) – piRSquared Jan 13 '17 at 00:53
  • Also, could you explained a bit more the expression `g for n, g in ...` . Are there 2 elements returned by the for loop applied to the groupby objet? – Radar Jan 14 '17 at 18:16
  • Is the objet you are iterating on the dictionary that is visible when doing `df.set_index('timestamp').groupby(pd.TimeGrouper('M')).groups` ? – Radar Jan 14 '17 at 18:32
  • Great answer. Btw you can also use `df.set_index('timestamp').resample("W")` instead of `groupby` + `Grouper`! – meow Jul 02 '19 at 19:28
2

I would use group by for this, assume df stores the data

df = df.set_index('timestamp')
df.groupby(pd.TimeGrouper(freq='D'))

then resulting groups would contain all the dataframes you are looking for. this answer is referenced here

How to group DataFrame by a period of time?

Alex
  • 12,078
  • 6
  • 64
  • 74
coredump
  • 41
  • 6
1

Convert the timestamp column into DateTimeIndex, then you can slice into it in a variety of ways.

rtkaleta
  • 681
  • 6
  • 14
0

The concept of TimeGrouper is correct, but the syntax doesn't seem to be working with latest versions on pandas. Here's my working code on Pandas 1.1.3

df_Time = df.copy()
df_Time = df_Time.groupby(pd.Grouper(key='time', freq='M')).agg({
    'polarity': 'mean',
})

The pd.Grouper(key='time', freq='M') is what you need. key is the column where the time/timestamp exists and the freq can take multiple values with very useful options. The full list of Offset aliases (frequency options) can be found here

The main ones are

B: business day frequency
C: custom business day frequency
D: calendar day frequency
W: weekly frequency
M: month end frequency
Rami Alloush
  • 2,308
  • 2
  • 27
  • 33
0

This should fix it. Load your data and parse the date

import pandas as pd
data = pd.read_csv(f"../Data/2022/2022_02.csv", 
                    delimiter=',', parse_dates=["Timestamp"])

You can add date_parser=pd.to_datetime to parse the dates as dateTime

weeks = [week for stamp, week in data.resample("W")]
months = [month for stamp, month in data.resample("M")]

In the weeks array, each item is a pandas dataframe (same for the month).

You can view it by using weeks[0]

Vizier
  • 11
  • 5