Splitting dataframe column into equal windows in Pandas

Question

I have a dataframe like the following, and I want to extract windows of size 30, then loop over each block of data and call other functions on it.

import numpy as np
import pandas as pd

index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])

I found the following function, but I wonder if there is a more efficient way to do this.

def split(df, chunkSize=30):
    listOfDf = list()
    # Ceiling division avoids an empty trailing chunk when len(df) is a multiple of chunkSize
    numberChunks = -(-len(df) // chunkSize)
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf

Can you please fix the indentation? – jotasi Jul 25 '17 at 12:44

Scott Boston · Accepted Answer · 2017-07-25T12:47:45.317

8

You can use a list comprehension. See this SO post about how to access the resulting dataframes and for another way to break up a dataframe.

n = 200000  # chunk row size
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]
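As a rough illustration, here is the same comprehension applied to the sample data from the question with n = 30. Using `.iloc` to make the positional slicing explicit and using `mean()` as the per-window function are assumptions made for this sketch, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 92 daily rows (2016 is a leap year).
index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])

n = 30  # chunk row size
# .iloc makes it explicit that the slices are positional, not label-based.
list_df = [data.iloc[i:i + n] for i in range(0, len(data), n)]

sizes = [len(chunk) for chunk in list_df]
print(sizes)  # [30, 30, 30, 2]

for chunk in list_df:
    # Placeholder for whatever function should run on each window.
    window_mean = chunk['random'].mean()
```

Note that the last window holds only the 2 leftover rows.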

edited Jul 25 '17 at 12:47

answered Jul 25 '17 at 12:46

A bit of added information: if your windows do not evenly divide your dataset (i.e. len(df) % n > 0), the last window will be smaller. If that causes an issue, the @jdehesa solution spreads the rows across the windows so that window lengths differ by at most one (but of course requires the use of NumPy). – Andy K. Nov 28 '18 at 07:54
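To make the difference concrete, the sketch below compares the window sizes produced by fixed-size slicing with those produced by `np.array_split`. The row count of 95 is a hypothetical value chosen so the remainder is visible, and the split is done on row positions only, so no dataframe is needed.

```python
import numpy as np

n_rows, n = 95, 30  # hypothetical row count and window size

# Fixed-size slicing: the remainder ends up in a short last window.
slice_sizes = [min(i + n, n_rows) - i for i in range(0, n_rows, n)]

# np.array_split into the same number of windows: sizes differ by at most one.
k = len(slice_sizes)
even_sizes = [len(part) for part in np.array_split(np.arange(n_rows), k)]

print(slice_sizes)  # [30, 30, 30, 5]
print(even_sizes)   # [24, 24, 24, 23]
```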

score 5 · Answer 2 · answered Jul 25 '17 at 12:53

You can do this efficiently with NumPy's array_split:

import numpy as np

def split(df, chunkSize=30):
    # Ceiling division: avoids an extra chunk when len(df) is a multiple of chunkSize
    numberChunks = max(1, -(-len(df) // chunkSize))
    return np.array_split(df, numberChunks, axis=0)

Even though it is a NumPy function, it will return the split data frames with the correct indices and columns.
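A minimal variation on this idea, sketched under a few assumptions: the helper name `split_even` is hypothetical, and it splits the row positions with `np.array_split` and then selects rows with `.iloc`, rather than passing the dataframe itself to NumPy. This keeps the example independent of how NumPy handles dataframe inputs while still preserving the index and columns.

```python
import numpy as np
import pandas as pd

def split_even(df, chunk_size=30):
    """Split df into ceil(len(df) / chunk_size) near-equal blocks (hypothetical helper)."""
    number_chunks = max(1, -(-len(df) // chunk_size))  # ceiling division
    # Split the row positions, then select each group of rows positionally.
    positions = np.array_split(np.arange(len(df)), number_chunks)
    return [df.iloc[pos] for pos in positions]

# Sample data from the question: 92 daily rows.
index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])

blocks = split_even(data, chunk_size=30)
block_sizes = [len(block) for block in blocks]
print(block_sizes)  # [23, 23, 23, 23]
```

With 92 rows and a target size of 30, this gives four blocks of 23 rows instead of three blocks of 30 and one of 2.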
