Overlapping chunk generator for iterating over pandas DataFrames and Series
A chunk function with an overlap parameter to control how much consecutive chunks overlap
Below is a generator version of the chunk function with an overlap parameter that sets how many rows consecutive chunks share. Because it slices by position with .iloc, it also works when the pd.DataFrame or pd.Series has a custom index (for example, a float index). An integer index is used here so that the overlap is easy to check.
import numpy as np
import pandas as pd

sz = 14
# ind = np.linspace(0., 10., num=sz)  # alternative: float index
ind = range(sz)
df = pd.DataFrame(np.random.rand(sz, 4),
                  index=ind,
                  columns=['a', 'b', 'c', 'd'])


def chunker(seq, size, overlap):
    # Step by size - overlap so consecutive chunks share `overlap` rows.
    # Requires overlap < size, otherwise the step is zero or negative.
    for pos in range(0, len(seq), size - overlap):
        yield seq.iloc[pos:pos + size]


chunk_size = 6
chunk_overlap = 2

# Iterate over all chunks.
for i in chunker(df, chunk_size, chunk_overlap):
    print(i)

# Or pull chunks one at a time from the generator object.
chnk = chunker(df, chunk_size, chunk_overlap)
print('\n', chnk, end='\n\n')
print('First "next()":', next(chnk), sep='\n', end='\n\n')
print('Second "next()":', next(chnk), sep='\n', end='\n\n')
print('Third "next()":', next(chnk), sep='\n', end='\n\n')
Output for a chunk size of 6 with an overlap of 2 rows:
          a         b         c         d
0  0.577076  0.025997  0.692832  0.884328
1  0.504888  0.575851  0.514702  0.056509
2  0.880886  0.563262  0.292375  0.881445
3  0.360011  0.978203  0.799485  0.409740
4  0.774816  0.332331  0.809632  0.675279
5  0.453223  0.621464  0.066353  0.083502
          a         b         c         d
4  0.774816  0.332331  0.809632  0.675279
5  0.453223  0.621464  0.066353  0.083502
6  0.985677  0.110076  0.724568  0.990237
7  0.109516  0.777629  0.485162  0.275508
8  0.765256  0.226010  0.262838  0.758222
9  0.805593  0.760361  0.833966  0.024916
           a         b         c         d
8   0.765256  0.226010  0.262838  0.758222
9   0.805593  0.760361  0.833966  0.024916
10  0.418790  0.305439  0.258288  0.988622
11  0.978391  0.013574  0.427689  0.410877
12  0.943751  0.331948  0.823607  0.847441
13  0.359432  0.276289  0.980688  0.996048
           a         b         c         d
12  0.943751  0.331948  0.823607  0.847441
13  0.359432  0.276289  0.980688  0.996048
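
The fourth chunk printed by the loop holds only rows 12 and 13, which the third chunk already ended with. The start positions come from range(0, 14, 4), that is 0, 4, 8 and 12, so the last slice begins inside the final overlap. If that trailing chunk is unwanted, one possible variant is sketched below. The name `chunker_no_tail` and the argument check are assumptions added for illustration; they are not part of the original function.

```python
import numpy as np
import pandas as pd


def chunker_no_tail(seq, size, overlap):
    """Yield overlapping chunks, skipping a final chunk that adds no new rows.

    Hypothetical variant of chunker(); the argument check is an added assumption.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    for pos in range(0, len(seq), step):
        yield seq.iloc[pos:pos + size]
        # Once a slice reaches the end, later slices would only repeat rows.
        if pos + size >= len(seq):
            break


df = pd.DataFrame(np.random.rand(14, 4), columns=['a', 'b', 'c', 'd'])
chunks = list(chunker_no_tail(df, 6, 2))
print([list(c.index) for c in chunks])
```

With 14 rows, a size of 6 and an overlap of 2, this stops after the chunk covering rows 8 to 13.
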
First "next()":
          a         b         c         d
0  0.577076  0.025997  0.692832  0.884328
1  0.504888  0.575851  0.514702  0.056509
2  0.880886  0.563262  0.292375  0.881445
3  0.360011  0.978203  0.799485  0.409740
4  0.774816  0.332331  0.809632  0.675279
5  0.453223  0.621464  0.066353  0.083502

Second "next()":
          a         b         c         d
4  0.774816  0.332331  0.809632  0.675279
5  0.453223  0.621464  0.066353  0.083502
6  0.985677  0.110076  0.724568  0.990237
7  0.109516  0.777629  0.485162  0.275508
8  0.765256  0.226010  0.262838  0.758222
9  0.805593  0.760361  0.833966  0.024916

Third "next()":
           a         b         c         d
8   0.765256  0.226010  0.262838  0.758222
9   0.805593  0.760361  0.833966  0.024916
10  0.418790  0.305439  0.258288  0.988622
11  0.978391  0.013574  0.427689  0.410877
12  0.943751  0.331948  0.823607  0.847441
13  0.359432  0.276289  0.980688  0.996048
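
Because `chunker` slices by position with `.iloc`, the index labels play no role in where the chunks start and end. A minimal sketch with a float index built by `np.linspace`, as in the commented-out line of the example, might look like this. The Series `s` and the small sizes are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def chunker(seq, size, overlap):
    # Same generator as above: positional slices, stepping by size - overlap.
    for pos in range(0, len(seq), size - overlap):
        yield seq.iloc[pos:pos + size]


# Float-valued index; the labels are never used for slicing.
ind = np.linspace(0., 10., num=5)
s = pd.Series(np.arange(5), index=ind)

for chunk in chunker(s, 3, 1):
    print(list(chunk.index), list(chunk.values))
```

Each chunk keeps its original float labels, so values can still be looked up by label inside a chunk.
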