I have a dataframe like this.

A,B
1,2
3,4
5,6
7,8
9,10
11,12
13,14

I would like to split the above DataFrame into overlapping chunks of three rows each. The first chunk should contain index 0 to index 2, the second index 1 to index 3, the third index 2 to index 4:

A,B
1,2
3,4
5,6

A,B
3,4
5,6
7,8

A,B
5,6
7,8
9,10

and so on.

So far I have been using a for loop with `iloc`, appending each chunk to a list.

I am looking for a vectorized way to do this split in pandas. The DataFrame is huge, and looping over every row is quite slow.
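For reference, the loop-plus-`iloc` approach described above might look roughly like this. The helper name `split_rolling_loop` is hypothetical, and the data is the sample frame from the question; this is only a sketch of the baseline being replaced, not the asker's actual code.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 3, 5, 7, 9, 11, 13], "B": [2, 4, 6, 8, 10, 12, 14]})

def split_rolling_loop(frame, window=3):
    # One positional iloc slice per starting row, stepping by 1;
    # the last window starts at len(frame) - window
    return [frame.iloc[i:i + window] for i in range(len(frame) - window + 1)]

chunks = split_rolling_loop(df)
print(len(chunks))  # 5
print(chunks[1])
```

With n rows and a window of 3 this produces n - 2 chunks, each created by a separate Python-level `iloc` call, which is where the per-row overhead comes from.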

user96564

1 Answer


Assuming you have a standard `RangeIndex` (0, 1, 2, ...), so that the index values double as row positions, and borrowing a vectorized rolling-window approach from here, we can drop down to the NumPy level:

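```python
import numpy as np

def rolling_window(a, window):
    # Strided (zero-copy) view of 1-D `a` as overlapping windows of length `window`
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# Each row of the window matrix holds positions [i, i+1, i+2]; fancy-indexing
# the values with it gives an array of shape (len(df) - 2, 3, n_cols)
df.to_numpy()[rolling_window(df.index.values, 3)]
```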

which yields

```
array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 3,  4],
        [ 5,  6],
        [ 7,  8]],

       [[ 5,  6],
        [ 7,  8],
        [ 9, 10]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]],

       [[ 9, 10],
        [11, 12],
        [13, 14]]])
```
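If NumPy 1.20 or newer is available, `numpy.lib.stride_tricks.sliding_window_view` is an alternative (swapped in here, not part of the answer above) that avoids computing strides by hand. Windowing the values along axis 0 also drops the `RangeIndex` assumption, because the index is never consulted. A minimal sketch on the question's sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5, 7, 9, 11, 13], "B": [2, 4, 6, 8, 10, 12, 14]})

# Window along axis 0 (rows) of the values; the index is never consulted.
# sliding_window_view appends the window axis last: shape (n - 2, n_cols, 3)
windows = np.lib.stride_tricks.sliding_window_view(df.to_numpy(), 3, axis=0)

# Swap the last two axes to get (n - 2, 3, n_cols), matching the output above;
# transpose returns another view, so no data is copied here
windows = windows.transpose(0, 2, 1)

print(windows.shape)  # (5, 3, 2)
```

One design difference worth noting for a huge frame: `sliding_window_view` returns a read-only view, whereas fancy-indexing with the window matrix always copies, so the 3-D result there needs roughly three times the memory of the original values.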

If you need these back as DataFrames, apply the constructor with `map`:

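```python
# map() is lazy in Python 3, so wrap it in list(); pass the original column names
list(map(lambda w: pd.DataFrame(w, columns=df.columns),
         df.to_numpy()[rolling_window(df.index.values, 3)]))
```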
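If the DataFrames should also keep their original index labels (for example when the index is not a default `RangeIndex`), one option is to build the windows over row positions and slice both the values and the labels with them. This is a sketch under that assumption; the letter index below is hypothetical and only shows that labels survive. Note that constructing one DataFrame per window is still a Python-level loop, so for a very large frame it may pay to stay with the 3-D array as long as possible.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with a hypothetical letter index
df = pd.DataFrame(
    {"A": [1, 3, 5, 7, 9, 11, 13], "B": [2, 4, 6, 8, 10, 12, 14]},
    index=list("abcdefg"),
)

window = 3
values = df.to_numpy()
labels = df.index.to_numpy()

# Each row of `pos` holds consecutive row positions [i, i+1, i+2]
pos = np.lib.stride_tricks.sliding_window_view(np.arange(len(df)), window)

# One DataFrame per window, keeping column names and original index labels
frames = [pd.DataFrame(values[p], index=labels[p], columns=df.columns) for p in pos]

print(frames[1])
```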
rafaelc