How can I split a pandas DataFrame into multiple dataframes?

Question

I have a dataframe which consists of 231840 rows. I need to split it into 161 separate tables, each table containing 1440 rows, i.e. the first table contains the first 1440 rows, the second table contains the next 1440 rows and so on until I get 161 separate tables with the combined number of rows being 231840 rows. Any ideas?

Shubham Sharma · Accepted Answer · 2020-06-03T17:23:28.337

7

You can use np.array_split to split the dataframe:

import numpy as np

dfs = np.array_split(df, 161) # split the dataframe into 161 separate tables

Edit (to assign a new column based on the sequential number of each df in dfs):

dfs = [df.assign(new_col=i) for i, df in enumerate(dfs, 1)]
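
If the goal is only to number the rows of the original df in blocks of 1440 (as asked in the comments), the split may not be needed at all. A minimal sketch, assuming the rows should be numbered in their current order; add_chunk_number is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

def add_chunk_number(df, rows_per_chunk, col="new_col"):
    """Return a copy of df with a 1-based chunk number for each row (hypothetical helper)."""
    # Integer-divide each row position by the chunk size:
    # positions 0..n-1 -> 0, n..2n-1 -> 1, ... then shift to start at 1
    chunk_ids = np.arange(len(df)) // rows_per_chunk + 1
    return df.assign(**{col: chunk_ids})

# Small stand-in for the 231840-row frame in the question
demo = pd.DataFrame({"value": range(10)})
labelled = add_chunk_number(demo, rows_per_chunk=4)
print(labelled["new_col"].tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
```

For the frame in the question, add_chunk_number(df, 1440) would give the values 1 to 161.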

edited Jun 03 '20 at 17:23

answered Jun 03 '20 at 17:02

Shubham Sharma


Thanks a lot! What if I want to create a new column in my initial dataframe, which would show the numbers '1' for the first 1440 rows and so on until '161' how do I apply this function in this case? – Rauan Saturin Jun 03 '20 at 17:20
@RauanSaturin Check i have added that in the answer. – Shubham Sharma Jun 03 '20 at 17:23
since the output of dfs is in the 'list' format, how do I extract this 'new_col' from dfs and make it part of my df dataframe? – Rauan Saturin Jun 03 '20 at 17:55
What do you mean `make it part of df`? Can you explain more? – Shubham Sharma Jun 03 '20 at 17:58
Sure. So for example, 'df' is my initial dataframe with 231840 rows and 10 columns. I want to create a new column in this 'df' dataframe, which would give the numbers sequentially after every 1440th row, i.e. the first 1440 rows have number '1' in that new column, the second 1440 rows have number '2' in that new column, and so on up to '161' for the last 1440 rows. In the answer that you so kindly provided, 'dfs' is in the list format, and I am not able to extract this 'new_col' from it and pass it on to my initial 'df' dataframe. Hope I made it more clear :) – Rauan Saturin Jun 03 '20 at 18:06
How about `df['new_col'] = [i for i, df in enumerate(dfs, 1) for _ in range(len(df))]`? – Shubham Sharma Jun 03 '20 at 18:15

snatchysquid · Answer 2 · 2020-06-03T17:16:46.570

1

Simply use:

import numpy as np

df_list = np.array_split(df, 3) # replace 3 with the number of chunks you want

In your case, replace 3 with len(df) // desired_row_amount. The // operator is integer (floor) division, so the result is a whole number.
Or go old school and use a for loop, something along the lines of:

rows = 100  # example number of rows per chunk
df_list = []  # list to store the resulting dataframes

# step through the chunk start positions 0, rows, 2*rows, ...
for start in range(0, len(df), rows):
    # slicing past the end is safe, so the last chunk simply holds the rows left over
    df_list.append(df.iloc[start:start + rows])
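
As a further alternative, not taken from the answer itself, the same fixed-size chunks can be produced with groupby on the integer-divided row position. A sketch, where split_by_rows is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def split_by_rows(df, rows):
    """Split df into consecutive chunks of at most `rows` rows each (hypothetical helper)."""
    # Row positions 0..rows-1 get key 0, the next `rows` positions get key 1, and so on
    keys = np.arange(len(df)) // rows
    # groupby yields (key, sub-frame) pairs in ascending key order
    return [chunk for _, chunk in df.groupby(keys)]

demo = pd.DataFrame({"value": range(10)})
chunks = split_by_rows(demo, rows=4)
print([len(c) for c in chunks])  # [4, 4, 2]
```

Unlike np.array_split, which balances the chunk sizes, this keeps every chunk at exactly `rows` rows except possibly the last one.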

edited Jun 03 '20 at 17:16

answered Jun 03 '20 at 16:58

snatchysquid


Thanks a lot! What if I want to create a new column in my initial dataframe, which would show the numbers '1' for the first 1440 rows and so on until '161' how do I apply this function in this case? – Rauan Saturin Jun 03 '20 at 17:20
