I have a dataframe which consists of 231840 rows. I need to split it into 161 separate tables, each table containing 1440 rows, i.e. the first table contains the first 1440 rows, the second table contains the next 1440 rows and so on until I get 161 separate tables with the combined number of rows being 231840 rows. Any ideas?
2 Answers
7
You can use `np.array_split` to split the dataframe:
import numpy as np
dfs = np.array_split(df, 161) # split the dataframe into 161 separate tables
Edit (to assign a new column based on the sequential number of each df in `dfs`):
dfs = [df.assign(new_col=i) for i, df in enumerate(dfs, 1)]
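A quick way to convince yourself the chunks come out the right size is to try the split on a toy frame first. The sketch below is only an illustration under assumptions: the frame `df`, its column `value`, and `n_chunks` are made up, and it splits the row positions and slices with `iloc` rather than passing the dataframe itself to `np.array_split`, which is a variation on the call above, not the same call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 12 rows instead of 231840.
df = pd.DataFrame({"value": range(12)})
n_chunks = 3  # would be 161 in the question

# Split the row positions into n_chunks groups, then slice the frame by position.
position_groups = np.array_split(np.arange(len(df)), n_chunks)
dfs = [df.iloc[positions] for positions in position_groups]

chunk_sizes = [len(chunk) for chunk in dfs]
print(chunk_sizes)  # [4, 4, 4]
```

Since 161 * 1440 = 231840, the same split on the real data should give 161 chunks of exactly 1440 rows each.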

Shubham Sharma
-
Thanks a lot! What if I want to create a new column in my initial dataframe, which would show the numbers '1' for the first 1440 rows and so on until '161' how do I apply this function in this case? – Rauan Saturin Jun 03 '20 at 17:20
-
@RauanSaturin Check the answer, I have added that. – Shubham Sharma Jun 03 '20 at 17:23
-
since the output of dfs is in the 'list' format, how do I extract this 'new_col' from dfs and make it part of my df dataframe? – Rauan Saturin Jun 03 '20 at 17:55
-
What do you mean `make it part of df`? Can you explain more? – Shubham Sharma Jun 03 '20 at 17:58
-
Sure. So for example, 'df' is my initial dataframe with 231840 rows and 10 columns. I want to create a new column in this 'df' dataframe, which would give the numbers sequentially after every 1440th row, i.e. the first 1440 rows have number '1' in that new column, the second 1440 rows have number '2' in that new column, and so on up to '161' for the last 1440 rows. In the answer that you so kindly provided, 'dfs' is in the list format, and I am not able to extract this 'new_col' from it and pass it on to my initial 'df' dataframe. Hope I made it more clear :) – Rauan Saturin Jun 03 '20 at 18:06
-
How about `df['new_col'] = [i for i, df in enumerate(dfs, 1) for _ in range(len(df))]`? – Shubham Sharma Jun 03 '20 at 18:15
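If all you need is the label column on the original dataframe, it may be simpler to skip the list of chunks and compute the label from each row's position. This is a minimal sketch, assuming fixed-size chunks; the toy frame and the name `rows_per_chunk` are made up for illustration (it would be 1440 for the data in the question).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 10 rows instead of 231840.
df = pd.DataFrame({"value": range(10)})
rows_per_chunk = 4  # would be 1440 in the question

# Row position // chunk size gives 0, 0, ..., 1, 1, ...; add 1 to start at 1.
df["new_col"] = np.arange(len(df)) // rows_per_chunk + 1

labels = df["new_col"].tolist()
print(labels)  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
```

This works on row position, so it does not depend on what the dataframe's index looks like.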
1
Simply use:
import numpy as np
df_list = np.array_split(df, 3) # replace 3 with the number of chunks you want
In your case you should replace 3 with `len(df) // desired_row_amount`. We use `//` (floor division) so that the result is an integer.
Or go old school and use a `for` loop, something along the lines of:
rows = 100 # example number of rows per chunk
df_list = [] # list to store the chunks
# step through the dataframe `rows` rows at a time; the last slice is
# simply shorter when len(df) is not a multiple of `rows`
for start in range(0, len(df), rows):
    df_list.append(df.iloc[start:start + rows])
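Another option, if you would rather avoid index arithmetic in a loop, is to group the rows by a chunk number. This is a different technique from the loop above, shown only as a sketch: the toy frame and the name `chunk_ids` are made up, and `rows` plays the same role as in the loop.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame whose length is not a multiple of `rows`.
df = pd.DataFrame({"value": range(10)})
rows = 4  # desired number of rows per chunk

# One chunk number per row: 0 for the first `rows` rows, 1 for the next, ...
chunk_ids = np.arange(len(df)) // rows

# groupby yields (chunk number, sub-dataframe) pairs in ascending chunk order.
df_list = [chunk for _, chunk in df.groupby(chunk_ids)]

sizes = [len(chunk) for chunk in df_list]
print(sizes)  # [4, 4, 2]
```

The leftover rows end up in a shorter final chunk instead of being dropped.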

snatchysquid
-
Thanks a lot! What if I want to create a new column in my initial dataframe, which would show the numbers '1' for the first 1440 rows and so on until '161' how do I apply this function in this case? – Rauan Saturin Jun 03 '20 at 17:20