Split larger data frame into multiple small data frames

Question

I have a dataframe df with dimensions (28260, 25). I want to split it into 20 smaller dataframes, each with dimensions (1413, 25), named df_1, df_2, ..., df_20.

For Example: Input Dataframe

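# Assumes numpy (np), pandas (pd) and the DataFrame df already exist
frames = {}
for e, i in enumerate(np.split(df, 20)):
    # Note: np.random.permutation shuffles the rows of each piece
    frames['df' + str(e + 1)] = pd.DataFrame(np.random.permutation(i), columns=df.columns)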

I don't believe that the question cited as a duplicate is the same. It doesn't mention pandas dataframes at all. Can that tag be removed? @cs95 — Chris Farr, May 22 '19 at 19:56
@ChrisFarr The premise is the same, they are trying to dynamically introduce multiple variables into the global namespace. The solution is to use a dictionary in either case, regardless of whether they're Dataframes or not. — cs95, May 22 '19 at 19:57

Alexandre B. · Accepted Answer · 2019-05-22T18:55:31.417

If you want to keep all the dataframes in a dict, here is one way to do it:

# import modules
import pandas as pd
import numpy as np


# Create dataframe of 25 columns and 28260 rows
df = pd.DataFrame({"col_"+str(i): np.random.randint(0, 10, 28260)
                   for i in range(25)})
print(df.head(5))
#    col_0  col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  ...  col_16  col_17  col_18  col_19  col_20  col_21  col_22  col_23  col_24
# 0      5      0      1      5      9      7      2      9      5  ...       5       1       3       8       2       3       9       7       4
# 1      7      1      5      0      2      1      5      9      6  ...       6       1       1       7       8       7       0       2       1
# 2      0      3      6      1      3      8      7      4      7  ...       9       9       7       7       8       9       1       6       9
# 3      7      7      3      3      3      1      3      4      9  ...       2       2       7       9       8       0       2       0       8
# 4      0      1      3      9      7      4      4      3      8  ...       9       5       8       4       5       4       3       9       6


print("Dimension df: ", df.shape)
# Dimension df:  (28260, 25)

# Create dict of sub dataframe
dict_df = {"df_"+str(i): df.iloc[i*28260//20:(i+1)*28260//20] for i in range(20)}
print("Keys: ", dict_df.keys())
# Keys:  dict_keys(['df_0', 'df_1', 'df_2', 'df_3', 'df_4', 'df_5', 'df_6', 'df_7', 'df_8',
#                   'df_9', 'df_10', 'df_11', 'df_12', 'df_13', 'df_14', 'df_15', 'df_16',
#                   'df_17', 'df_18', 'df_19'])

print("Size of each sub_dataframe: ", dict_df["df_1"].shape)
# Size of each sub_dataframe:  (1413, 25)

And in a list:

# List of sub dataframes
list_df = []
for i in range(20):
    list_df.append(df.iloc[i*28260//20:(i+1)*28260//20])

print("Number of sub_dataframes: ", len(list_df))
# Number of sub_dataframes: 20
print("Size of each sub_dataframe: ", list_df[0].shape)
# Size of each sub_dataframe: (1413, 25)
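
The snippets above hard-code 28260 and 20 and number the keys from df_0. As a minimal sketch (the helper name `split_frame` is hypothetical and not part of pandas), the same `iloc` slicing can be wrapped in a function that reads the row count from the frame and uses the df_1 ... df_20 names asked for in the question:

```python
import numpy as np
import pandas as pd


def split_frame(df, n_parts):
    """Split df row-wise into n_parts consecutive pieces keyed df_1 ... df_<n_parts>.

    Hypothetical helper for illustration; uses the same integer boundaries
    as the answer, so it also copes with row counts not divisible by n_parts.
    """
    n_rows = len(df)
    return {
        "df_" + str(k + 1): df.iloc[k * n_rows // n_parts:(k + 1) * n_rows // n_parts]
        for k in range(n_parts)
    }


# Same shape as in the question: 28260 rows, 25 columns
df = pd.DataFrame({"col_" + str(i): np.random.randint(0, 10, 28260)
                   for i in range(25)})

frames = split_frame(df, 20)
print(list(frames)[:3], "...", list(frames)[-1])
# ['df_1', 'df_2', 'df_3'] ... df_20
print(frames["df_1"].shape)
# (1413, 25)
```

Because the slice boundaries are computed as `k * n_rows // n_parts`, no rows are dropped when the length is not an exact multiple of `n_parts`; the pieces then simply differ in size by at most one row.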

I don't want to store the data frames in a list... rather, I want 20 data frames in the environment with names df_1 ... df_20 — vinodetrx, May 22 '19 at 18:56
If you want to access them in a loop, you need to store them in a structure. A `list` or a `dict` is one solution. But you cannot define variables dynamically in a loop. If you want 20 variables `df_1, df_2, ... df_20`, you have to write them out manually. — Alexandre B., May 22 '19 at 19:00
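
To illustrate the point made in the comments, here is a rough sketch of how a dict keyed by name can stand in for 20 separate variables. The variable names (`frames`, `row_counts`, `restored`) are assumptions chosen for this example, and the data is a deterministic stand-in for the real frame:

```python
import numpy as np
import pandas as pd

# Stand-in data with the dimensions from the question
df = pd.DataFrame({"col_" + str(i): np.arange(28260) for i in range(25)})

step = len(df) // 20  # 1413 rows per piece
frames = {"df_" + str(k + 1): df.iloc[k * step:(k + 1) * step] for k in range(20)}

# Look up a single piece by its name
first = frames["df_1"]

# Process every piece in a loop without 20 separate variables
row_counts = {name: len(part) for name, part in frames.items()}

# If a few standalone names are really needed, bind them by hand
df_1, df_2 = frames["df_1"], frames["df_2"]

# The pieces can be stitched back together at any time
restored = pd.concat(list(frames.values()))
print(restored.equals(df))
# True
```

Keeping the pieces in one container means the name-to-frame mapping stays explicit, and any later step (filtering, saving, concatenating) can iterate over `frames.items()` instead of repeating code per variable.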
