Split a dataframe based on column criteria

Question

Hey guys, I'm trying to split a dataframe into several subsets, one for each unique value in a column.

# build a list of the unique values in the column (44 of them)
combined = df['reg_sch_cur'].unique().tolist()

# split df
for i in combined:
    df = df[df['reg_sch_cur'] == i]

This, unfortunately, only saves the last iteration. I would like to keep every split dataframe (44 of them) in memory, so I assume that on the line df = df[df['reg_sch_cur'] == i] I have to add something to save each df under its own name.

Splitting dataframe on unique values of 'reg_sch_cur' is the same as grouping by the value. Either as `list` -> `dfs = [x for _, x in df.groupby('reg_sch_cur')]` or as `dict` -> `d = dict(tuple(df.groupby('reg_sch_cur')))` — Henry Ecker, Jul 22 '21 at 14:20
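A minimal sketch of the two one-liners from the comment above, run on made-up data. Only the column name 'reg_sch_cur' comes from the question; the values and the extra 'value' column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'reg_sch_cur' is from the question.
df = pd.DataFrame({
    "reg_sch_cur": ["A", "B", "A", "C", "B"],
    "value": [10, 20, 30, 40, 50],
})

# List of sub-dataframes, one per unique value (groups are sorted by key by default).
dfs = [x for _, x in df.groupby("reg_sch_cur")]

# Dict mapping each unique value to its sub-dataframe.
d = dict(tuple(df.groupby("reg_sch_cur")))

print(len(dfs))   # number of sub-dataframes
print(d["A"])     # rows where reg_sch_cur == "A"
```

The dict form is usually handier when you need to look up a subset by its value; the list form is enough if you only iterate over the pieces.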

score 0 · Answer 1 · answered Jul 22 '21 at 14:13

You can create an empty dictionary and, in your second loop, store each newly generated df in it under its own key:

dict_dfs = dict()
for i in combined:
    dict_dfs[f"new_key_{i}"] = df[df['reg_sch_cur'] == i]

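Afterwards you can retrieve each df by its key. Note that the result is assigned to the dictionary rather than back to df, so the original dataframe is not overwritten between iterations.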
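A self-contained sketch of the same idea, including the lookup step. The sample data and the key "new_key_A" are assumptions for illustration, not values from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({
    "reg_sch_cur": ["A", "B", "A", "C"],
    "value": [1, 2, 3, 4],
})

# Unique values in order of first appearance.
combined = df["reg_sch_cur"].unique().tolist()

# Store each subset under its own key; df itself is never reassigned.
dict_dfs = {}
for i in combined:
    dict_dfs[f"new_key_{i}"] = df[df["reg_sch_cur"] == i]

# Retrieve one subset by key.
subset_a = dict_dfs["new_key_A"]
print(subset_a)

# Or walk over all of them.
for key, subset in dict_dfs.items():
    print(key, len(subset))
```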

score 0 · Answer 2 · answered Jul 22 '21 at 14:22

Looks like a case for DataFrame.groupby().

import pandas as pd

df = pd.DataFrame(dict(a=[1,1,2,2,3,3], b=range(6)))
df

#    a  b
# 0  1  0
# 1  1  1
# 2  2  2
# 3  2  3
# 4  3  4
# 5  3  5

grouped = df.groupby('a')

for a, subset in grouped:
    print(a, "\n", subset, "\n")

# 1 
#     a  b
# 0  1  0
# 1  1  1 

# 2 
#     a  b
# 2  2  2
# 3  2  3 

# 3 
#     a  b
# 4  3  4
# 5  3  5
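If you only need one subset at a time rather than a loop over all of them, the grouped object can also be queried directly. A short sketch on the same toy dataframe, using `get_group`, `groups` and `ngroups`; the variable names here are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 1, 2, 2, 3, 3], b=range(6)))
grouped = df.groupby("a")

# Pull out a single subset by its group key.
subset_2 = grouped.get_group(2)
print(subset_2)

# The available keys and the number of groups.
group_keys = list(grouped.groups)
n_groups = grouped.ngroups
print(group_keys, n_groups)
```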
