Creating multiple dataframes from looping over a column in a df

Question

I am looking to create multiple dataframes from one original dataframe. I want to loop through one column and create a dataframe for each group of matching values.

   TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
0        123456             test_test              email  123456       paul
1        234567             test_test              email  123456       paul
2        345678             test_test              email  123456       paul
3        456789             test_test              email  123456       paul
4        123456        test_test_test              video  987654      karol
5        234567        test_test_test              video  987654      karol
6        345678        test_test_test              video  987654      karol
7        456789        test_test_test              video  987654      karol

This is my original df, and I want to create a new df for every match in TargetListID. I have looked through:

Create multiple dataframes in loop

And tried the following:

def create_df_from_target_list_id(dataframe):
    dataframe = {target_list_id: pd.DataFrame() for target_list_id in dataframe['TargetListID']}
    return dataframe


test = create_df_from_target_list_id(df)
print(test)

Which gives me:

{123456: Empty DataFrame
Columns: []
Index: [], 234567: Empty DataFrame
Columns: []
Index: [], 345678: Empty DataFrame
Columns: []
Index: [], 456789: Empty DataFrame
Columns: []
Index: []}

So I'm not sure exactly what I am doing wrong here. Any pointers would be great. The reason for this is that the original dataframe could have thousands of rows, so I would like to create the dataframes without knowing the TargetListID values in advance.

I tried groupby here:

def create_df_from_target_list_id(dataframe):
    dataframe = dict(iter(dataframe.groupby('TargetListID')))
    return dataframe


test = create_df_from_target_list_id(df)
print(test)

and got the following

{123456:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
0        123456             test_test              email  123456       paul
4        123456        test_test_test              video  987654      karol, 234567:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
1        234567             test_test              email  123456       paul
5        234567        test_test_test              video  987654      karol, 345678:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
2        345678             test_test              email  123456       paul
6        345678        test_test_test              video  987654      karol, 456789:    TargetListID Placement_Description Buy_Component_Type  NPI_ID First_Name
3        456789             test_test              email  123456       paul
7        456789        test_test_test              video  987654      karol}

score 0 · Answer 1 · answered Jan 11 '22 at 18:38

It looks like your approach is not far off, but notice that in your function you are calling pd.DataFrame() which creates a DataFrame but gives it no data to use (hence the empty DataFrames).

If you want a DataFrame for each unique value of TargetListID, and your original DataFrame is called df, then you could do something like this:

df_dictionary = { target_id: df[ df.TargetListID == target_id ] for target_id in df.TargetListID.unique() }

You could do this inside your function or separately.

Note that you may want to first drop any null TargetListID values and you may want to use reset_index() when creating your dictionary depending on what you need.
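As a rough sketch of that note, the version below drops null ids and resets each piece's index. The small frame is a stand-in invented for illustration (it is not the asker's data), and the cast back to integers is an assumption that the ids are whole numbers.

```python
import pandas as pd

# Illustrative stand-in for the question's data, with one missing TargetListID added.
df = pd.DataFrame({
    "TargetListID": [123456, 234567, None, 123456],
    "First_Name": ["paul", "paul", "paul", "karol"],
})

# Drop rows whose TargetListID is null so they cannot become a key.
# The missing value forced the column to float, so cast back to integers
# (assumes the ids are whole numbers).
clean = df.dropna(subset=["TargetListID"]).astype({"TargetListID": "int64"})

# One DataFrame per unique id; reset_index(drop=True) renumbers each piece from 0.
df_dictionary = {
    target_id: clean[clean["TargetListID"] == target_id].reset_index(drop=True)
    for target_id in clean["TargetListID"].unique()
}
```

Whether to reset the index depends on whether you still need to trace rows back to their positions in the original DataFrame.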

Leo

This kind of works. I have one df with a group-by kind of situation, and I want to be able to get 4 different dataframes – mrpbennett Jan 11 '22 at 18:46
This will create a dictionary with a key for each unique TargetListID value, and a value that is a DataFrame made of the rows in your original DataFrame that match that TargetListID value. If you have 4 unique TargetListID values you would have a dictionary with four (key,value) pairs, i.e. 4 DataFrames. – Leo Jan 11 '22 at 18:49
But with a dict, wouldn't I need to know the first key to be able to access that? I want something where I have a grouped df that I can then just send somewhere to an API. I won't always know the keys – mrpbennett Jan 11 '22 at 19:08
You can always access the keys with `df_dictionary.keys()`. You could also store the DataFrames in a list with `df_list = [ df[ df.TargetListID == target_id ] for target_id in df.TargetListID.unique() ]` if you don't want to use a dictionary. – Leo Jan 11 '22 at 19:48
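To illustrate the point about not needing to know the keys, here is a minimal sketch that walks the dictionary with `.items()`. The `send_to_api` function is a hypothetical placeholder for whatever the real API call would be, and the sample frame is invented for the example.

```python
import pandas as pd

# Illustrative stand-in for the question's data.
df = pd.DataFrame({
    "TargetListID": [123456, 234567, 123456, 234567],
    "First_Name": ["paul", "paul", "karol", "karol"],
})

df_dictionary = {
    target_id: df[df["TargetListID"] == target_id]
    for target_id in df["TargetListID"].unique()
}


def send_to_api(target_id, frame):
    # Hypothetical placeholder: a real version would make the API call here.
    return {"target_id": int(target_id), "rows": len(frame)}


# .items() yields every (key, DataFrame) pair, so no key has to be known in advance.
results = [send_to_api(target_id, frame) for target_id, frame in df_dictionary.items()]
```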

score -1 · Answer 2 · answered Jan 11 '22 at 19:17

I think you are looking for a plain and simple groupby.

r = {}
for target_id, df in dataframe.groupby('TargetListID'):
    r[target_id] = df
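A self-contained variant of the same idea, as a sketch: the frame below rebuilds a few columns of the question's sample data, and the loop variable is named `group` (my choice, not from the answer) so it does not shadow an outer `df`.

```python
import pandas as pd

# Rebuilds part of the question's sample data for illustration.
dataframe = pd.DataFrame({
    "TargetListID": [123456, 234567, 345678, 456789] * 2,
    "Buy_Component_Type": ["email"] * 4 + ["video"] * 4,
    "First_Name": ["paul"] * 4 + ["karol"] * 4,
})

# groupby yields (key, sub-DataFrame) pairs; each sub-DataFrame keeps its original row labels.
r = {target_id: group for target_id, group in dataframe.groupby("TargetListID")}
```

Each value in `r` keeps the row labels from the original frame, which matches the output shown in the question.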

alec_djinn
