Automate name of new dataframe creation from strings in numpy array given pandas df

Question

I have the following dataframe

      col1          col2          col3
0     str9          47            55
1     str8          43            51
2     str9          46            52
3     str2          42            56

and the following string array, generated from df.col1.unique():

strings = ['str9', 'str8', 'str2']

To manage the amount of data I'm manipulating, I want to split df into new dataframes, one per value in strings: df[df.col1 == strings[0]], df[df.col1 == strings[1]], and so on through all values in strings.

I would also like to name them after their values in strings, so we would have

df_str9 = df[df.col1 == strings[0]]

I know I can loop through strings to access each value, but how do I create each dataframe so that it gets the name described above?

Something like:

import pandas as pd
df = pd.DataFrame(data=[['str9', 47, 55], ['str8', 43, 51], ['str9', 46, 52], ['str2', 42, 56]], columns=['col1', 'col2', 'col3'])
strings = df.col1.unique()
for string in strings:
    # wanted: a dataframe named df_str9, df_str8, ... instead of df_string
    df_string = df[df.col1 == string]

This question is pretty broad. Try to implement something, then come back and ask more specific questions than `"how do I do this?"`. You cannot [dynamically create variable names](https://stackoverflow.com/q/1373164/2823755); the most prevalent solution seems to be keeping everything in a dictionary whose keys are created dynamically. [`DataFrame.groupby('col1')`](https://pandas.pydata.org/pandas-docs/stable/groupby.html) may be what you are looking for. – wwii, Apr 30 '18 at 18:13
Ah, this is great. I hadn't considered groupby as a solution, thank you. – dward4, Apr 30 '18 at 18:20

score 1 · Accepted Answer · answered Apr 30 '18 at 18:14

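You may need locals(), which at module level is the same dictionary as the global namespace:

import pandas as pd

data_file = pd.DataFrame(data=[['str9', 47, 55], ['str8', 43, 51], ['str9', 46, 52], ['str2', 42, 56]], columns=['col1', 'col2', 'col3'])

# Adding a key to this dict at module level creates a variable of that name;
# inside a function, writing to locals() does not create variables.
variables = locals()
for i in data_file['col1'].unique():
    variables["df_{0}".format(i)] = data_file.loc[data_file.col1 == i]

print(df_str9)

   col1  col2  col3
0  str9    47    55
2  str9    46    52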

BENY


Ahh yes, I guess I shouldn't have said `cannot dynamically create variable names` in comment to the question. – wwii Apr 30 '18 at 18:18
Thank you @Wen nice solution. The comment with the groupby suggestion to make a dictionary would work as well. – dward4 Apr 30 '18 at 18:21
@dward4 groupby with a dict is the way to achieve this type of case :-) – BENY Apr 30 '18 at 18:34
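
For reference, the groupby-with-dictionary approach mentioned in the comments might look like the sketch below. It is not code from the thread: the dictionary name `frames` and the `df_` key prefix are assumptions chosen to mirror the naming the question asks for.

```python
import pandas as pd

data_file = pd.DataFrame(
    [["str9", 47, 55], ["str8", 43, 51], ["str9", 46, 52], ["str2", 42, 56]],
    columns=["col1", "col2", "col3"],
)

# Build one sub-dataframe per unique value of col1, stored under a generated
# key ("frames" and the "df_" prefix are illustrative names, not from the thread).
frames = {f"df_{key}": group for key, group in data_file.groupby("col1")}

print(sorted(frames))     # the generated keys
print(frames["df_str9"])  # rows where col1 == 'str9'
```

Each piece is then looked up as `frames["df_str9"]` rather than through a bare variable name, so the same code works inside functions and the set of pieces can be iterated over.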
