Creating a Pandas DataFrame by splitting the original DF on different categories

Question

I have a DataFrame with a 'Break_Out_Category' column. This column contains four values: ['Age Group', 'Race/Ethnicity', 'Gender', 'Overall']. I am trying to create a separate DataFrame for each of these values, for example:

df_by_age = df[df['Break_Out_Category'] == 'AGE Group']

However, I don't want to hardcode each category, so I am trying to write a loop instead. Here is my code:

var_list = data_by_avg_days1['Break_Out_Category'].unique().tolist()

for var in var_list:
   activity_limit_by_%var = data_by_avg_days1[data_by_avg_days1['Break_Out_Category'] == var]
   print(activity_limit_by_%var['Break_Out_Category'].unique())

And this is the error I get

Error:     activity_limit_by_%var = pd.DataFrame
   ^
SyntaxError: can't assign to operator

This is my first post here, so if I haven't formatted the question correctly, please let me know how I can improve it.

You can't use `%` in a variable name. Try changing `activity_limit_by_%var` to `activity_limit_by_percent_var`. — pault, Jan 08 '18 at 18:49
This creates the name of the DF as activity_limit_by_percent_var, without considering var as a pointer. The resulting DF has values only from the final iteration. — aRad, Jan 08 '18 at 20:40

score 2 · Accepted Answer · answered Jan 08 '18 at 18:50

The standard way to create a "variable number of variables" is to use a dictionary. While this could be closed as a duplicate of "How do I create a variable number of variables?", you should know that there's a better way to do this than generating variable names.
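Applied to the loop from the question, that means replacing the dynamically named variables with dictionary keys. A minimal sketch, assuming a small made-up stand-in for data_by_avg_days1 (the Value column and its numbers are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the question's data_by_avg_days1; only the
# 'Break_Out_Category' column matters here, 'Value' is made up.
data_by_avg_days1 = pd.DataFrame({
    'Break_Out_Category': ['Age Group', 'Gender', 'Age Group', 'Overall', 'Race/Ethnicity'],
    'Value': [1, 2, 3, 4, 5],
})

# One dict entry per category instead of one variable per category.
activity_limit_by = {}
for var in data_by_avg_days1['Break_Out_Category'].unique():
    # Same boolean-mask filter as in the question, stored under the key `var`.
    activity_limit_by[var] = data_by_avg_days1[data_by_avg_days1['Break_Out_Category'] == var]
    print(var, activity_limit_by[var]['Break_Out_Category'].unique())
```

Each category's DataFrame is then available as, for example, activity_limit_by['Gender'], and nothing is overwritten between iterations.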

One simple way of dividing your dataframe by category is to use groupby, iterate over the groups, and store each group's dataframe in a dictionary keyed by its category.

d = {}
for i, g in data_by_avg_days1.groupby('Break_Out_Category', as_index=False):
    d[i] = g

You could also do this with a dict comprehension -

d = {i : g for i, g in data_by_avg_days1.groupby('Break_Out_Category', as_index=False)}
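Because iterating over a GroupBy yields (key, group) pairs, the same dictionary can also be built without an explicit comprehension. A sketch using the small sample frame from the example further down; wrapping the GroupBy in tuple() first materialises the pairs, which is the commonly used form of this idiom:

```python
import pandas as pd

# Same sample data as in the example further down.
df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b'], 'B': [1, 1, 2, 2, 3]})

# tuple() turns the GroupBy into a tuple of (key, sub-DataFrame) pairs,
# which dict() then consumes directly.
d = dict(tuple(df.groupby('A')))

print(sorted(d))  # the category values become the keys
print(d['b'])
```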

d is a dictionary that maps each category value to its corresponding sub-dataframe. You can then access, for example, the AGE Group dataframe with d['AGE Group'].
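If you only need one category at a time, you don't have to build the dictionary at all: GroupBy.get_group returns the sub-dataframe for a single key. A sketch with the same five-row sample frame used in the example below:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b'], 'B': [1, 1, 2, 2, 3]})

grouped = df.groupby('A')

# Look up one group by its key; gives the same rows as df[df['A'] == 'a'].
group_a = grouped.get_group('a')

# The group keys are available without materialising every group.
print(list(grouped.groups))
print(group_a)
```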

Here's a quick example with some sample data -

df

   A  B
0  a  1
1  a  1
2  a  2
3  b  2
4  b  3

d = {i : g for i, g in df.groupby('A', as_index=False)}

d['a']

   A  B
0  a  1
1  a  1
2  a  2


d['b']

   A  B
3  b  2
4  b  3
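The example above can be reproduced end to end. This runnable sketch builds the same five-row df and shows that each group keeps its original row labels (3 and 4 for group b):

```python
import pandas as pd

# The sample frame from the example above.
df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b'], 'B': [1, 1, 2, 2, 3]})

# Same dict comprehension as in the answer.
d = {i: g for i, g in df.groupby('A', as_index=False)}

print(d['a'])
print(d['b'])  # original index labels 3 and 4 are kept
```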

Note that if you want each group's index to start again at 0, you can add a reset_index call to the comprehension. Passing drop=True discards the old row labels instead of keeping them as an extra 'index' column -

d = {i : g.reset_index(drop=True) for i, g in df.groupby('A', as_index=False)}
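The difference between the two reset_index variants is easy to see on group b of the sample frame. A sketch (the names kept and dropped are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b'], 'B': [1, 1, 2, 2, 3]})

# Plain reset_index(): new 0..n-1 index, old labels kept in an 'index' column.
kept = {i: g.reset_index() for i, g in df.groupby('A')}

# reset_index(drop=True): new 0..n-1 index, old labels discarded.
dropped = {i: g.reset_index(drop=True) for i, g in df.groupby('A')}

print(kept['b'])
print(dropped['b'])
```

Use the plain form if you still need to trace rows back to the original dataframe; otherwise drop=True keeps the columns unchanged.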
