How can I create multiple new dataframes inside a for loop?

Question

I want to create a for loop that doesn't overwrite the existing dataframes.

import pandas as pd

# df_2011, df_2012, df_2013 are the existing yearly dataframes
for df in df_2011, df_2012, df_2013:
    df = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

Right now the for loop above iterates over each of the existing dataframes. How can I make it so the for loop creates a bunch of new dataframes?

2011_pivot, 2012_pivot, 2013_pivot
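A minimal sketch of what the loop above actually does, assuming three small hypothetical frames named df_2011, df_2012 and df_2013 (the asker's real data isn't shown). Assigning to df inside the loop only rebinds the loop variable: the original frames are left unchanged, and each pivot is discarded on the next iteration, so only the last one is still reachable through df afterwards.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's yearly dataframes.
df_2011 = pd.DataFrame({"income": ["low", "high"], "area": ["north", "south"], "id": [1, 2]})
df_2012 = pd.DataFrame({"income": ["low", "low"], "area": ["north", "north"], "id": [3, 4]})
df_2013 = pd.DataFrame({"income": ["high", "low"], "area": ["south", "north"], "id": [5, 6]})

for df in df_2011, df_2012, df_2013:
    # Rebinds the name df to a new pivot table; df_2011 etc. are not modified.
    df = pd.pivot_table(df, index=["income"], columns=["area"],
                        values=["id"], aggfunc="count")

print(df_2011.columns.tolist())  # ['income', 'area', 'id']: still the raw data
print(df.index.tolist())         # income levels of the last pivot (from df_2013) only
```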

So would the final output be three dataframes, or one dataframe with all the previous dataframes concatenated? — mad_, Oct 01 '18 at 20:04
You should use a dict to save the dataframes you are creating, where "2011_pivot", "2012_pivot" and "2013_pivot" are the keys. — brunormoreira, Oct 01 '18 at 20:09
https://stackoverflow.com/a/52457013/10292170 https://stackoverflow.com/a/52508030/10292170 — ipramusinto, Oct 01 '18 at 20:29
Did an answer below help? If so, feel free to [accept](https://stackoverflow.com/help/accepted-answer) one, or ask for clarification. — jpp, Oct 03 '18 at 22:50

score 3 · Answer 1 · answered Oct 01 '18 at 20:10

I would generally discourage you from creating lots of variables with related names, which is a dangerous design pattern in Python (although it's common in SAS, for example). A better option is to create a dictionary of dataframes, with the key as your 'variable name':

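import pandas as pd

df_dict = dict()
for df in df_2011, df_2012, df_2013:
    # pandas does not set df.name; this assumes you assigned it, e.g. df_2011.name = "2011"
    df_dict["pivot_" + df.name] = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')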

I'm assuming here that each of your dataframes has a name attribute set to "2011", "2012" or "2013".
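If you'd rather not rely on a name attribute (pandas does not set one on a DataFrame for you), a minimal variation is to keep each name next to its frame in a dict and loop over its items. The frames below are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample frames; the asker's real data isn't shown.
df_2011 = pd.DataFrame({"income": ["low", "high"], "area": ["north", "south"], "id": [1, 2]})
df_2012 = pd.DataFrame({"income": ["low", "low"], "area": ["north", "north"], "id": [3, 4]})
df_2013 = pd.DataFrame({"income": ["high", "low"], "area": ["south", "north"], "id": [5, 6]})

# Keep the "name" next to each frame instead of in a .name attribute.
frames = {"2011": df_2011, "2012": df_2012, "2013": df_2013}

df_dict = dict()
for name, df in frames.items():
    df_dict["pivot_" + name] = pd.pivot_table(df, index=["income"], columns=["area"],
                                              values=["id"], aggfunc="count")

print(sorted(df_dict))  # ['pivot_2011', 'pivot_2012', 'pivot_2013']
```

Each pivot is then looked up by key, e.g. df_dict["pivot_2012"], rather than through a separately named variable.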

score 1 · Answer 2 · answered Oct 01 '18 at 20:07

I don't see any other way than to create a list or a dictionary of data frames; otherwise you'd have to name them manually.

import pandas as pd
df_list = [pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count') for df in (df_2011, df_2012, df_2013)]

You can find an example here.
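A runnable version of the list approach, using hypothetical sample frames. In Python 3 a bare tuple after `in` inside a comprehension is a syntax error, so the frames are wrapped in a list here. The results keep the input order, so they can be unpacked into separate names if you really need them.

```python
import pandas as pd

# Hypothetical sample frames standing in for the asker's data.
df_2011 = pd.DataFrame({"income": ["low", "high"], "area": ["north", "south"], "id": [1, 2]})
df_2012 = pd.DataFrame({"income": ["low", "low"], "area": ["north", "north"], "id": [3, 4]})
df_2013 = pd.DataFrame({"income": ["high", "low"], "area": ["south", "north"], "id": [5, 6]})

df_list = [pd.pivot_table(df, index=["income"], columns=["area"],
                          values=["id"], aggfunc="count")
           for df in [df_2011, df_2012, df_2013]]

# The pivots come back in the same order as the input frames.
pivot_2011, pivot_2012, pivot_2013 = df_list
print(len(df_list))  # 3
```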

score 0 · Answer 3 · answered Oct 01 '18 at 20:17

Don't create variables needlessly. Use a dict or list instead, e.g. via a dictionary or list comprehension.

Alternatively, consider MultiIndex columns and a single pd.pivot_table call:

import pandas as pd

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
# Tag each frame's rows with its year, then stack all years into one frame
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)

df = pd.pivot_table(comb, index='income', columns=['year', 'area'],
                    values='id', aggfunc='count')
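The combined-pivot approach can be tried end to end with small hypothetical frames. Because values is a single column name, the resulting columns have two levels: year (added by assign) on the outside and area inside.

```python
import pandas as pd

# Hypothetical sample frames standing in for the asker's data.
df_2011 = pd.DataFrame({"income": ["low", "high"], "area": ["north", "south"], "id": [1, 2]})
df_2012 = pd.DataFrame({"income": ["low", "low"], "area": ["north", "north"], "id": [3, 4]})
df_2013 = pd.DataFrame({"income": ["high", "low"], "area": ["south", "north"], "id": [5, 6]})

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
# Tag each frame's rows with its year, then stack all years into one frame.
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)

df = pd.pivot_table(comb, index='income', columns=['year', 'area'],
                    values='id', aggfunc='count')

print(list(df.columns.names))  # ['year', 'area']
```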

Then you can use regular indexing methods to filter for a particular year, e.g.

pivot_2011 = df.iloc[:, df.columns.get_level_values(0).eq(2011)]
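An alternative to the boolean mask (swapped in here, not part of the original answer) is DataFrame.xs, which selects one year by level name and drops that level from the result. A dict comprehension over the years then yields one pivot per year from the single table. The sample frames are hypothetical.

```python
import pandas as pd

# Hypothetical sample frames standing in for the asker's data.
df_2011 = pd.DataFrame({"income": ["low", "high"], "area": ["north", "south"], "id": [1, 2]})
df_2012 = pd.DataFrame({"income": ["low", "low"], "area": ["north", "north"], "id": [3, 4]})
df_2013 = pd.DataFrame({"income": ["high", "low"], "area": ["south", "north"], "id": [5, 6]})

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)
df = pd.pivot_table(comb, index='income', columns=['year', 'area'],
                    values='id', aggfunc='count')

# Cross-section on the 'year' column level; the year level is dropped.
pivot_2011 = df.xs(2011, axis=1, level='year')

# One per-year pivot for every key in dfs, all taken from the combined table.
pivots = {year: df.xs(year, axis=1, level='year') for year in dfs}
print(list(pivot_2011.columns))
```

Note that every per-year slice keeps the full set of income rows from the combined table, so incomes missing in a given year show up as NaN.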
