
How can I create a for loop that doesn't overwrite the existing dataframes?

for df in df_2011, df_2012, df_2013:
    df = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

Right now the loop above just iterates over the existing dataframes. How can I make it create a new dataframe for each one instead, e.g.:

2011_pivot, 2012_pivot, 2013_pivot
OptimusPrime
  • So final output would be three dataframes or one dataframe with all the previous dataframes concatenated? – mad_ Oct 01 '18 at 20:04
  • You should use a dict to save the dataframes you are creating, where "2011_pivot", "2012_pivot" and "2013_pivot" are the keys. – brunormoreira Oct 01 '18 at 20:09
  • https://stackoverflow.com/a/52457013/10292170 https://stackoverflow.com/a/52508030/10292170 – ipramusinto Oct 01 '18 at 20:29
  • Did an answer below help? If so, feel free to [accept](https://stackoverflow.com/help/accepted-answer) one, or ask for clarification. – jpp Oct 03 '18 at 22:50

3 Answers


I would generally discourage you from creating lots of variables with related names; it's a dangerous design pattern in Python (although it's common in SAS, for example). A better option is to create a dictionary of dataframes, keyed by what would otherwise have been your variable name:

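import pandas as pd

df_dict = dict()
for df in (df_2011, df_2012, df_2013):
    df_dict["pivot_" + df.name] = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

I'm assuming here that your dataframes are df_2011, df_2012 and df_2013 and that each carries a name attribute set to "2011", "2012" or "2013". pandas doesn't give a DataFrame a name by default, so you would set it yourself, e.g. df_2011.name = "2011".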
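If you'd rather not attach a custom name attribute to each DataFrame, a dict comprehension over a year-to-dataframe mapping gives the same dictionary. This is a minimal sketch: the sample data and the `frames` mapping are hypothetical stand-ins for your real dataframes. It also passes `values="id"` as a plain string rather than `["id"]`, so each pivot has simple `area` columns instead of an extra `id` column level.

```python
import pandas as pd

# Hypothetical sample data standing in for the three yearly dataframes
df_2011 = pd.DataFrame({"income": ["low", "low", "high"], "area": ["N", "S", "N"], "id": [1, 2, 3]})
df_2012 = pd.DataFrame({"income": ["high", "low"], "area": ["S", "S"], "id": [4, 5]})
df_2013 = pd.DataFrame({"income": ["low", "high", "high"], "area": ["N", "N", "S"], "id": [6, 7, 8]})

# Map each year label to its dataframe instead of relying on a custom .name attribute
frames = {"2011": df_2011, "2012": df_2012, "2013": df_2013}

# One pivot table per year, stored under keys like "pivot_2011"
df_dict = {
    "pivot_" + year: pd.pivot_table(df, index="income", columns="area", values="id", aggfunc="count")
    for year, df in frames.items()
}

print(sorted(df_dict))
print(df_dict["pivot_2011"])
```

Each pivot is then reached by key, e.g. `df_dict["pivot_2012"]`, and the original dataframes are left untouched.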

Sven Harris

I don't see any way other than creating a list or a dictionary of data frames; otherwise you'd have to name each one manually.

df_list = [pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')
           for df in (df_2011, df_2012, df_2013)]

You can find an example here.
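A list keeps the pivots in the same order as the input, so if you later want labels you can zip it with the years. Minimal sketch, assuming hypothetical sample data and a hypothetical `years` list; note that the sequence after `in` must be parenthesised, because Python 3 rejects an unparenthesised tuple there inside a comprehension.

```python
import pandas as pd

# Hypothetical sample data standing in for the three yearly dataframes
df_2011 = pd.DataFrame({"income": ["low", "low", "high"], "area": ["N", "S", "N"], "id": [1, 2, 3]})
df_2012 = pd.DataFrame({"income": ["high", "low"], "area": ["S", "S"], "id": [4, 5]})
df_2013 = pd.DataFrame({"income": ["low", "high", "high"], "area": ["N", "N", "S"], "id": [6, 7, 8]})
years = [2011, 2012, 2013]

# One pivot per input dataframe, in the same order as the inputs
df_list = [
    pd.pivot_table(df, index="income", columns="area", values="id", aggfunc="count")
    for df in (df_2011, df_2012, df_2013)
]

# Positions line up with `years`, so zip attaches a label to each pivot
pivots_by_year = dict(zip(years, df_list))

print(len(df_list))
print(pivots_by_year[2013])
```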

Colonder

Don't create variables needlessly. Use a dict or list instead, e.g. via a dictionary or list comprehension.

Alternatively, consider MultiIndex columns and a single pd.pivot_table call:

import pandas as pd

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
# tag each row with its year, then stack all years into one frame
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)

df = pd.pivot_table(comb, index='income', columns=['year', 'area'],
                    values='id', aggfunc='count')

Then you can use regular indexing methods to filter for a particular year, e.g.

pivot_2011 = df.iloc[:, df.columns.get_level_values(0) == 2011]
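A self-contained sketch of the single-pivot approach, using hypothetical sample data, might look like this. Besides a boolean mask on the `year` level, it uses `DataFrame.xs` with `axis=1, level="year"`, which returns one year's block with the `year` level dropped. Both selections are shown side by side; which one you prefer depends on whether you want to keep the year in the column labels.

```python
import pandas as pd

# Hypothetical sample data standing in for the three yearly dataframes
df_2011 = pd.DataFrame({"income": ["low", "low", "high"], "area": ["N", "S", "N"], "id": [1, 2, 3]})
df_2012 = pd.DataFrame({"income": ["high", "low"], "area": ["S", "S"], "id": [4, 5]})
df_2013 = pd.DataFrame({"income": ["low", "high", "high"], "area": ["N", "N", "S"], "id": [6, 7, 8]})

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}

# Tag each row with its year, then stack all years into one frame
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)

# Single pivot: the columns become a (year, area) MultiIndex
pivot = pd.pivot_table(comb, index="income", columns=["year", "area"], values="id", aggfunc="count")

# Boolean mask on the "year" level: keeps the year in the column labels
via_mask = pivot.loc[:, pivot.columns.get_level_values("year") == 2011]

# Cross-section on the "year" level: drops the year, leaving plain area columns
via_xs = pivot.xs(2011, axis=1, level="year")

print(via_mask)
print(via_xs)
```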
jpp