-1

I'm trying to dynamically create variable names in a for loop. In the contrived example below, I simply want to create a separate dataframe for each ticker:

tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']

for ticker in tickers:
    'df_' + ticker = pd.read_excel('my_data.xlsx', sheet_name=ticker)

#SyntaxError: can't assign to operator

However, this seems to work:

for ticker in tickers:
    locals()['df_' + str(ticker)] = pd.read_excel('my_data.xlsx', sheet_name=ticker)

I've seen similar questions posted previously, but the responses range from dicts, to locals, to setattr. I'm trying to learn and understand the most Pythonic way to handle this. I find it hard to follow others' examples, and the responses often suggest a non-optimal approach.

ScottP
  • 3
    Well, `setattr` doesn't work, `locals()` only works in the global scope, and dicts always work, so... I think we have a clear winner. – Aran-Fey Sep 22 '18 at 10:30
  • @Aran-Fey So I would use something like `d = {}` outside of the for loop, followed by `d['df_' + str(ticker)]` for my variable names in the for loop? Can you help me understand what this is actually doing? Thanks so much! – ScottP Sep 22 '18 at 10:39
  • 1
    Possible duplicate of [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – Patrick Artner Sep 22 '18 at 10:43
  • The dupe should answer your question - plenty of ppl provided answers. – Patrick Artner Sep 22 '18 at 10:44
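
To illustrate the comment about `locals()` above: at module level `locals()` is the same dictionary as `globals()`, so writing to it happens to create a global name, but inside a function such a write does not create a local variable in CPython. The sketch below is a minimal illustration only; the function names are hypothetical, and an integer stands in for the dataframe that `pd.read_excel` would return.

```python
def make_frames_with_locals(tickers):
    """Try to create df_<ticker> variables inside a function via locals()."""
    for ticker in tickers:
        # In CPython this write does not create a real local variable.
        locals()['df_' + ticker] = len(ticker)  # stand-in for pd.read_excel(...)
    try:
        return df_FB  # never assigned normally, so the name lookup fails
    except NameError:
        return None


def make_frames_with_dict(tickers):
    """Store each result under a string key in an ordinary dict instead."""
    frames = {}
    for ticker in tickers:
        frames['df_' + ticker] = len(ticker)  # stand-in for pd.read_excel(...)
    return frames


tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']
locals_result = make_frames_with_locals(tickers)  # None: df_FB was never created
dict_result = make_frames_with_dict(tickers)      # one entry per ticker
```

The dict version behaves the same in any scope, which is why `d['df_' + ticker]` is a safe replacement for a dynamically named variable: the string is simply a key into `d`.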

2 Answers

5

The most Pythonic way is: don't. If you find yourself trying to dynamically create variable names, you have chosen the wrong pattern.

Inside a loop, you should reuse the same variable name, and make it meaningful.

So either you fully use the dataframe inside the loop (my preferred scenario), in which case you should use:

import pandas as pd

tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']

for ticker in tickers:
    ticker_df = pd.read_excel('my_data.xlsx', sheet_name=ticker)
    # process the dataframe ...

or you really need a dict of dataframes (beware of memory use, since every dataframe stays loaded), in which case do:

tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']
ticker_df = {}

for ticker in tickers:
    ticker_df[ticker] = pd.read_excel('my_data.xlsx', sheet_name=ticker)
    ...

# all the dataframes are available here in ticker_df dict...
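
As a sketch of how the resulting dict replaces separate `df_FB`, `df_AMZN`, ... variables: the ticker string is the key, so one dataset can be looked up by name and all of them can be looped over. To keep the example runnable without an Excel file, `load_sheet` is a hypothetical stand-in for `pd.read_excel('my_data.xlsx', sheet_name=ticker)` that returns a small list of rows, and the `close` field is invented for illustration.

```python
def load_sheet(ticker):
    """Hypothetical stand-in for pd.read_excel('my_data.xlsx', sheet_name=ticker)."""
    return [{'ticker': ticker, 'close': 100.0 + i} for i in range(3)]


tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']
ticker_df = {}

for ticker in tickers:
    ticker_df[ticker] = load_sheet(ticker)

# Look up one ticker's data by key, where df_FB would have been used.
fb_rows = ticker_df['FB']

# Loop over every ticker without knowing the names in advance.
row_counts = {ticker: len(rows) for ticker, rows in ticker_df.items()}
```

With pandas itself, passing a list of sheet names (or `sheet_name=None` for all sheets) to `pd.read_excel` should return a dict of dataframes keyed by sheet name directly, which may remove the need for the loop.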
Serge Ballesta
  • Thanks Serge, very helpful! Method 2 works easily for me. But since method 1 above you noted as your preferred scenario, I'd like to better understand it. How is that actually accumulating the data from each worksheet / what's the next most efficient step to do so? Also, that's basically then just creating a single dataframe, correct? – ScottP Sep 22 '18 at 11:13
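
One possible reading of the first approach, as a sketch: each pass through the loop loads one sheet, reduces it to the result that is actually needed, and keeps only that result, so no more than one sheet is held at a time. It does not build a single combined dataframe on its own. Here `load_sheet` and the `close` field are hypothetical stand-ins for `pd.read_excel` and a real column, with fake values so the example runs without a file.

```python
def load_sheet(ticker):
    """Hypothetical stand-in for pd.read_excel('my_data.xlsx', sheet_name=ticker)."""
    base = len(ticker)  # fake prices so the example runs without a file
    return [{'close': float(base + i)} for i in range(3)]


tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']
mean_close = {}

for ticker in tickers:
    ticker_rows = load_sheet(ticker)  # same variable name reused on each pass
    closes = [row['close'] for row in ticker_rows]
    mean_close[ticker] = sum(closes) / len(closes)  # keep only the summary
```

If a single combined dataframe is the goal instead, collecting the frames in a dict (the second approach) and passing that dict to `pd.concat` is one option; the dict keys then label which sheet each row came from.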
0

Dynamically creating variable names is usually discouraged. Why not store the dataframes in a dictionary instead:

import pandas as pd

tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']
my_data_dic = {}
for ticker in tickers:
    my_data_dic['df_' + str(ticker)] = pd.read_excel('my_data.xlsx', sheet_name=ticker)
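
The same loop can also be written as a dict comprehension. This is a sketch rather than a required form; `load_sheet` is a hypothetical stand-in for `pd.read_excel('my_data.xlsx', sheet_name=ticker)` so that the example runs without a file. Since the tickers are already strings, the `str()` call is not needed.

```python
def load_sheet(ticker):
    """Hypothetical stand-in for pd.read_excel('my_data.xlsx', sheet_name=ticker)."""
    return [ticker.lower()]  # placeholder payload instead of a dataframe


tickers = ['FB', 'AMZN', 'NFLX', 'GOOG']

# Build the whole mapping in one expression: key -> loaded data.
my_data_dic = {'df_' + ticker: load_sheet(ticker) for ticker in tickers}
```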
onno