Variable names in Python using numbers

Question

I'm looking for a way to concat numbers stored in a variable and using it as variable names in Python (or interpolated as a string). Something like this:

for NUMBER in range(2000, 2020, 1):
    df_NUMBER = pd.read_csv('data_NUMBER.csv')
    df_NUMBER['year'] = NUMBER

is it possible?

In stata it would be like:

forvalues NUMBER = 2000(1)2019 {
    import delimited using data_`NUMBER'.csv
    generate year = `NUMBER'
}

Technically it can be done (in a very convoluted way), but what's wrong with just using a dictionary?! — Błotosmętek, Aug 21 '20 at 21:01
Are you asking how to dynamically generate variable names? This is a very bad idea. Can you show your intended output and also explain why you want to do it? Or are you asking how to create a string with a changing number in it in order to reference different file names? If that latter, it is trivial with string interpolation. — Dan, Aug 21 '20 at 21:02
Does this answer your question? [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) — AMC, Aug 21 '20 at 21:09
Don't dynamically create variables. You can only really do it in global scopes, not local scopes, and it is **almost certainly** a poor design decision. — juanpa.arrivillaga, Aug 21 '20 at 21:09
poor design but if you want to do this and learn the hard way, I think that falls on you! here is what you'd put inside your loop: `exec("df_%d = pd.read_csv(\'data_%d.csv\')" % (NUMBER, NUMBER))` — d_kennetz, Aug 21 '20 at 21:17

Dan · Accepted Answer · 2020-08-21T21:13:28.687

Don't try create variable names dynamically, it is a very bad idea. Rather just store the dataframes in a collection. In this case a dictionary might make sense. Otherwise just a single longform dataframe which will make analysis easier:

option 1:

dfs = {}
for year in range(2000, 2020):
    dfs[year] = pd.read_csv(f"data_{year}.csv")
    dfs[year]["year"] = year

option 2:

dfs = []
for year in range(2000, 2020):
    df = pd.read_csv(f"data_{year}.csv")
    df["year"] = year
    dfs.append(df)
df_all = pd.concat(dfs).reset_index(drop=True)

You probably want option 2 in this case as you can then go on to perform your analyses on all the years at once using .groupby("year").

Błotosmętek · Answer 2 · 2020-08-21T21:13:47.100

0

I suggest going the dictionary way:

df = {}
for year in range(2000,2020):
    filename =  'data_{}.csv'.format(year)
    df[year] = pd.read_csv(filename)
    df[year]['year'] = year

edited Aug 21 '20 at 21:13

answered Aug 21 '20 at 21:05

Błotosmętek

12,717
19
29

In `range(2000,2020,1)`, you don't need the last parameter because the default step is already 1. – Elan-R Aug 21 '20 at 21:11
Right, I just copy-pasted from the OP's question. – Błotosmętek Aug 21 '20 at 21:13

Variable names in Python using numbers

2 Answers2