1

I'm looking for a way to concat numbers stored in a variable and using it as variable names in Python (or interpolated as a string). Something like this:

for NUMBER in range(2000, 2020, 1):
    df_NUMBER = pd.read_csv('data_NUMBER.csv')
    df_NUMBER['year'] = NUMBER 

is it possible?

In stata it would be like:

forvalues NUMBER = 2000(1)2019 {
    import delimited using data_`NUMBER'.csv
    generate year = `NUMBER'
}
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
diwenzo
  • 13
  • 2
  • 2
    Technically it can be done (in a very convoluted way), but what's wrong with just using a dictionary?! – Błotosmętek Aug 21 '20 at 21:01
  • Are you asking how to dynamically generate variable names? This is a very bad idea. Can you show your intended output and also explain why you want to do it? Or are you asking how to create a string with a changing number in it in order to reference different file names? If that latter, it is trivial with string interpolation. – Dan Aug 21 '20 at 21:02
  • 1
    Does this answer your question? [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – AMC Aug 21 '20 at 21:09
  • 3
    Don't dynamically create variables. You can only really do it in global scopes, not local scopes, and it is **almost certainly** a poor design decision. – juanpa.arrivillaga Aug 21 '20 at 21:09
  • poor design but if you want to do this and learn the hard way, I think that falls on you! here is what you'd put inside your loop: `exec("df_%d = pd.read_csv(\'data_%d.csv\')" % (NUMBER, NUMBER))` – d_kennetz Aug 21 '20 at 21:17

2 Answers2

1

Don't try create variable names dynamically, it is a very bad idea. Rather just store the dataframes in a collection. In this case a dictionary might make sense. Otherwise just a single longform dataframe which will make analysis easier:

option 1:

dfs = {}
for year in range(2000, 2020):
    dfs[year] = pd.read_csv(f"data_{year}.csv")
    dfs[year]["year"] = year

option 2:

dfs = []
for year in range(2000, 2020):
    df = pd.read_csv(f"data_{year}.csv")
    df["year"] = year
    dfs.append(df)
df_all = pd.concat(dfs).reset_index(drop=True)

You probably want option 2 in this case as you can then go on to perform your analyses on all the years at once using .groupby("year").

Dan
  • 45,079
  • 17
  • 88
  • 157
0

I suggest going the dictionary way:

df = {}
for year in range(2000,2020):
    filename =  'data_{}.csv'.format(year)
    df[year] = pd.read_csv(filename)
    df[year]['year'] = year
Błotosmętek
  • 12,717
  • 19
  • 29