I am trying to read in a bunch of csv files from multiple sub-subfolders with the following code:

for csv in glob.glob('./data/*/*.csv', recursive=True): # all csv files in ./data
    vname = 'data_' + csv.split('/')[3].split('.')[0].lower() # variable names created from lowercased filenames
    print(csv, '-->', vname) # test print csv-path and variable (for debugging)
    exec("{0} = {1}".format(vname, pd.read_csv(csv, encoding='latin1'))) # initialize data from csvs to varaible names

I have tried reading the CSV on a separate line and passing a temporary variable as the argument to format, with no success. Reading the CSV files on its own works, and assigning an integer with exec("{0} = {1}".format(vname, 2)) works, too. I cannot get my head around why I always get the following SyntaxError:

Traceback (most recent call last):

  File "/home/seb/.anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-173-4c89ac367e67>", line 4, in <module>
    exec("{0} = {1}".format(vname, pd.read_csv(csv, encoding='latin1'))) # initialize data from csvs to varaible names

  File "<string>", line 1
    data_sharks_rays_chimaeras =           id_no                 binomial presence origin seasonal  \
                                                                        ^
SyntaxError: invalid syntax

1 Answer

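The issue is that you are passing the result of pd.read_csv as a string formatting argument. format() inserts the DataFrame's string representation (the printed table you see in the traceback) into the code string, and exec then tries to parse that table as Python source, which raises the SyntaxError. An integer works only because its string form, 2, happens to be valid Python.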
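
To make the failure concrete, here is a minimal sketch that reproduces it without pandas. FakeTable is a hypothetical stand-in, not part of the original code; it only assumes that str() of the object yields a printed table, as it does for a DataFrame.

```python
class FakeTable:
    """Hypothetical stand-in for a DataFrame: str() returns a printed table."""

    def __str__(self):
        return "   id_no  binomial\n0      1  Carcharodon carcharias"


# format() calls str() on each argument before building the code string.
code_int = "{0} = {1}".format("data_x", 2)            # "data_x = 2"
code_tbl = "{0} = {1}".format("data_x", FakeTable())  # "data_x =    id_no  binomial ..."

namespace = {}
exec(code_int, namespace)  # valid Python, so data_x is created

try:
    exec(code_tbl, namespace)  # the printed table is not valid Python
    failed = False
except SyntaxError:
    failed = True
```

The integer case succeeds and the table case ends in the except branch, which matches the behaviour described in the question.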

You could try:

exec("{0} = pd.read_csv({1!r}, encoding='latin1')".format(vname, csv))

However, using exec for that kind of task is not recommended (see here for some clues). You could use a dictionary instead:

data = {}
for csv in glob.glob('./data/*/*.csv', recursive=True):
    vname = 'data_' + csv.split('/')[3].split('.')[0].lower()
    data[vname] = pd.read_csv(csv, encoding='latin1')
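
The same dictionary idea can also be sketched with pathlib, which avoids indexing into the split path. The function name load_tables and its reader parameter are assumptions made for illustration; they are not part of the original code. The reader is passed in so that the sketch does not depend on pandas itself.

```python
from pathlib import Path


def load_tables(root, reader):
    """Map 'data_<lowercased file name>' to reader(path) for each CSV.

    Only files exactly one subfolder below root are matched, mirroring
    the './data/*/*.csv' pattern from the question.
    """
    tables = {}
    for path in sorted(Path(root).glob("*/*.csv")):
        # path.stem is the file name without its final extension.
        key = "data_" + path.stem.lower()
        tables[key] = reader(path)
    return tables
```

With pandas imported as pd, a call might look like load_tables("./data", lambda p: pd.read_csv(p, encoding="latin1")). Note that path.stem strips only the last extension, so a file name containing several dots gives a different key than split('.')[0] would.
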
Pierre V.