I want to read multiple files into separate dataframe objects: i.e., read 'train.csv' into an object called train and the same for 'test.csv'.

The advice I've found so far (e.g., here) tends to read the different files into a dictionary or list. From there, I suppose I could write something like train = dict[0], but is there a way to do it directly?

For now, I ran the following, but it raises an error:

files = ['train.csv', 'test.csv']
names = [train, test]

for file, name in zip(files, names):
    name = pd.read_csv(file)

Output/Error message: NameError: name 'train' is not defined

Marsha T

2 Answers

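You can't reliably assign to a variable name this way. In your loop, name = pd.read_csv(file) only rebinds the loop variable name; it never creates train or test. Creating variables dynamically would mean messing with globals()/locals(), which is bad practice.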

For just two files, the most explicit is:

train = pd.read_csv('train.csv')
test  = pd.read_csv('test.csv')

If you need the loop, use a dictionary:

files = ['train.csv', 'test.csv']
names = ['train', 'test']

dfs = {}
for file, name in zip(files, names):
    dfs[name] = pd.read_csv(file)

For reference, see "How do I create variable variables?"
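
If you still want plain names after the loop, you can unpack them from the dictionary in one line. Here is a minimal sketch: the temporary directory and the small CSV contents are made up purely so the example runs on its own, and the paths stand in for your real files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data: write two tiny CSV files to a temp directory.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "train.csv").write_text("x,y\n1,2\n3,4\n")
(tmp_dir / "test.csv").write_text("x,y\n5,6\n")

files = [tmp_dir / "train.csv", tmp_dir / "test.csv"]
names = ["train", "test"]

# Build the dictionary exactly as in the loop above.
dfs = {}
for file, name in zip(files, names):
    dfs[name] = pd.read_csv(file)

# Bind the frames to ordinary variable names; the number of names on the
# left must match the number of entries in `names`.
train, test = (dfs[name] for name in names)

print(train.shape)
print(test.shape)
```

The variable names are still written out by hand, but only once, and the dictionary remains available for anything that needs to loop over all frames.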

Or use the file name as key:

files = ['train.csv', 'test.csv']

dfs = {f: pd.read_csv(f) for f in files}

# or, without the extension
dfs = {f.removesuffix('.csv'): pd.read_csv(f) for f in files}
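
Note that str.removesuffix needs Python 3.9 or later. On older versions, or if you are handling paths anyway, pathlib's Path.stem gives the same key. A rough sketch, again with made-up sample files in a temporary directory so it runs standalone:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data so the example is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
for name in ("train", "test"):
    (tmp_dir / f"{name}.csv").write_text("a,b\n1,2\n")

# Path.stem is the file name without its final extension,
# e.g. Path("train.csv").stem == "train".
dfs = {path.stem: pd.read_csv(path) for path in tmp_dir.glob("*.csv")}

print(sorted(dfs))
```

Using glob also picks up every CSV file in the directory, which may or may not be what you want; pass an explicit list of paths instead if it is not.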
mozway

Have you tried using exec? You can create your program line as a string and pass it to exec.

files = ['train.csv', 'test.csv']

for f in files:
    print(f"{f.removesuffix('.csv')} = pd.read_csv('{f}')")

This code generates the following strings:

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

These strings can then be passed to the exec function:

for f in files:
    exec(f"{f.removesuffix('.csv')} = pd.read_csv('{f}')")
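
This works when run at module level (a script or notebook cell). Inside a function it behaves differently, because exec cannot create new local variables there. A small sketch of that behavior, using a plain string as a stand-in for the DataFrame and a hypothetical variable name train_df so that pandas and the CSV files are not needed:

```python
def load_inside_function():
    # exec stores the assignment in a temporary namespace, not in the
    # function's real local variables.
    exec("train_df = 'stand-in for a DataFrame'")
    try:
        # No assignment to train_df appears in the function body, so the
        # name is looked up as a global, and no such global exists.
        return train_df
    except NameError:
        return None


result = load_inside_function()
print(result)  # None
```

So if the loading code lives inside a function, the names created by exec will not be available afterwards.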