2

I have a folder with a number of files. I want to import the sheet 'sheet1' as a pandas dataframe for each of them, assigned to the name of a portion of that title.

I've successfully gotten a list of filenames:

path = "/Users/path"
files = os.listdir(path)
files_xls = [f for f in files if f[-3:] == 'xls']

['A.xls', 'B.xls']

And I've successfully made a list of names I'd like as the dataframe names:

names = map(lambda each:each.strip(".xls"), files_xls)

['A', 'B']

But I'm failing at using these names to import. I can do it manually:

A = pd.read_excel(A.xls, 'sheet1')
B = pd.read_excel(B.xls, 'sheet1')
etc...

But I can't figure out how to automate this process.

user1318135
  • 717
  • 2
  • 12
  • 36
  • I suggest you just use a dict to do this, define the keys as your string names and the values as the dfs – EdChum Jun 08 '16 at 14:33

2 Answers2

1

Use .zip to combine your two lists into a dict, iterate over the result using .items() and collect pd.read_excel() in a dict:

df = {}
for name, f in dict(zip(names, files_xls)).items():
    df[name] = pd.read_excel(f, 'sheet1')

df['A'] will then contain the result of pd.read_excel('A.xls').

You could simplify creating the list of names using:

names = [f[:-4] for f in files_xls]

This works as follows:

files_xls = ['A.xls', 'B.xls']

names = [f[:-4] for f in files_xls]
['A', 'B']

name_dict = dict(zip(names, files_xls))
{'A': 'A.xls', 'B': 'B.xls'}

for name, f in name_dict.items():
    print(name, f)

prints:

A A.xls
B B.xls
Stefan
  • 41,759
  • 13
  • 76
  • 81
  • I'm not quite sure what this is doing / where it isn't accomplishing what I'm hoping for, but the error I'm getting when I then just run 'B': NameError: name 'B' is not defined – user1318135 Jun 08 '16 at 15:27
  • So the goal here is to have 'A' yield the dataframe created by A.xls – user1318135 Jun 08 '16 at 15:41
  • 'A' is the dictionary key. `df['A']` holds the `DataFrame`. – Stefan Jun 08 '16 at 15:43
  • Indeed. Thanks. Is there a way to make it so just 'A' holds the DataFrame? That's been my real goal. – user1318135 Jun 08 '16 at 15:44
  • Can't assign to `string` literals. Try `'A' = 3` in a python shell. Dictionaries are quite common practice for a case like this. There's the `exec` command though, take a look here down the thread here: http://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-in-python-via-a-while-loop – Stefan Jun 08 '16 at 15:49
  • Following the 'exec' path you recommended, I eventually came upon, http://stackoverflow.com/a/4357876/1318135, which is what I'm using successfully, together with your answer. – user1318135 Jun 08 '16 at 16:22
  • glad it was helpful. – Stefan Jun 08 '16 at 16:43
-1

There are two options that suppose to work for you:

1.

for name in names:
    xlsx[name] = pd.ExcelFile('PATH\' + name + '.xlsx')
    sheets[name] = xlsx.parse(0)

2.

for name in names:    
    xlsx[name] = pd.read_excel(open('PATH\' + name + '.xlsx','rb'), sheetname='Sheet1')

Note that xlsx and sheets are dictionaries.

Gal Dreiman
  • 3,969
  • 2
  • 21
  • 40