1

I want to loop through a directory and find specific xlsx files and then put them each into separate pandas dataframe. The thing here is that I also want all sheets in those excel files to be in the dataframe.

Below is a sample of code that I implemented, I just need to add the logic to pick all sheets:

import pandas as pd
from glob import glob

path = 'path_to_file'

files = glob(path + '/*file*.xlsx')

get_df = lambda f: pd.read_excel(f)

dodf = {f: get_df(f) for f in files}

dodf[files[2]] --- dictionary of dataframes 
Manas Jani
  • 699
  • 2
  • 11
  • 33
  • I have already created a dictionary to access each dataframe. So the current method should be fine, i.e. in a dictionary. – Manas Jani Sep 13 '17 at 19:44

1 Answers1

1

As described in this answer in Pandas you still have access to the ExcelFile class, which loads the file creating an object.

This object has a .sheet_names property which gives you a list of sheet names in the current file.

xl = pd.ExcelFile('foo.xls')
xl.sheet_names  # list of all sheet names

To actually handle the import of the specific sheet, use .parse(sheet_name) on the object of the imported Excel file:

xl.parse(sheet_name)  # read a specific sheet to DataFrame

For your code something like:

get_df = lambda f: pd.ExcelFile(f)
dodf = {f: get_df(f) for f in files}

...gives you dodf a dictionary of ExcelFile objects.

filename = 'yourfilehere.xlsx'
a_valid_sheet = dodf[filename].sheet_names[0] # First sheet
df = dodf[filename].parse(sheet_name)
mfitzp
  • 15,275
  • 7
  • 50
  • 70
  • I do not want to manually input the filename. Is there a way to get it from the dictionary dodf that I have created? I am completely new to Python so I do not know as such how it all works. – Manas Jani Sep 13 '17 at 20:03
  • yep — but you need you change your `pd.read_excel(f)` to `pd.ExcelFile(f)`. Once that is done, each object will have the `.sheet_names` attribute which is a list of sheets in that file. – mfitzp Sep 13 '17 at 20:07
  • Yes, I already did. But then, I have to individually parse each of the sheet into the dataframe right? – Manas Jani Sep 14 '17 at 14:46
  • @ManasJani that's right. But you can iterate over the list of `sheet_names` to do this, e.g. `for sheet in your_xls_obj.sheetnames: df = your_xls_obj.parse(sheet)` – mfitzp Sep 14 '17 at 17:57