0

I'm needing to read in an excel file and read all sheets inside that excel file.

I've tried:

sample_df = pd.concat(pd.read_excel("sample_master.xlsx", sheet_name=None), ignore_index=True)

This code worked, but it's suddenly giving me this error:

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

After reading in the excel file, I need to run the following command:

new_id = sample_df.loc[(sample_df['Sequencing_ID'] == line) & (sample_df['Experiment_ID'] == experiment_id), \
                               'Sample_ID_for_report'].item()

Any help?

Kyle
  • 23
  • 4
  • 2
    Maybe the excel workbook you are reading now that gives this error has only one sheet, therefore the return from pd.read_excel is a single DataFrame and not dictionary of DataFrames. – Scott Boston May 12 '21 at 19:25
  • 1
    The return is either a DataFrame or dict of DataFrames so if you want something more flexible do the `read_excel` then check the type and if it's a dict concat, else you're all set. – ALollz May 12 '21 at 19:33
  • There are still multiple sheets in the excel file. – Kyle May 12 '21 at 19:41
  • Let's make sure. `type(pd.read_excel("sample_master.xlsx", sheet_name=None))` What is the output of that? – Scott Boston May 12 '21 at 19:47
  • Does this answer your question? [Using Pandas to pd.read\_excel() for multiple worksheets of the same workbook](https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook) – Andy May 12 '21 at 19:50
  • pd.read_excel("sample_master.xlsx", sheet_name=None) This will return the concatenated Dataframe you don't need to use pd.concat() I guess. @Kyle – Nk03 May 12 '21 at 19:57

1 Answers1

0

First, you will want to know all of the sheets that need to be read in. Second, you will want to iterate over each sheet.

  1. Getting Sheet names.- You can get a list of the sheet names in a workbook with sheets = pd.ExcelFile(path).sheet_names, where path is the full path to your file. The function below reads a workbook and returns a list of sheet names that contain specific key words.
      import re
      import pandas as pd

      def get_sheets(path):
     
        sheets = pd.ExcelFile(path).sheet_names
        sheets_to_process = []
        for sheet in sheets:
          excludes = ['exclude_term1', 'exclude_term1']
          includes = ['find_term1', 'find_term2']
          sheet_stnd = re.sub('[^0-9A-Za-z_]+', '', sheet).lower().strip(' ') 
          for exclude in excludes:
            if sheet_stnd != exclude:
                for include in includes:
                    if include in sheet_stnd:
                        sheets_to_process.append(sheet)
         return list(set(sheets_to_process))
  1. Loop over sheets- You can then loop over the sheets to read them in. In this example,
for sheet in get_sheets(path):
  df = pd.concat(pd.read_excel("sample_master.xlsx", sheet_name=sheet), 
                      ignore_index=True)

Depending on your use case, you may also want to append each sheet into a larger data frame

Oxford_orange
  • 157
  • 1
  • 10