0

I would like to convert an excel file to a pandas dataframe. All the sheets name have spaces in the name, for instances, ' part 1 of 22, part 2 of 22, and so on. In addition the first column is the same for all the sheets.

I would like to convert this excel file to a unique dataframe. However I dont know what happen with the name in python. I mean I was hable to import them, but i do not know the name of the data frame.

The sheets are imported but i do not know the name of them. After this i would like to use another 'for' and use a pd.merge() in order to create a unique dataframe

for sheet_name in Matrix.sheet_names:
 sheet_name = pd.read_excel(Matrix, sheet_name)
 print(sheet_name.info())
Oscar
  • 1
  • Possible duplicate of [Import multiple excel files into python pandas and concatenate them into one dataframe](https://stackoverflow.com/questions/20908018/import-multiple-excel-files-into-python-pandas-and-concatenate-them-into-one-dat) – realr Jul 29 '19 at 16:40

3 Answers3

2

Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.

To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.

Try this:

all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(sheet_name)

Or, even better, using list comprehension:

all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]

You can then concatenate them into one DataFrame like this:

final_df = pd.concat(all_my_sheets, sort=False)
iuvbio
  • 600
  • 6
  • 23
0

You might consider using the openpyxl package:

from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(filename=file_path, read_only=True)
all_my_sheets  = wb.sheetnames

# Assuming your sheets have the same headers and footers
n = 1
for ws in all_my_sheets:
  records = []
  for row in ws._cells_by_row(min_col=1,
                              min_row=n,
                              max_col=ws.max_column,
                              max_row=n):
  rec = [cell.value for cell in row]
  records.append(rec)

  # Make sure you don't duplicate the header
  n = 2

# ------------------------------

# Set the column names
records = records[header_row-1:]
header = records.pop(0)

# Create your df
df = pd.DataFrame(records, columns=header)
Yaakov Bressler
  • 9,056
  • 2
  • 45
  • 69
0

It may be easiest to call read_excel() once, and save the contents into a list.

So, the first step would look like this: dfs = pd.read_excel(["Sheet 1", "Sheet 2", "Sheet 3"])

Note that the sheet names you use in the list should be the same as those in the excel file. Then, if you wanted to vertically concatenate these sheets, you would just call:

final_df = pd.concat(dfs, axis=1)

Note that this solution would result in a final_df that includes column headers from all three sheets. So, ideally they would be the same. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.

I hope this helps!

darunia
  • 11
  • 4