-2

I have 6 excel files each containing 1 to 3 spread sheets. I want to upload them into pandas data frame and then do some data cleaning on them. My question is that how I can convert these excel files into one data frame in pandas?

I tried to open one of the excel files that has multiple sheets into data frame in pandas but couldn’t do that?

2 Answers2

0

You have to use the pd.read_excel. See the answer to your question here:

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

You can also take a look at the original source:

pandas.read_excel

Also I have an excel file, the first sheet has 17,000 rows and the second sheet has 3,000 rows and the columns of both sheets are the same. The name of the first sheet is "1" and the name of the second sheet is "2". Here I showed how to read each separately and together.

import pandas as pd

df1 = pd.read_excel('file.xlsx',sheet_name = '1')
df2 = pd.read_excel('file.xlsx',sheet_name = '2')

df3 = pd.read_excel('file.xlsx',sheet_name = ['1','2'])
df4 = pd.concat(df3, axis=0, ignore_index=True)

print('df1:',len(df1))
print('df2:',len(df2))
print('df3:',len(df3))
print('df4:',len(df4))

Output:

df1: 17000
df2: 3000
df3: 2
df4: 20000
0

Using sheet_name=None argument returns a dictionary, which can be iterated using a dictionary comprehension to create a mapping of file names as keys and dataframes as values. Finally run a list comprehension on the mapping values and concat all dfs together.

from pathlib import Path

import pandas as pd


def concat_all_sheets_in_all_files(path: str) -> pd.DataFrame:
    files = list(Path(path).rglob("*.xlsx"))
    dfs_mapping = {Path(file).stem: pd.concat(pd.read_excel(file, sheet_name=None)) for file in files}

    return pd.concat([df for _, df in dfs_mapping.items()]).reset_index(drop=True)


file_path = "path/to/all/excel/files/"

final_df = concat_all_sheets_in_all_files(path=file_path)
print(final_df)
Jason Baker
  • 3,170
  • 2
  • 12
  • 15
  • Thank you. may I ask one more question please: there are some rows at the header, footer, and maybe middle which are empty and I would like to get rid of them while converting those excel files with multiple worksheets into a data frame. Any idea? – user21010083 Jan 15 '23 at 20:17
  • add dropna(): exp: return pd.concat([df for _, df in dfs_mapping.items()]).dropna().reset_index(drop=True) – Jason Baker Jan 15 '23 at 20:28
  • Thanks. The sheets all have the same column names but other than sheet 1s, the others don't show any titles in their columns. How can I resolve that, please? – user21010083 Jan 15 '23 at 20:41
  • did they have column names before? – Jason Baker Jan 15 '23 at 21:45