How to convert excel file into pandas data frame

Question

I have 6 excel files each containing 1 to 3 spread sheets. I want to upload them into pandas data frame and then do some data cleaning on them. My question is that how I can convert these excel files into one data frame in pandas?

I tried to open one of the excel files that has multiple sheets into data frame in pandas but couldn’t do that?

do all sheets have the same structure ex. column names? – Jason Baker Jan 15 '23 at 01:36 — Jason Baker, Jan 15 '23 at 01:36

Shahriyar Shamsipour · Answer 1 · 2023-01-15T02:30:11.887

You have to use the pd.read_excel. See the answer to your question here:

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

You can also take a look at the original source:

pandas.read_excel

Also I have an excel file, the first sheet has 17,000 rows and the second sheet has 3,000 rows and the columns of both sheets are the same. The name of the first sheet is "1" and the name of the second sheet is "2". Here I showed how to read each separately and together.

import pandas as pd

df1 = pd.read_excel('file.xlsx',sheet_name = '1')
df2 = pd.read_excel('file.xlsx',sheet_name = '2')

df3 = pd.read_excel('file.xlsx',sheet_name = ['1','2'])
df4 = pd.concat(df3, axis=0, ignore_index=True)

print('df1:',len(df1))
print('df2:',len(df2))
print('df3:',len(df3))
print('df4:',len(df4))

Output:

df1: 17000
df2: 3000
df3: 2
df4: 20000

score 0 · Answer 2 · answered Jan 15 '23 at 06:22

0

Using sheet_name=None argument returns a dictionary, which can be iterated using a dictionary comprehension to create a mapping of file names as keys and dataframes as values. Finally run a list comprehension on the mapping values and concat all dfs together.

from pathlib import Path

import pandas as pd


def concat_all_sheets_in_all_files(path: str) -> pd.DataFrame:
    files = list(Path(path).rglob("*.xlsx"))
    dfs_mapping = {Path(file).stem: pd.concat(pd.read_excel(file, sheet_name=None)) for file in files}

    return pd.concat([df for _, df in dfs_mapping.items()]).reset_index(drop=True)


file_path = "path/to/all/excel/files/"

final_df = concat_all_sheets_in_all_files(path=file_path)
print(final_df)

answered Jan 15 '23 at 06:22

Jason Baker

3,170
2
12
15

Thank you. may I ask one more question please: there are some rows at the header, footer, and maybe middle which are empty and I would like to get rid of them while converting those excel files with multiple worksheets into a data frame. Any idea? – user21010083 Jan 15 '23 at 20:17
add dropna(): exp: return pd.concat([df for _, df in dfs_mapping.items()]).dropna().reset_index(drop=True) – Jason Baker Jan 15 '23 at 20:28
Thanks. The sheets all have the same column names but other than sheet 1s, the others don't show any titles in their columns. How can I resolve that, please? – user21010083 Jan 15 '23 at 20:41
did they have column names before? – Jason Baker Jan 15 '23 at 21:45

How to convert excel file into pandas data frame

2 Answers2