1

I have an Excel file with several sheets. I would like to iterate over each sheet of the Excel file and create a DataFrame for each sheet in the file. Also, I would like the DataFrames to be named "hfi_(name of the sheet)". This is the for loop I wrote so far, but it is not running due to a syntax error. Would you help me solve this?

for sheet in hfi_file:    
  hfi_file=pd.read_excel("/Users/glosophy/Desktop/hfi/hfi_2016.xlsx",sheet_name=sheet)
return str("hfi"+sheet)=pd.DataFrame(hfi_file)
Guillermina
  • 3,127
  • 3
  • 15
  • 24

5 Answers5

1

You need to import the Excel file first, then read each sheet by name

multi_sheet_file = pd.ExcelFile("/Users/glosophy/Desktop/hfi/hfi_2016.xlsx")

# Gets the sheet names in the file
excel_sheet_names = multi_sheet_file.sheet_names

dict_of_sheets = {}
for sheet in excel_sheet_names:
    dict_of_sheets["hfi" + sheet] = pd.read_excel(multi_sheet_file, sheetname=sheet')
ZaxR
  • 4,896
  • 4
  • 23
  • 42
  • It worked! I'm wondering how to turn the dictionaries into DataFrames now... keys are the name of the DataFrames and the values the data. Thanks! – Guillermina Aug 31 '18 at 03:13
  • 1
    The idea would be to access the dfs from the dictionary by name as they’re needed, but if you mean that you want to create regular variables for each one, see: https://stackoverflow.com/questions/18090672/convert-dictionary-entries-into-variables-python – ZaxR Aug 31 '18 at 03:15
0

I will recommend store those dataframes into a dict

d={"hfi" + sheet:pd.DataFrame(pd.read_excel("/Users/glosophy/Desktop/hfi/hfi_2016.xlsx", sheet_name=sheet)) for sheet in hfi_file}

Then you can using .get to access the data

d.get('your dict key')
BENY
  • 317,841
  • 20
  • 164
  • 234
0

I'm clearly slow on the draw, but here's another answer:

import pandas as pd

# importing these so sessionInfo comes up for them
import openpyxl
import xlrd

file = 'output.xlsx'
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'sheet2-col1': [0, 1], 'sheet2-col2': [30, 40]})

writer = pd.ExcelWriter(file)
df1.to_excel(writer, 'Sheet1')
df2.to_excel(writer, 'Sheet2')
writer.save()

reader = pd.ExcelFile(file)
dict = {"hfi" + i : pd.read_excel(file, sheet_name=i) for i in reader.sheet_names}
print(dict)
#> {'hfiSheet1':    col1  col2
#> 0     1     3
#> 1     2     4, 'hfiSheet2':    sheet2-col1  sheet2-col2
#> 0            0           30
#> 1            1           40}

Created on 2018-08-30 by the reprexpy package

import reprexpy
print(reprexpy.SessionInfo())
#> Session info --------------------------------------------------------------------
#> Platform: Darwin-17.7.0-x86_64-i386-64bit (64-bit)
#> Python: 3.6
#> Date: 2018-08-30
#> Packages ------------------------------------------------------------------------
#> openpyxl==2.5.6
#> pandas==0.23.4
#> reprexpy==0.1.1
#> xlrd==1.1.0
Chris
  • 1,575
  • 13
  • 20
0

Here is the easiest way I can share with you:

# read the sheet file
import pandas as pd
sheets = pd.ExcelFile('sheet_filename.xlsx')

# save each sheet to a dataframe
df1 = pd.read_excel(sheets, 'Sheet1')
df2 = pd.read_excel(sheets, 'Sheet2')

Hope this helped you!

Jade Cacho
  • 691
  • 6
  • 9
0
# Read the excel file
filename = 'yourpath/yourfile.xlsx'

df_raw = pd.read_excel(filename)

# iterate through the sheets in the file
for sheet in df_raw.sheet_names:
    sheet = pd.read_excel(filename, sheet_name=sheet)