1

I need to read in a list of SKUs for three different years (2019, 2020, and 2021), all of which exist in an Excel workbook, each with a worksheet labeled for the year. I want to read in all three worksheets into a single dataframe, and create a column for the appropriate year for which a SKU belongs.

The way I did it is very manual (see below). Is there a more efficient way to code this?

'''
Read in the yearly SKU worksheets
'''
df2019= pd.read_excel('test.xlsx', sheet_name='2019', index_col=None, header=0)
df2019['Year'] = '2019'

df2020= pd.read_excel('test.xlsx', sheet_name='2020', index_col=None, header=0)
df2020['Year'] = '2020'

df2021= pd.read_excel('test.xlsx', sheet_name='2021', index_col=None, header=0)
df2021['Year'] = '2021'

df_all=pd.concat([df2019, df2020, df2021])

df_all.head(9)

My output is as follows: Output

AJCaffarini
  • 87
  • 1
  • 7

1 Answers1

1

You can use pandas.read_excel and make sheet_name=None to return a dict of dataframes when the keys are the sheets names.

Try this :

dict_df_all = pd.read_excel('test.xlsx', sheet_name=None, index_col=None, header=0)

for k, v in dict_df_all.items():
    v['Year'] = k
    
df_all = pd.concat(dict_df_all, ignore_index=True)
Timeless
  • 22,580
  • 4
  • 12
  • 30