1

I have an excel file which contains more than 30 sheets. However the operation that I do on each sheet remains the same more or less. But my objective is to create a separate dataframe for each sheet, so that I can refer in the future

This is what I tried but it throws an error

xls = pd.ExcelFile('DC_Measurement.xlsx')
sheets = xls.sheet_names
for s in sheets:
    print(s)
    'df '+ s = pd.read_excel(xls, sheet_name=s)

So, it's like I want 30 dataframes to be created and each dataframe will have the sheet name as the suffix name. I tried using the "+" operator but it didn't help either. It threw an error message as shown below

  SyntaxError: can't assign to operator

How can I create dataframes on the fly and name them ?

The Great
  • 7,215
  • 7
  • 40
  • 128
  • 1
    You can create a dictionary with key as `df_{}.format(s)` and val as `pd.read_excel(xls, sheet_name=s)` – Abdul Niyas P M Jul 24 '19 at 06:37
  • Possible duplicate of [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – Mike Scotty Jul 24 '19 at 06:53

4 Answers4

5

You could use something like this:

for s in sheets:
    vars()['df'+ s] = pd.read_excel(xls, sheet_name=s)
Joe
  • 12,057
  • 5
  • 39
  • 55
  • What will be the dataframe name here?. Does it start with `df` followed by sheetname or just the sheetname? Because, when I follow this solution. I don't get the dataframe name? – The Great Oct 23 '19 at 04:56
  • The name will start with `df` followed by the sheetname. For example if the sheetname are `Sheet1` and `Sheet2`, the name of the dfs will be: `dfSheet1` and `dfSheet2` (it is the name in the square brackets: `df`+s) – Joe Oct 23 '19 at 05:58
2

Strictly speaking not an answer to your question but this will create a dictionary where the key is the sheet name and the value is the dataframe.

workbook = pd.read_excel('DC_Measurement.xlsx', sheet_name = None)

Then you can retrieve the dataframe you need like this.

df = workbook['sheet_name']

I think this is tidier than other solutions.

MarLeo
  • 2,566
  • 2
  • 18
  • 16
1

Or use locals:

for s in sheets:
    locals()['df'+ s] = pd.read_excel(xls, sheet_name=s)

In a function change locals to globals.

U13-Forward
  • 69,221
  • 14
  • 89
  • 114
1

The best approach is usually to store the dataframes in a list or dictionary, where you can work with them systematically, like this:

xls = pd.ExcelFile('DC_Measurement.xlsx')
sheets = {}
for s in xls.sheet_names:
    print(s)
    sheets[s] = pd.read_excel(xls, sheet_name=s)

Or just this:

xls = pd.ExcelFile('DC_Measurement.xlsx')
sheets = {
    s: pd.read_excel(xls, sheet_name=s)
    for s in xls.sheet_names
}

This will make it easy to work with the sheets programmatically later (just access sheets[s], where s is a sheet name). Otherwise you will next face the tricky problem of how to access all the dataframes that you've just created as free-floating variables.

Matthias Fripp
  • 17,670
  • 5
  • 28
  • 45