1

I am new to coding and currently i want to create individual dataframes from each excel tab. It works out so far by doing a search in this forum (i found a sample using dictionary), but then i need one more step which i can't figure out.

This is the code i am using:

import pandas as pd

excel = 'sample.xlsx'

xls = pd.ExcelFile(excel)
d = {}
for sheet in xls.sheet_names:
    print(sheet)
    d[f'{sheet}'] = pd.read_excel(xls, sheet_name=sheet)

Let's say i have 3 excel tabs called 'alpha', 'beta' and 'charlie'. the code above will gave me 3 dataframes and i can call them by typing: d['alpha'], d['beta'] and d['charlie']. What i want is to rename the dataframes so instead of calling them by typing (for example) d['alpha'], i just need to write alpha (without any other extras).

Edit: The excel i want to parse has 50+ tabs and it can grow Edit 2: Thank you all for the links and the answers! it is a great help

E_Ri
  • 51
  • 6
  • Related: [How do I create variable variables?](/q/1373164/4518341) – wjandrea Jul 05 '22 at 16:07
  • Just to reinforce what @wjandrea said: the advice in that thread is to use a dictionary, which you already have. It's probably the best idea, but see my answer for another approach that might work, depending on your data. – Matt Hall Jul 05 '22 at 16:30
  • Why do you want to rename them exactly? If you just want to save yourself some typing, you could simply assign names for the dicts you're working on as needed, [like luke suggested](/a/72872459/4518341). Like, if you're about to do some work on "alpha", do `alpha = d['alpha']` and continue. Think of it like an alias or nickname. – wjandrea Jul 05 '22 at 16:33
  • I'd recommend against setting up anything that *automatically* assigns the names, because [mutable global state can make your program unpredictable](//softwareengineering.stackexchange.com/questions/148108/why-is-global-state-so-evil), though if this is only for interactive use, it might not be so bad, i.e. if you accidentally overwrite or shadow another variable, you could fix it. – wjandrea Jul 05 '22 at 16:33

3 Answers3

1

You need to create variables which correspond to the three dataframes:

alpha, beta, charlie = d.values()

Edit:

Since you mentioned that the excel sheet could have 50+ tabs and could grow, you may prefer to do it your original loop. This can be done dynamically using exec

import pandas as pd

excel = 'sample.xlsx'

xls = pd.ExcelFile(excel)
d = {}
for sheet in xls.sheet_names:
    print(sheet)
    exec(f'{sheet}' + " = pd.read_excel(xls, sheet_name=sheet)")

It might be better practice, however, to simply index your sheets and access them by index. A 50+ length collection of excel sheets is probably better organized by appending to a list and accessing by index:

d = []
for sheet in xls.sheet_names:
    print(sheet)
    d.append(pd.read_excel(xls, sheet_name=sheet))

#d[0] = alpha; d[1] = beta, and so on...
Yuca
  • 6,010
  • 3
  • 22
  • 42
luke
  • 465
  • 1
  • 14
  • i know i can do this, but i have more than 50 excel tabs and it can grow....that is why i want to do it by using loop if possible – E_Ri Jul 05 '22 at 16:12
  • 1
    @E_Ri Please add that to the question. Those are important details. – wjandrea Jul 05 '22 at 16:15
  • This was a reasonable answer until `eval` came along. – Matt Hall Jul 05 '22 at 16:29
  • 1
    @kwinkunks this is why I said that it might be better practice to index the sheets. As you stated in your answer, simply using the dictionary he had (or a list) is a better option. As the question is posed, is there a way to assign these dataframes to variables (like OP asked) without using `eval()` or `exec()`? – luke Jul 05 '22 at 16:34
  • 3
    `eval()` doesn't work here since it only does expressions, not statements. You want `exec()`. – wjandrea Jul 05 '22 at 16:39
1

I think you are looking for the build-in exec method, which executes strings. But I do not recommend using exec, it is really widely discussed why it shouldn't be used or at least should be used cautiously.

As I do not have your data, I think it is achievable using the following code:

import pandas as pd

excel='sample.xlsx'
xls=pd.ExcelFile(excel)

for sheet in xls.sheet_names:
 print(sheet)
 code_to_execute = f'{sheet} = pd.read_excel(xls,sheet_name={sheet})'
 exec(code_to_execute)

But again, I highlight that it is not the cleanest way to do that. Your approach is definitely cleaner, to be more precise, I would always use dicts for those kinds of assignments. See here for more about exec.

In general, you want to generate a string.

possible_string = 'a=10'
exec(possible_string)
print(a) # 10
ko3
  • 1,757
  • 5
  • 13
1

Don't rename them.

I can think of two scenarios here:

1. The sheets are fundamentally different

When people ask how to dynamically assign to variable names, the usual (and best) answer is "Use a dictionary". Here's one example.

Indeed, this is the reason Pandas does it this way!

In this case, my opinion is that your best move here is to do nothing, and just use the dictionary you have.

2. The sheets are roughly the same

If the sheets are all basically the same, and only differ by one attribute (e.g. they represent monthly sales and the names of the sheets are 'May', 'June', etc), then your best move is to merge them somehow, adding a column to reflect the sheet name (month, in my example).

Whatever you do, don't use exec or eval, no matter what anyone tells you. They are not options for beginner programmers.

Matt Hall
  • 7,614
  • 1
  • 23
  • 36