0

I want to read and prepare data from a excel spreadsheet containing many sheets with data.

I first read the data from the excel file using pd.read_excel with sheetname=None so that all the sheets can be written into the price_data object.

price_data = pd.read_excel('price_data.xlsx', sheetname=None)

This gives me an OrderedDict object with 5 dataframes. Afterwards I need to obtain the different dataframes which compose the object price_data. I thought of using a for iteration for this, which gives me the opportunity to do other needed iterative operations such as setting the index of the dataframes. This is the approach I tried

for key, df in price_data.items(): 
    df.set_index('DeliveryStart', inplace=True)
    key = df

With this code I would expect that each dataframe would be written into an object named by the key iterator, and at the end I would have as many dataframes as those inside my original data_price object. However I end up with two identical dataframes, one named key and one named value.

Suggestions?

2 Answers2

0

If you are happy to set index of the DataFrames in-place, you could try this:

for key in price_data:
    price_data[key].set_index('DeliveryStart', inplace=True)
sjw
  • 6,213
  • 2
  • 24
  • 39
0

Reason for current behaviour:

In your example, the variables key and df will be created (if not already existing) and overwritten in each iteration of the loop. In each iteration, you are setting key to point towards the object df (which also remains set in df, as Python allows multiple pointers to the same object). However, the key object is then overwritten in the next loop and set to the new value of df. At the end of the loop, the variables will remain in their last state.

To illustrate:

from collections import OrderedDict
od = OrderedDict()
od["first"] = "foo"
od["second"] = "bar"
# I've added an extra layer of `enumerate` just to display the loop progress. 
# This isn't required in your actual code.
for loop, (key, val) in enumerate(od.items()):
    print("Iteration: {}".format(loop))
    print(key, val)
    key = val
    print(key,val)
print("Final output:", key, val)

Output:

Iteration: 0
first foo
foo foo
Iteration: 1
second bar
bar bar
Final output: bar bar

Solution:

It looks like you want to dynamically set the variables to be named the same as the value of key, which isn't considered a good idea (even though it can be done). See Dynamically set local variable for more discussion.

It sounds like a dict, or OrderedDict is actually a good format for you to store the DataFrames alongside the name of the sheet it originated from. Essentially, you have a container with the named attributes you want to use. You can then iterate over the items to do work like concatenation, filtering or similar.

If there's a different reason you wanted the DataFrames to be in standalone objects, leave a comment and I will try and make a follow-up suggestion.

Phil Sheard
  • 2,102
  • 1
  • 17
  • 38
  • 1
    Thanks. I have actually never worked with OrderedDicts so initially I just wanted to have my DataFrames as separate objects to to standard transformations such as concatenate, resample and plot. So there isn't any disadvantage in leaving the DataFrames inside the OrderedDict? All operations will work normally? – Vitor Campos Feb 19 '18 at 11:00
  • Yes @VitorCampos, you'll be able to perform all operations like normal. The only difference between an `OrderedDict` and a regular `dict` is that the order of keys is maintained (rather than a `dict` which will come back randomly if you iterate over them). – Phil Sheard Feb 20 '18 at 10:21
  • Also, if this solves your problem could you mark the answer as complete? – Phil Sheard Feb 20 '18 at 10:23
  • @PhilSheard What is the solution where we, from the Ordered Dict, can turn each key into an individual pandas dataframe, then 2) dynamically set the variables(dataframe names) to be named the same as the value of key? VitorCampos, did you figure out this answer? – BrianBeing Apr 12 '19 at 15:19
  • Would there be any benefit in taking something like this ordered dictionary and merging it all into a single dataframe similar to a third norm form datatable? – Dan Aug 18 '20 at 14:01