1

At the request of many, I have simplified the problem as far as I can imagine (my imagination doesn't go that far), and I think it's reproducable also. The two different Excel files I've been using are called: "first apples.xlsx" and "second apples.xlsx". I've been using the following data:

"first apples.xlsx" "second apples.xlsx"

import os

import numpy as np
import pandas as pd
import glob
#%%
path = os.getcwd() + r"\apples"
file_locations = glob.glob(path + "\*.xlsx")
#%%
df = {}
for i, file in enumerate(file_locations):
    df[i] = pd.read_excel(file, usecols=['Description',  'Price'])
#%%
price_standard_apple = []
price_special_apple = []
special_apple_description = ['Golden', 'Diamond', 'Titanium']
#%%
for file in range(len(df)):
    df_description = pd.DataFrame(df[file].iloc[:,-2])
    df_prices = pd.DataFrame(df[file].iloc[:,-1])
    for description in df_description['Description']:
        if description in special_apple_description:
            description_index = df_description.loc[df_description['Description']==description].index.values
            price = df_prices['Price'].iloc[description_index]
            price_sum = np.sum(price)
            price_special_apple.append(price_sum)
        elif description not in special_apple_description:
            description_index = df_description.loc[df_description['Description']==description].index.values
            price = df_prices['Price'].iloc[description_index]
            price_sum = np.sum(price)
            price_standard_apple.append(price_sum)

I would expect the sum of the red colored cells (the special apples so to speak) to be 97 and that of the standard apples to be 224. This is not the case and the problem seems to be in the second loop. Python prints the following values: standard: 234 special: 209

  • 3
    Please create a [mre], also don't spam tags. – Julien Aug 03 '22 at 07:32
  • 2
    I have the feeling it would be possible to remove the for-loops completely and perform the operations on the full dataframe. Without a reproducible example that's hard to tell for certain, though :-) – André Aug 03 '22 at 07:49
  • @André I'm working on the reprex! Thanks – Masterrun80 Aug 03 '22 at 07:54
  • @André Would it? Because I'm trying to get the sum of the special descriptions in each different Excel file, so I'm able to compare them. – Masterrun80 Aug 03 '22 at 09:05
  • Just a rough outline of what I imagine I would do, but not sure if it applies exactly to your usecase: Read the excel files into a single dataframe, with an additional column that, for each line, specifies from which file it came from. That's the outer loop gone. The `if x in special_apples` you can likely replace with a `df.filter` call. With the inner loop I'm not sure what to do to remove it, since I don't quite understand what is happening in your code. But since you have everything inside one dataframe, it should be possible to get rid of it, too. – André Aug 03 '22 at 09:18
  • Ah that is true... The first line of the second loop returns the value of the index number associated with the description, which is equal to the special description. This value is then being used in the sliced ​​price list to find the corresponding price. – Masterrun80 Aug 03 '22 at 09:49

2 Answers2

1

I think the break statement would be useful in your case. It would allow you to break out of the inner for loop when you've retrieved the value you wanted, continuing the outer for loop.

Also covered in this question: Python: Continuing to next iteration in outer loop

  • 1
    Also, although not directly related to the problem you asked about, I agree with @André above that it seems you could solve this (at least) without the inner loop. Perhaps something to consider. – Chris Ellery Aug 03 '22 at 08:01
1

I think you are making this harder than it needs to be.

Given your test data from above you can simply do:

import pandas as pd

test_data = pd.read_csv(r'path\to\file', sep=',')

special = ['Golden', 'Diamond', 'Titanium']

mask = test_data['Description'].isin(special)
specials_price = sum(test_data[mask]['Price'])  # -> 97
other = sum(test_data[~mask]['Price'])  # -> 224

This is what test_data looks like:

Test file looks like this

code-lukas
  • 1,586
  • 9
  • 19