-1

hi:) i am trying to make a for loop to reduce redundancy in my code, where i need to access a number of different sheets within an excel file, count the number of specific values and later plot a graph. my code for my for loop looks like this at the moment:

df = pd.read_excel('C:/Users/julia/OneDrive/Documents/python assignment/2016 data -EU values.xlsx', 
skiprows=6)
sheets_1 = ["PM10 ", "PM2.5", "O3 ", "NO2 ", "BaP", "SO2"]
resultM1 = 0
for sheet in sheets_1:
    print(sheet[0:5])
    for row in df.iterrows():
        if row[1]['Country'] == 'Malta':
            resultM1 += row[1]['AirPollutionLevel']
print(resultM1)

i would like for the output to look something like this: PM10 142 PM2.5 53 O3 21 NO2 3 BaP 21 SO2 32

but what i'm getting is just the sheet names printed after each other and the total amount of the sepcific value i need across all sheets. i.e. PM10 PM2.5 O3 NO2 BaP SO2 284.913786

i really need the values separated into their respective sheet and not added together.

attached is a screeshot of the excel file. as u can see, there are different sheets and many values within -i need to add values for a specific country in each sheet. enter image description here

any help would be greatly appreciated!

1 Answers1

0
import pandas as pd
# Open as Excel object
xls = pd.ExcelFile('C:/Users/julia/OneDrive/Documents/python assignment/2016 data -EU values.xlsx')
# Get sheet names
sheets_1 = xls.sheet_names

# Dictionary of SheetNames:dfOfSheets
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)

# create list to store results
resultM1 = []
# Loop over keys and df in dictionary
for key, df in sheet_to_df_map.items():
    # remove top 5 blank rows
    df = df = df.iloc[5:]
    # set column names as first row values
    headers = df.iloc[0]
    df  = pd.DataFrame(df.values[1:], columns=headers)
    #loop over rows in the df and create pd series to store Malta results
    results  =df.loc[df['Country'] == "Malta", 'AirPollutionLevel']
    # Loop over the results for Malta from each sheet and append 'Malta' and then append the value to a list
    for i in results:
        resultM1.append(key)
        resultM1.append(i)

# Convert list to df
df = pd.DataFrame(resultM1) 
# Rename column
df = df.rename({0: 'Sheet'}, axis=1)  
# create two columns
final = pd.DataFrame({'Sheet':df['Sheet'].iloc[::2].values, 'Value':df['Sheet'].iloc[1::2].values})
Ciaran O Brien
  • 374
  • 3
  • 13
  • thanks @Ciaran! that should have worked but it printed this instead: {'PM10 ': 32.0, 'PM2.5': 32.0, 'O3 ': 32.0, 'NO2 ': 32.0, 'BaP': 32.0, 'SO2': 32.0} and there's no way that each sheet has the same number of values, i think it may have counted the number of times the country was mentioned instead of counting the values associated with that country? – Julia Micallef Filletti Feb 12 '21 at 18:45
  • Is this data available online and I can take a look? @JuliaMicallefFilletti – Ciaran O Brien Feb 12 '21 at 18:49
  • my teacher provided us with this data, not sure where he got it from as i couldnt find it anywhere. its from World Health Organisation. i checked their site to see where he could've gotten it from, but cant find anything :( – Julia Micallef Filletti Feb 12 '21 at 18:52
  • @JuliaMicallefFilletti Try it now. The problem is your line `for row in df.iterrows():` because your are looking at row in original df, not row in sheet. – Ciaran O Brien Feb 12 '21 at 19:47