I'm getting unexpected behavior in my project when I compare two strings, one taken from a pandas DataFrame and one written in code. I loaded a pandas DataFrame with the columns ['Country','Region','City','Population','Cases'] to look for a possible correlation between the last two variables (population and Covid cases).
df = pd.DataFrame(columns = ['Country','Region','City','Population','Cases'])
I wanted to save the populations of a given area (e.g. Southern Italy) in a list so I could plot them, so I wrote this list comprehension:
pop_sud = [int(df.iloc[i][3]) for i in range(len(df.index)) if str(df.iloc[i][0])=='Italy'
if str(df.iloc[i][1])=='Sicilia']
The second 'if' condition always seems to evaluate to false, so I get an empty list. That should not happen, as this small debug output shows, where I print each element of the Region column next to the word 'Sicilia':
Region type: <class 'str'>
---
Puglia Sicilia
Lombardia Sicilia
Emilia Sicilia
Sicilia Sicilia <--
Toscana Sicilia
Veneto Sicilia
Veneto Sicilia
I also tried this version, but it still gives me an empty list because the 'if' check never passes:
cases_sud = [int(df.iloc[i][4]) for i in range(len(df.index)) if df.iloc[i][0] == 'Italy'
if df.loc[i][1] in ['Sicilia','Puglia','Campania']]
I also tried combining the two conditions with the 'and' keyword instead of chaining the 'if' clauses, with the same result.
Why does this happen?
Update:
Thank you all for your answers. By reading WGP's answer I found out that every region name in my dataset had a leading space, so the comparison with 'Sicilia' never matched! I also tried Gergely's method and it worked despite the space in the dataset. Thank you all! :)
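For anyone who hits the same problem, here is a minimal sketch of how the leading space can be spotted and removed. It is not necessarily what either answer proposed. The sample rows and numbers are made up for illustration; only the column names come from my DataFrame. It uses repr() to make the hidden space visible, then strips the whitespace and filters with a boolean mask instead of looping over row indices.

```python
import pandas as pd

# Made-up sample data for illustration; note the leading space in each region name.
df = pd.DataFrame(
    {
        "Country": ["Italy", "Italy", "Italy", "France"],
        "Region": [" Sicilia", " Puglia", " Lombardia", " Bretagne"],
        "City": ["Palermo", "Bari", "Milano", "Rennes"],
        "Population": [650000, 320000, 1400000, 220000],
        "Cases": [100, 50, 400, 30],
    }
)

# repr() shows the quotes around the string, so stray whitespace becomes visible.
print(repr(df["Region"].iloc[0]))

# The direct comparison never matches because ' Sicilia' != 'Sicilia'.
raw_match = bool((df["Region"] == "Sicilia").any())

# Strip surrounding whitespace once, then select rows with a boolean mask.
df["Region"] = df["Region"].str.strip()
mask = (df["Country"] == "Italy") & df["Region"].isin(["Sicilia", "Puglia", "Campania"])

pop_sud = df.loc[mask, "Population"].astype(int).tolist()
cases_sud = df.loc[mask, "Cases"].astype(int).tolist()
print(pop_sud, cases_sud)
```

Selecting columns by name with a mask also avoids the positional df.iloc[i][3] lookups, which are easy to get wrong if the column order changes.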