String comparison always false when using iloc on a pd.dataframe of type 'string'

Question

I'm getting unexpected behavior in my project when I compare two strings, one taken from a pandas DataFrame and one written in code. I loaded a pandas DataFrame with the columns ['Country','Region','City','Population','Covid Cases'] to look for a possible correlation between the last two variables.

df = pd.DataFrame(columns = ['Country','Region','City','Population','Cases'])

I wanted to save the populations of a given area (e.g. Southern Italy) in a list so I could plot them, so I wrote this list comprehension:

pop_sud = [int(df.iloc[i][3]) for i in range(len(df.index)) if str(df.iloc[i][0])=='Italy' 
if str(df.iloc[i][1])=='Sicilia']

The second 'if' condition appears to be always false, which gives me an empty list. That should not happen, as shown by a small debug print in which I printed every element of the Region column next to the word 'Sicilia':

 Region type: <class 'str'>
 ---
 Puglia Sicilia
 Lombardia Sicilia
 Emilia Sicilia
 Sicilia Sicilia <--
 Toscana Sicilia
 Veneto Sicilia
 Veneto Sicilia

I also tried this version, but it still gives me an empty list because the 'if' check never passes:

cases_sud = [int(df.iloc[i][4]) for i in range(len(df.index)) if df.iloc[i][0] == 'Italy' 
if df.loc[i][1] in ['Sicilia','Puglia','Campania']]

I also tried joining the two if conditions with the `and` keyword and got the same result. Why does this happen?

Update:
Thank you all for your answers. By reading WGP's answer I found out that my dataset had a space before every region name, so the comparison with 'Sicilia' never matched! I also tried Gergely's method and it worked despite the space in the dataset. Thank you all! :)
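For anyone hitting the same problem, here is a minimal sketch of the whitespace issue. The rows below are made up for illustration (they are not the real dataset), and the variable `matched_before` is just a name chosen for this example. It shows how `repr()` exposes the stray space and how `str.strip()` lets the exact comparison work:

```python
import pandas as pd

# Made-up sample rows; every Region value has a leading space,
# mimicking the problem described in the update above.
df = pd.DataFrame(
    [
        ["Italy", " Sicilia", "City1", 100, 5],
        ["Italy", " Puglia", "City2", 200, 8],
        ["Italy", " Lombardia", "City3", 300, 4],
    ],
    columns=["Country", "Region", "City", "Population", "Cases"],
)

# repr() shows the quotes around each value, so the extra space is visible.
print([repr(r) for r in df["Region"]])

# The exact comparison fails because ' Sicilia' != 'Sicilia'.
matched_before = bool((df["Region"] == "Sicilia").any())
print(matched_before)

# Strip surrounding whitespace once, then select with a boolean mask.
df["Region"] = df["Region"].str.strip()
mask = (df["Country"] == "Italy") & (df["Region"] == "Sicilia")
pop_sud = df.loc[mask, "Population"].astype(int).tolist()
print(pop_sud)
```

If the file is read with `pd.read_csv`, passing `skipinitialspace=True` may avoid the problem at load time, assuming the spaces come right after the delimiter.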

Have a look at the top answer to this question: https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values, I think it will be helpful to you. — Chris, Apr 07 '20 at 08:23
I don't see at first glance what causes the error, but this is not a good way to select from a dataframe. You could try `df.loc[(df.Country == 'Italy') & (df.Region == 'Sicilia'), 'Cases']` and avoid a loop. For debugging, I recommend typing `df.Region == 'Sicilia'` and seeing what comes out. It should be a boolean series with exactly one True and the rest False. — Christoph Burschka, Apr 07 '20 at 08:57

score 1 · Answer 1 · answered Apr 07 '20 at 09:03

I don't know whether this is your issue, because I can't tell exactly what your dataframe looks like; I only have the column names from the code you posted. But it looks like your Region value is never exactly 'Sicilia'. It seems to have something preceding it, in which case your second if condition will always be false.

I think you want to change it to something along the lines of

pop_sud = [
    int(df.iloc[i][3]) 
    for i in range(len(df.index)) 
    if str(df.iloc[i][0])=='Italy'
    if df['Region'].str.contains('Sicilia')[i]
]

You could also do this without the list comprehension with code looking like

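# The query must be a single string; engine='python' is needed for .str methods
pop_sud = df.query(
    "Country == 'Italy' and Region.str.contains('Sicilia')",
    engine='python',
)['Population'].astype(int).tolist()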
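One caveat with `str.contains`, sketched below with hypothetical region values: it is a substring test, so it tolerates stray whitespace but also matches longer names that merely include the word. Stripping and comparing exactly is stricter. The name ' Sicilia Est' is invented purely to show the difference.

```python
import pandas as pd

# Hypothetical values; ' Sicilia Est' is invented to show an over-match.
regions = pd.Series([" Sicilia", " Puglia", " Sicilia Est"])

# Substring match: ignores the leading space, but also matches 'Sicilia Est'.
contains_mask = regions.str.contains("Sicilia", regex=False)

# Exact match after stripping whitespace: only the plain 'Sicilia' row.
exact_mask = regions.str.strip() == "Sicilia"

print(contains_mask.tolist())  # [True, False, True]
print(exact_mask.tolist())     # [True, False, False]
```

Which of the two is appropriate depends on how clean the region names are and whether partial matches are wanted.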

score 1 · Answer 2 · answered Apr 07 '20 at 09:05

Try filtering with "boolean indexing":

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing

This article explains it in detail with great examples:

https://appdividend.com/2019/01/25/pandas-boolean-indexing-example-python-tutorial/

So, if you have this dataset:

import pandas

nested_lists = [
    ['Country1', 'Region1', 'City1', 1, 5], 
    ['Country1', 'Region1', 'City2', 7, 8], 
    ['Country1', 'Region2', 'City3', 3, 4], 
    ['Country2', 'Region2', 'City4', 6, 8]
] 

df = pandas.DataFrame(nested_lists, columns = ['Country', 'Region', 'City', 'Population', 'Cases'])

You can filter it by Country and Region this way:

df_filtered = df[(df['Country'] == 'Country1') & (df['Region'] == 'Region1')]

Results:

Country     Region  City    Population  Cases
Country1    Region1 City1   1           5
Country1    Region1 City2   7           8

To get only the cases column:

df_filtered2 = df[(df['Country'] == 'Country1') & (df['Region'] == 'Region1')][['Cases']]

Results:

Cases
5
8
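The same idea should extend to the case in the question where several regions are wanted at once. The sketch below uses `Series.isin` and `.loc` to get plain Python lists ready for plotting. The rows and the `south` list are illustrative assumptions, not the asker's real data:

```python
import pandas as pd

# Illustrative rows only; not the real dataset from the question.
df = pd.DataFrame(
    [
        ["Italy", "Sicilia", "City1", 10, 1],
        ["Italy", "Puglia", "City2", 20, 2],
        ["Italy", "Lombardia", "City3", 30, 3],
        ["France", "Sicilia", "City4", 40, 4],
    ],
    columns=["Country", "Region", "City", "Population", "Cases"],
)

# Regions treated as "south" for this example (an assumption).
south = ["Sicilia", "Puglia", "Campania"]

# Combine both conditions into one boolean mask.
mask = (df["Country"] == "Italy") & df["Region"].isin(south)

# .loc with a mask and a column label returns a Series; tolist() gives a list.
pop_sud = df.loc[mask, "Population"].tolist()
cases_sud = df.loc[mask, "Cases"].tolist()

print(pop_sud)    # [10, 20]
print(cases_sud)  # [1, 2]
```

Building the mask once and reusing it keeps the two lists aligned row by row, which matters when they are plotted against each other.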
