Python (Pandas) error 'the label [Algeria] is not in the [index]'

Question

I do not understand why this works

df[(df['Gold']>0) & (df['Gold.1']>0)].loc[((df['Gold'] - df['Gold.1'])/(df['Gold'])).abs().idxmax()]

but when I divide by (df['Gold'] + df['Gold.1'] + df['Gold.2']) it stops working and gives me the error shown below.

Interestingly, the following line works

df.loc[((df['Gold'] - df['Gold.1'])/(df['Gold'] + df['Gold.1'] + df['Gold.2'])).abs().idxmax()]

I have just started learning Python and Pandas, so I do not understand what is happening. I would like to know why this happens and how to fix it.

ERROR

KeyError: 'the label [Algeria] is not in the [index]'

[Image: snapshot of the DataFrame]

Try `print(df.index.tolist())`, you might have some spaces in there. — IanS, Jan 02 '17 at 13:55
@MaharajaX: in the future please post a text sample of your dataframe so that we can play with it (or code to produce it), not a picture. See [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for example. Thanks, and good luck with your course ;) — Julien Marrec, Jan 02 '17 at 19:52
The sample dataframe doesn't help much because the Winter medal counts (`Gold.1, Silver.1, Bronze.1, Total.1`) for all countries are zero. By the way, I would have named those series `Gold.S, Gold.W, Gold` just to be clear. — smci, Feb 22 '18 at 10:25
If you post reproducible code and a dataset (or URL), we could reply. It's a nice question for practising good idiom on. The cause of your bug is "chained indexing", i.e. `df[...][...]` will result in the LHS expression giving you a copy, which the RHS expression then tries to process/modify, instead of working directly on the source df. `df.filter` might be a better way to go... — smci, Feb 22 '18 at 10:28

score 6 · Accepted Answer · edited Jan 02 '17 at 19:50

Your problem is boolean indexing:

df[(df['Gold']>0) & (df['Gold.1']>0)]

returns a filtered DataFrame that does not contain the index label of the maximum value of the Series, because you computed that Series on the original, unfiltered df with this:

((df['Gold'] - df['Gold.1'])/(df['Gold'] + df['Gold.1'] + df['Gold.2'])).abs().idxmax()

In your data that label is Algeria, whose row the filter drops because its Gold.1 is 0.

So loc correctly raises a KeyError.
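The mismatch can be reproduced on a small frame. This is only a sketch: the medal counts below are made up for illustration and are not the asker's data. They are chosen so that the first formula happens to pick a label that survives the filter, while the second picks one that does not.

```python
import pandas as pd

# Hypothetical medal counts, invented for this illustration only.
df = pd.DataFrame(
    {"Gold": [5, 56, 976], "Gold.1": [0, 118, 96], "Gold.2": [5, 174, 1072]},
    index=["Algeria", "Norway", "USA"],
)

# The filter drops Algeria because its Gold.1 is 0.
filtered = df[(df["Gold"] > 0) & (df["Gold.1"] > 0)]

# Both ratios are computed on the UNFILTERED df, so idxmax can return any label.
by_gold = ((df["Gold"] - df["Gold.1"]) / df["Gold"]).abs().idxmax()
by_total = (
    (df["Gold"] - df["Gold.1"]) / (df["Gold"] + df["Gold.1"] + df["Gold.2"])
).abs().idxmax()

print(by_gold)   # label that is still present in `filtered`, so .loc works
print(by_total)  # label that was filtered out, so .loc fails

try:
    filtered.loc[by_total]
except KeyError as exc:
    print("KeyError:", exc)
```

Whether the one-liner works therefore depends only on which label idxmax happens to return for the data at hand.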

One possible solution is to assign the filtered DataFrame to df1 and then compute the Series, and therefore idxmax, on df1 itself, so the returned label is guaranteed to exist in df1:

df1 = df[(df['Gold']>0) & (df['Gold.1']>0)]
df2 = df1.loc[((df1['Gold']-df1['Gold.1'])/(df1['Gold']+df1['Gold.1']+df1['Gold.2'])).abs().idxmax()]
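An alternative sketch, assuming the same made-up counts as a stand-in for the real data: compute the ratio once on the full frame, apply the boolean mask to the ratio itself, and only then call idxmax. The names `mask`, `ratio`, `best_label` and `best_row` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical medal counts, invented for this illustration only.
df = pd.DataFrame(
    {"Gold": [5, 56, 976], "Gold.1": [0, 118, 96], "Gold.2": [5, 174, 1072]},
    index=["Algeria", "Norway", "USA"],
)

# Rows with at least one gold in both of the first two columns.
mask = (df["Gold"] > 0) & (df["Gold.1"] > 0)

# Relative difference, computed on the full frame.
ratio = (
    (df["Gold"] - df["Gold.1"]) / (df["Gold"] + df["Gold.1"] + df["Gold.2"])
).abs()

# Restrict the ratio to the masked rows BEFORE taking idxmax,
# so the label is guaranteed to pass the filter.
best_label = ratio[mask].idxmax()
best_row = df.loc[best_label]

print(best_label)
print(best_row)
```

Because the mask and the ratio share the index of df, the lookup on the last line can use df directly and no intermediate filtered frame is needed.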

edited Jan 02 '17 at 19:50 by Julien Marrec
answered Jan 02 '17 at 14:00 by jezrael

I did not really get this: "returns a filtered DataFrame which does not contain the index of max value of Series". So you are saying the max value is not in the data frame that is returned after the boolean operation? I thought we first perform the boolean filter, and then find the max value in what was filtered. Isn't that how it works? – YohanRoth Jan 02 '17 at 15:48
No, because although you filter the DataFrame, you don't use the filtered values in `((df['Gold'] - df['Gold.1'])/(df['Gold'] + df['Gold.1'] + df['Gold.2'])).abs().idxmax()` but the original, unfiltered ones. By the way, this error is very hard to debug, because sometimes it works fine (when the filtered dataframe contains the idxmax label) and sometimes it fails once the values change. If `((df['Gold'] - df['Gold.1'])/(df['Gold'] + df['Gold.1'] + df['Gold.2'])).abs().idxmax()` returns `Algeria`, you can see that `Gold.1==0` for that row, so it does not satisfy `(df['Gold.1']>0)`. – jezrael Jan 02 '17 at 15:54
Hmm, thanks. That is so weird. What is even the point of allowing code like this when it leads to such subtle errors and does not work the way one would expect? I expected it to be evaluated from left to right. Instead it works so weirdly :( Anyway, thanks! – YohanRoth Jan 02 '17 at 15:58
