Why does `idxmax` throw an error when I try to find the column name which has the maximum value for each row?

Question

I'm working with the UK parliamentary election dataset from researchbriefings.parliament.uk, which for some reason has thought it worth including the vote share as a proportion as well as a number, but not worth including which party actually won in each constituency.

With pandas that should be quick to calculate using pandas.DataFrame.idxmax (as recommended in this question).

However, my code refuses to work and I'm not sure why.

Here's the code:

import os
import sys

import pandas as pd

# read in the file
elections = pd.read_csv(os.path.join(sys.path[0], '1918-2017election_results.csv'), encoding='cp1252')

# remove whitespace from column names
elections.rename(columns=lambda x: x.strip(), inplace=True)

# find winning number of votes for each constituency
parties = ['con','lib','lab','natSW','oth']
vote_columns = [f'{party}_votes' for party in parties]
elections['winning_votes'] = elections[vote_columns].max(axis=1)

# find winning party for each constituency
elections['winning_party'] = elections[vote_columns].idxmax(axis=1)

and this is the error I get: TypeError: reduction operation 'argmax' not allowed for this dtype

Please tell me what I'm doing wrong or how I can use an alternative pythonic method of finding the winning party for each constituency. Thanks!

What is the output of `elections.dtypes`? It seems like at least some of them are non-numeric. — Peter Leimbigler, Feb 27 '20 at 03:39

score 1 · Accepted Answer · answered Feb 27 '20 at 03:47

Because some of the columns hold strings (object dtype), not floats:

>>> parties = pd.Series(['con','lib','lab','natSW','oth'])
>>> elections[parties + '_votes'].dtypes

con_votes      float64
lib_votes       object
lab_votes       object
natSW_votes    float64
oth_votes      float64
dtype: object
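
To see which values keep a column from being numeric, one option is to compare the column against its coerced version. This is only a sketch: the Series below is hypothetical stand-in data, not taken from the actual CSV, and the names `converted` and `bad_values` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one vote column; the real data comes from the CSV.
lab_votes = pd.Series([1200.0, ' ', 3400.0, None, '560'], dtype=object)

# Parse what can be parsed; anything unparseable becomes NaN.
converted = pd.to_numeric(lab_votes, errors='coerce')

# Keep the entries that were present in the original but failed to parse.
bad_values = lab_votes[converted.isna() & lab_votes.notna()]
print(bad_values.tolist())
```

Entries that were already missing are excluded by the `notna()` check, so only the genuinely unparseable values are listed.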

Some rows contain a space, so pandas reads those columns as strings (object dtype). Convert them to float first; errors='coerce' turns anything that cannot be parsed into NaN:

>>> for p in parties:
...     elections[p + '_votes'] = pd.to_numeric(elections[p + '_votes'], errors='coerce')

Then it works as expected:

>>> elections[parties + '_votes'].idxmax(axis=1)

0        oth_votes
1        con_votes
2        con_votes
3        oth_votes
4        oth_votes
           ...    
17043    lab_votes
17044    lab_votes
17045    con_votes
17046    lab_votes
17047    lab_votes
Length: 17048, dtype: object
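
For anyone who wants to try this without the CSV, here is a minimal end-to-end sketch on made-up data. The `toy` frame and its values are hypothetical. Stripping the `_votes` suffix to get a bare party name is an extra step that the answer above does not include.

```python
import pandas as pd

# Hypothetical toy data; a lone space stands in for a missing count.
toy = pd.DataFrame({
    'con_votes': [5000.0, 2000.0, 100.0],
    'lab_votes': ['3000', ' ', '900'],
    'lib_votes': ['1000', '2500', ' '],
})
vote_cols = ['con_votes', 'lab_votes', 'lib_votes']

# Coerce every vote column to numeric; unparseable entries become NaN.
toy[vote_cols] = toy[vote_cols].apply(pd.to_numeric, errors='coerce')

# max and idxmax skip NaN by default; idxmax returns the column label
# of the row maximum, from which the '_votes' suffix is then removed.
toy['winning_votes'] = toy[vote_cols].max(axis=1)
toy['winning_party'] = (
    toy[vote_cols].idxmax(axis=1).str.replace('_votes', '', regex=False)
)
print(toy[['winning_votes', 'winning_party']])
```

Rows in which every vote column is NaN are not covered by this sketch and would need separate handling.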
