I'm working with the UK parliamentary election dataset from researchbriefings.parliament.uk, which for some reason includes each party's vote share as a proportion as well as a raw count, but does not say which party actually won in each constituency.

With pandas that should be quick to calculate using pandas.DataFrame.idxmax (as recommended in this question).

However, my code refuses to work and I'm not sure why.

Here's the code:

import os
import sys

import pandas as pd

# read in the file
elections = pd.read_csv(os.path.join(sys.path[0], '1918-2017election_results.csv'), encoding='cp1252')

# remove whitespace from column names
elections.rename(columns=lambda x: x.strip(), inplace=True)

# find winning number of votes for each constituency
parties = ['con','lib','lab','natSW','oth']
vote_columns = [f'{party}_votes' for party in parties]
elections['winning_votes'] = elections[vote_columns].max(axis=1)

# find winning party for each constituency (this line raises the TypeError)
elections['winning_party'] = elections[vote_columns].idxmax(axis=1)

This is the error I get:

TypeError: reduction operation 'argmax' not allowed for this dtype

Please tell me what I'm doing wrong or how I can use an alternative pythonic method of finding the winning party for each constituency. Thanks!

Peter Prescott

1 Answer

Because some of the vote columns hold strings (object dtype) rather than floats:

>>> parties = pd.Series(['con','lib','lab','natSW','oth'])
>>> elections[parties + '_votes'].dtypes

con_votes      float64
lib_votes       object
lab_votes       object
natSW_votes    float64
oth_votes      float64
dtype: object
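
To see which entries keep a column from being numeric, one option is to parse it with errors='coerce' and look at the original values that turned into NaN. This is only a sketch on made-up toy data: the lib_votes values below are invented, not taken from the real CSV.

```python
import pandas as pd

# Toy stand-in for one object-dtype vote column (values are made up)
lib_votes = pd.Series(['1200', ' ', '845', ' '], name='lib_votes')

# Entries that cannot be parsed as numbers become NaN under errors='coerce'
parsed = pd.to_numeric(lib_votes, errors='coerce')

# Keep the original entries that were present but failed to parse
bad_values = lib_votes[parsed.isna() & lib_votes.notna()]
print(bad_values.tolist())
```

Running the same check on the real lib_votes and lab_votes columns should show what the offending entries look like.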

Some rows contain a space instead of a number, so pandas treats the whole column as strings. Convert the columns to float first; errors='coerce' turns the unparseable entries into NaN:

>>> for p in parties:
...     elections[p + '_votes'] = pd.to_numeric(elections[p + '_votes'], errors='coerce')

Then it works as expected:

>>> elections[parties + '_votes'].idxmax(axis=1)

0        oth_votes
1        con_votes
2        con_votes
3        oth_votes
4        oth_votes
           ...    
17043    lab_votes
17044    lab_votes
17045    con_votes
17046    lab_votes
17047    lab_votes
Length: 17048, dtype: object
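
Putting the pieces together, here is a minimal end-to-end sketch on toy data that yields the bare party name rather than the column label. The DataFrame contents are invented for illustration, and stripping the '_votes' suffix is my own addition, not something from the question.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are made up);
# ' ' mimics the blank cells that force a column to object dtype
elections = pd.DataFrame({
    'con_votes': [5000.0, 3000.0, 1000.0],
    'lib_votes': ['2000', ' ', '4000'],
    'lab_votes': ['1000', '6000', ' '],
})
vote_columns = ['con_votes', 'lib_votes', 'lab_votes']

# Coerce every vote column to numeric; unparseable cells become NaN
votes = elections[vote_columns].apply(pd.to_numeric, errors='coerce')

# max and idxmax skip NaN by default; idxmax returns the column label
elections['winning_votes'] = votes.max(axis=1)
elections['winning_party'] = (
    votes.idxmax(axis=1).str.replace('_votes', '', regex=False)
)
print(elections[['winning_votes', 'winning_party']])
```

Working on a converted copy (votes) leaves the original columns untouched, which can help if the raw strings are still needed for checking the data.
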
Code Different