1

trying to prepare this data in a specific format

import pandas as pd

voting = pd.read_json("GE2000.json")
voting.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True)

print(voting)

this then returns

                                            vote
county_fips candidate_name  pty vote_pct
2000        Howard Phillips CS  0            596
            John Hagelin    NL  0            919
            Harry Browne    LB  1           2636
            George W. Bush  R   59        167398
            Al Gore         D   28         79004
1001        Howard Phillips I   0              9
            John Hagelin    I   0              5
            Harry Browne    LB  0             51
            George W. Bush  R   70         11993
            Al Gore         D   29          4942

after this, i would like to sort vote_pct and grab the largest, like this (i've tried sort_values, sort_index, etc and can't get it to yield the desired output)

                                            vote
county_fips candidate_name  pty vote_pct
2000        George W. Bush  R   59        167398
1001        George W. Bush  R   70         11993

here is the sample data

[

  {
    "office" : "PRESIDENT",
    "county_name" : "Alaska",
    "vote_pct" : "0",
    "county_fips" : "2000",
    "pty" : "CS",
    "candidate_name" : "Howard Phillips",
  },
  {
    "office" : "PRESIDENT",
    "county_name" : "Alaska",
    "vote_pct" : "0",
    "county_fips" : "2000",
    "pty" : "NL",
    "candidate_name" : "John Hagelin",
  }
]

That data continues

sn4ke
  • 587
  • 1
  • 14
  • 30

2 Answers2

2

You can get the maximum for each using groupby and apply before doing set_index and then set the index afterwards. This allows you to use groupby on columns instead of on the indices (which gets weird):

voting = pd.read_json("GE2000.json")

get_largest_vote_pct = lambda row: row[row.vote_pct == row.vote_pct.max()]

largest = voting.groupby('county_fips').apply(get_largest_vote_pct)

largest.set_index(['county_fips','candidate_name','pty','vote_pct'],inplace=True) 

print(largest)

                                           vote
county_fips candidate_name pty vote_pct        
1001        George W. Bush R   70         11993
2000        George W. Bush R   59        167398
bunji
  • 5,063
  • 1
  • 17
  • 36
0

You can use groupby for example voting.groupby('county_fips')['candidate_name'].max().

There is also more detailed answer here: Python : Getting the Row which has the max value in groups using groupby

Community
  • 1
  • 1
Alex G Rice
  • 1,561
  • 11
  • 16