I have a dataset of US House of Representatives races. While there are 435 districts, I have 439 Democratic candidates, which is too many, and I'm trying to figure out why. I suspect there are runoff races causing this, which I want to test.
>>> democrat_results.head()
     states  po  dist                cand     party  cand_votes  tot_votes
 0  ALABAMA  AL     1  ROBERT KENNEDY JR.  DEMOCRAT       89226     242617
 4  ALABAMA  AL     2       TABITHA ISNER  DEMOCRAT       86931     226230
 7  ALABAMA  AL     3       MALLORY HAGAN  DEMOCRAT       83996     231915
10  ALABAMA  AL     4           LEE AUMAN  DEMOCRAT       46492     230969
13  ALABAMA  AL     5      PETER JOFFRION  DEMOCRAT      101388     260673
What I'm trying to do is see whether any of the state districts (e.g. AL 1, AL 2) have two listings. I can figure out how to do this on my own, but the problem I'm having is that whenever I write a for loop to act on the dataframe, it seems to act only on the column headers.
unique_races = []
for row in democrat_results[1:]:
    if row not in unique_races:
        unique_races.append(row)
        # this was "row" and "race", now has been changed to just be row in both cases
unique_races
returns:
['states', 'po', 'dist', 'cand', 'party', 'cand_votes', 'tot_votes']
(I am aware that the loop won't do what I'm looking for, it's just to demonstrate what happens)
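For reference, the behaviour can be reproduced on a small frame. This is only a sketch: `toy` is a stand-in built from three of the rows shown above (with a subset of the columns), not my real data. As far as I can tell, slicing with `[1:]` still gives back a DataFrame, and iterating over a DataFrame yields its column labels:

```python
import pandas as pd

# Stand-in frame built from the first three rows shown above (subset of columns).
toy = pd.DataFrame(
    {
        "po": ["AL", "AL", "AL"],
        "dist": [1, 2, 3],
        "cand": ["ROBERT KENNEDY JR.", "TABITHA ISNER", "MALLORY HAGAN"],
    },
    index=[0, 4, 7],
)

# [1:] drops the first row but the result is still a DataFrame...
sliced = toy[1:]

# ...and a for loop over a DataFrame walks its column labels, not its rows.
labels = [item for item in sliced]
print(labels)
```

So the `[1:]` only removes a row; it does not change what the loop iterates over.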
How do I avoid this and instead have the for loop act on the rows?
I am aware that for loops are inefficient, but I'm only working with a few hundred values and don't know more advanced methods, so a loop is good enough for me.
Rest assured that I have spent time looking for an answer, and not found one, leading to me asking this question.
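To make concrete what I'm after, here is a minimal sketch of a row-wise loop, assuming `DataFrame.itertuples()` is an appropriate way to get one row at a time, alongside `DataFrame.duplicated()` as a possible loop-free check. The frame `races` is hypothetical: the two AL rows come from the output above, while the `XX` rows and their candidate names are made up purely to create a repeated district.

```python
import pandas as pd

# Hypothetical data: the "XX" rows are invented to illustrate a repeated district.
races = pd.DataFrame(
    {
        "po": ["AL", "AL", "XX", "XX"],
        "dist": [1, 2, 1, 1],
        "cand": ["ROBERT KENNEDY JR.", "TABITHA ISNER", "CANDIDATE A", "CANDIDATE B"],
    }
)

# Loop version: itertuples() yields one namedtuple per row,
# so row.po and row.dist are that row's values.
seen = set()
repeated = []
for row in races.itertuples(index=False):
    key = (row.po, row.dist)
    if key in seen:
        repeated.append(key)
    seen.add(key)
print(repeated)

# Loop-free version: keep=False flags every row whose (po, dist) pair
# occurs more than once, not just the later occurrences.
dupes = races[races.duplicated(subset=["po", "dist"], keep=False)]
print(dupes)
```

On my real data the same idea would be applied to `democrat_results` with the `po` and `dist` columns.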