0

I am having difficulty merging two datasets; both from the FEC. When I attempt to right-merge I get a many-to-many relationship and the merged dataset keeps duplicating numbers in all the wrong numbers.

What I tried:

# 2014 US House Election Results
Election_2014 <- read_excel("2014 US House Election by State.xlsx", na = c("", "NA"))

# 2018 US House Election Results
Election_2018 <- read_excel("2018 US House Election by State.xlsx", na = c("", "NA"))

# 2014 and 2018 Election Results
Election_14.18 <- Election_2014 %>% right_join(Election_2018, by=c("STATE", "DISTRICT", "PARTY"))

What I wanted to happen:

# Election_2014

`1` `STATE ABBREVIATION` STATE   DISTRICT `(I)` `TOTAL VOTES`   PARTY `GENERAL VOTES` `GENERAL %` 
1     2 AL                   Alabama 01       (I)   NA              R     103758        0.682                          
2     3 AL                   Alabama 01       NA    NA              D     48278         0.317                          
3     4 AL                   Alabama 01       NA    NA              W     198          0.00130                        
4     5 AL                   Alabama 01       NA    District Votes: NA    152234         NA                              
5     6 AL                   Alabama NA       NA    NA              NA    NA             NA                              
6     7 AL                   Alabama 02       (I)   NA              R     113103        0.673                          

to merge with:

# Election_2018
`1` `STATE ABBREVIATION` STATE   DISTRICT `(I)` `TOTAL VOTES`   PARTY `GENERAL VOTES` `GENERAL %` 
1     3 AL                   Alabama 01       (I)   NA              R     153228        0.632                            
2     4 AL                   Alabama 01       NA    NA              D     89226         0.368                            
3     5 AL                   Alabama 01       NA    NA              D     NA             NA                                
4     6 AL                   Alabama 01       NA    Party Votes:    D     NA             NA                                
5     7 AL                   Alabama 01       NA    NA              W     163         0.000672                         
6     8 AL                   Alabama 01       NA    District Votes: NA    242617            NA                                

What I got:

`1.x` `STATE ABBREVIATION.x` STATE   DISTRICT `(I).x` `TOTAL VOTES.x` PARTY `GENERAL VOTES.x` `GENERAL %.x` 
1     2 AL                     Alabama 01       (I)     NA              R     103758                  0.682                                        NA
2     3 AL                     Alabama 01       NA      NA              D     48278                   0.317                                        NA
3     3 AL                     Alabama 01       NA      NA              D     48278                   0.317                                        NA
4     3 AL                     Alabama 01       NA      NA              D     48278                   0.317                                        NA
5     4 AL                     Alabama 01       NA      NA              W     198                     0.00130                                      NA
6     5 AL                     Alabama 01       NA      District Votes: NA    152234                 NA                                            NA

For 12,000 rows :/


I'm wanting just one 'D' and one 'R' per district -- if you can help me get rid of the District votes too, even better. I've only been using R for a month, so any words of advice would be greatly appreciated.

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46
D. Robert
  • 11
  • 1

1 Answers1

0

I may have completely misunderstood your question, but it looks like you want to do a full_join() then reframe() your dataframe to combine rows from each dataset by e.g. adding the GENERAL.VOTES together when PARTY, DISTRICT and STATE match.

Does this give you your expected outcome?

library(dplyr)

Election_2014 <- read.table(text = "1 'STATE ABBREVIATION' STATE   DISTRICT '(I)' 'TOTAL VOTES'   PARTY 'GENERAL VOTES' 'GENERAL %'
2 AL                   Alabama 01       (I)   NA              R     103758        0.682                          
3 AL                   Alabama 01       NA    NA              D     48278         0.317                          
4 AL                   Alabama 01       NA    NA              W     198          0.00130                        
5 AL                   Alabama 01       NA    'District Votes:' NA    152234         NA                              
6 AL                   Alabama NA       NA    NA              NA    NA             NA                              
7 AL                   Alabama 02       (I)   NA              R     113103        0.673",
header = TRUE)

Election_2018 <- read.table(text = "'1' 'STATE ABBREVIATION' STATE   DISTRICT '(I)' 'TOTAL VOTES'   PARTY 'GENERAL VOTES' 'GENERAL %' 
3 AL                   Alabama 01       (I)   NA              R     153228        0.632                            
4 AL                   Alabama 01       NA    NA              D     89226         0.368                            
5 AL                   Alabama 01       NA    NA              D     NA             NA                                
6 AL                   Alabama 01       NA    'Party Votes:'    D     NA             NA                                
7 AL                   Alabama 01       NA    NA              W     163         0.000672                         
8 AL                   Alabama 01       NA    'District Votes:' NA    242617            NA ",
header = TRUE)

Election_2014 %>%
  full_join(Election_2018) %>%
  reframe(General_votes = sum(GENERAL.VOTES, na.rm = TRUE),
          `(I)` = unique(X.I.),
          .by = c("STATE", "DISTRICT", "PARTY"))
#> Joining with `by = join_by(X1, STATE.ABBREVIATION, STATE, DISTRICT, X.I.,
#> TOTAL.VOTES, PARTY, GENERAL.VOTES, GENERAL..)`
#>     STATE DISTRICT PARTY General_votes  (I)
#> 1 Alabama        1     R        256986  (I)
#> 2 Alabama        1     D        137504 <NA>
#> 3 Alabama        1     W           361 <NA>
#> 4 Alabama        1  <NA>        394851 <NA>
#> 5 Alabama       NA  <NA>             0 <NA>
#> 6 Alabama        2     R        113103  (I)

Created on 2023-07-05 with reprex v2.0.2

NB. In future, please provide a sample of your data in a format that can be easily imported into R (using e.g. dput(head(Election_2014)) and dput(head(Election_2018))) so we can test out potential solutions and don't need to try and recreate the data from your tables. This, and the lack of a clear expected outcome using your example data, are very likely why your question was downvoted. See How to make a great R reproducible example and https://stackoverflow.com/help/minimal-reproducible-example for more info.

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46