0

I am trying to merge or join two dataframes in R but I have tried every combination of both functions and neither seem to be producing the desired result. Both data frames share a column named "CTYNAME" which is a char in both dataframes. Both dfs have a row with 'Butler County' for example, but even when using full_join the final product has a row with only Butler County and a row with Butler County and all of the data from the second data frame, instead of just combining the two rows.

Essentially, there is unique overlap in the 'CTYNAME' column 19 times so I expect a 19 column output but I can only get 0 (using inner_join) or 107 (the sum of columns in both dfs, using full_join).

What I have currently

frank_stops = 
  frank_stops %>%
  filter(State == "Ohio") %>%
  select(4) %>% 
  distinct() %>%
  rename(CTYNAME = 'County') %>% 
  mutate(CTYNAME = paste(CTYNAME, " County")) %>% 
  data.frame()
frank_stops$CTYNAME <- as.character(frank_stops$CTYNAME)


frank_stops


ohio_data = 
  ohio_data %>% 
  rename(CTYNAME = 'CTYNAME') %>%
  filter(AGEGRP == 0, YEAR == 12) %>% 
  data.frame()
ohio_data$CTYNAME <- as.character(ohio_data$CTYNAME)

ohio_data


together = full_join(frank_stops, ohio_data, by="CTYNAME")
together

Console Output:

Rows: 20064 Columns: 80── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): SUMLEV, COUNTY, STNAME, CTYNAME
dbl (76): STATE, YEAR, AGEGRP, TOT_POP, TOT_MALE, TOT_FEMALE, WA_MALE, WA_FEMALE, BA_MALE, BA_FEMALE, IA_MALE, IA_FEM...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Rows: 153 Columns: 4── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Date of Visit, Town/City, State, County
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Joining, by = "CTYNAME"
zephryl
  • 14,633
  • 3
  • 11
  • 30
Jaxon
  • 3
  • 1
  • 1
    What is the value of `sum(frank_stops$CTYNAME == "Butler County")` and `sum(ohio_data$CTYNAME == "Butler County")`? When you are doing a `full_join`, repeated keys in the two datasets lead to repeated rows in the result (if a given key appears p times in one and q times in the other, all combinations appear in the output, so there are p*q rows with this key). It's not entirely clear from your question but the problem may come from this. Or maybe the keys are not exactly identical (extra space...). Have you also tried inner/left/right joins? –  Dec 13 '22 at 08:11
  • Hi Jason, welcome to the stack! Do take the [tour] if you haven't already. You're much likely to get a useful answer to this question if you can provide a [minimal, reprodubile example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Captain Hat Dec 13 '22 at 09:44

1 Answers1

0

I had a similar issue adding str_squish() on the join key in both data frames prior to the join resolved it.

frank_stops$CTYNAME <-str_squish(frank_stops$CTYNAME) 
ohio_data$CTYNAME<- str_squish(ohio_data$CTYNAME)

will resolve the issue if you have uneven spacing an if that is impacting the join