
I have two data frames that look like this:

> df1
county   state   code
ANDERSON Texas   1      
ANDREWS  Texas   2
ANGELINA Texas   3
....

> df2
county   state   citations  year
ANDERSON Texas   124        2011
ANDREWS  Texas   32         2011
ANGELINA Texas   491        2011
....

I have tried to merge the two of these a few different ways:

merge <- full_join(df1, df2, by = c("county", "state"))
merge <- merge(df1, df2, by = c("county", "state"))

In both cases, I receive the following warning:

Warning message:
Column `county` joining factor and character vector, coercing into
character vector

The resulting data frame contains no data from df2, even after the factor is coerced to character. I tried again after converting the county column to character in both data frames, and the merge still does not match any rows.

Here are the heads of the two data frames I am attempting to merge:

> dput(head(data))
structure(list(year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L
), month = c(1L, 1L, 1L, 1L, 1L, 1L), county = c("ANDERSON COUNTY", 
"ANGELINA COUNTY", "ARANSAS COUNTY", "ATASCOSA COUNTY", "BASTROP COUNTY", 
"BELL COUNTY"), state = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Montana", 
"Texas"), class = "factor"), citations = c(218L, 422L, 55L, 472L, 
745L, 1403L), warnings = c(521L, 711L, 124L, 1173L, 819L, 2242L
), population = c(56760L, 82812L, 24721L, 43589L, 72248L, 276975L
), d_revenue = c(-736L, -6723L, 1134L, 71L, 2308L, 852L), crashes = c(73L, 
133L, 18L, 71L, 95L, 422L), density = c(55, 108.8, 91.9, 36.8, 
83.5, 295.2), unemp_rate = c(8, 8.3, 9.6, 8.5, 8.5, 8), stops = 
c(739L, 1133L, 179L, 1645L, 1564L, 3645L), stops_per_cap = c(0.013019732, 
0.013681592, 0.007240807, 0.037738879, 0.021647658, 0.013160032
), crashes_per_cap = c(0.001286117, 0.001606047, 0.000728126, 
0.001628851, 0.001314915, 0.001523603)), .Names = c("year", "month", 
"county", "state", "citations", "warnings", "population", "d_revenue", 
"crashes", "density", "unemp_rate", "stops", "stops_per_cap", 
"crashes_per_cap"), row.names = c(NA, 6L), class = "data.frame")

> dput(head(codes))
structure(list(county = c("ANDERSON  COUNTY ", "ANDREWS  COUNTY ", 
"ANGELINA  COUNTY ", "ARANSAS  COUNTY ", "ARCHER  COUNTY ", "ARMSTRONG  COUNTY "
), state = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Montana", 
"Texas"), class = "factor"), code = 1:6), .Names = c("county", 
"state", "code"), row.names = c(NA, 6L), class = "data.frame")
M. Damon
  • Well, it sounds like you are trying to join characters and factors. Have you checked the class of all the columns you are joining on? We can't tell from your raw data how you imported these values. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with a `dput()` of your data so we know how it was actually imported into R. Perhaps you can avoid factors altogether. – MrFlick Mar 25 '19 at 20:45
  • When making your initial data frame, or when reading in .csv or .txt data, try `stringsAsFactors = FALSE`. – Andrew Bannerman Mar 25 '19 at 20:57
  • Not sure if this is related, but you're missing a closing quotation mark in `c("county", "state)` – camille Mar 25 '19 at 21:05
  • Would it make a difference to use `df2$county <- as.character(df2$county)` instead of the `stringsAsFactors` argument you mentioned, Andrew? – M. Damon Mar 26 '19 at 00:57
  • Also, sorry I had never known about using dput, but thank you for mentioning because that will be very helpful for questions like this. I will edit my question to include this. – M. Damon Mar 26 '19 at 00:58
  • Yes, it’s a warning, not an error, since you usually want to match factors as if they were characters, but the safe thing to do is make that explicit by using `as.character` – divibisan Mar 26 '19 at 01:09
  • I tried both using `as.character` and reading it in with the `stringsAsFactors=FALSE` and both ways still do not result in a correct merge. – M. Damon Mar 26 '19 at 01:15
  • Please show, don't tell, how you are *coercing*. – Parfait Mar 26 '19 at 01:46
  • I use `data$county <- as.character(data$county)` and `codes$county <- as.character(codes$county)`. Then when trying either `merge <- merge(data, codes, by = c("state", "county"))` or `merge <- full_join(data, codes, by = c("state", "county"))`, they both return a `merge` data frame with missing information from one of the initial data frames. – M. Damon Mar 26 '19 at 02:34
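
One detail visible in the `dput()` output in the question: the county strings in `codes` contain doubled internal spaces and a trailing space (`"ANDERSON  COUNTY "`), while those in `data` do not (`"ANDERSON COUNTY"`). Keys like these would not match exactly even when both columns are character. The following is a minimal sketch, assuming this whitespace difference is what prevents the merge from matching. `normalize_key` is a hypothetical helper, and the two data frames are cut-down copies of the posted heads, not the full data.

```r
# Cut-down copies of the heads posted in the question (illustrative only).
data <- data.frame(
  county = c("ANDERSON COUNTY", "ANGELINA COUNTY", "ARANSAS COUNTY"),
  state = factor(rep("Texas", 3), levels = c("Montana", "Texas")),
  citations = c(218L, 422L, 55L),
  stringsAsFactors = FALSE
)
codes <- data.frame(
  county = c("ANDERSON  COUNTY ", "ANDREWS  COUNTY ",
             "ANGELINA  COUNTY ", "ARANSAS  COUNTY "),
  state = factor(rep("Texas", 4), levels = c("Montana", "Texas")),
  code = 1:4,
  stringsAsFactors = FALSE
)

# Diagnostic: which keys in `data` have no exact counterpart in `codes`?
unmatched_before <- setdiff(data$county, codes$county)

# Hypothetical helper: coerce to character, collapse runs of whitespace
# to a single space, and strip leading/trailing whitespace.
normalize_key <- function(x) {
  x <- as.character(x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

data$county <- normalize_key(data$county)
codes$county <- normalize_key(codes$county)

# Same diagnostic after normalizing the keys.
unmatched_after <- setdiff(data$county, codes$county)

# Full outer join on the cleaned keys (base R equivalent of full_join).
merged <- merge(data, codes, by = c("county", "state"), all = TRUE)
```

Running `setdiff()` on the key columns before and after cleaning is a quick way to check whether whitespace is the culprit in the real data. If unmatched keys remain after normalizing, the cause lies elsewhere (for example, spelling differences between the two sources).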
