28

When using the join function in the dplyr package, I get this warning:

Warning message:
In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector

There is not a lot of information online about this. Any idea what it could be? Thanks!

coip
Christopher Yee
  • Can you post the code that is producing the error? – Tor May 26 '15 at 20:40
  • With errors like these, it helps to post a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick May 26 '15 at 20:40

3 Answers

46

That's not an error, that's a warning. And it's telling you that one of the columns you used in your join was a factor and that factor had different levels in the different datasets. In order not to lose any information, the factors were converted to character values. For example:

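library(dplyr)
# stringsAsFactors = TRUE is needed on R >= 4.0, where data.frame() no longer
# converts strings to factors by default
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)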

class(x$a) 
# [1] "factor"

# NOTE these are different
levels(x$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
levels(y$a)
# [1] "d" "e" "f" "g" "h" "i" "j"

m <- left_join(x,y)
# Joining by: "a"
# Warning message:
# joining factors with different levels, coercing to character vector 

class(m$a)
# [1] "character"

You can make sure that both factors have the same levels before merging:

combined <- sort(union(levels(x$a), levels(y$a)))
n <- left_join(mutate(x, a=factor(a, levels=combined)),
    mutate(y, a=factor(a, levels=combined)))
# Joining by: "a"
class(n$a)
#[1] "factor"
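
If a character key is acceptable (for example, a large lookup table of codes where only some appear in the data), another option is to convert the key to character in both tables before joining, so the conversion is explicit rather than a side effect of the join. This is a minimal sketch, not part of the original answer; the column `b` and the names `x_chr` and `y_chr` are made up for illustration.

```r
library(dplyr)

# Illustrative data; stringsAsFactors = TRUE forces factor columns on any R version
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], b = 4:10, stringsAsFactors = TRUE)

# Convert the join key to character in both tables before joining
x_chr <- mutate(x, a = as.character(a))
y_chr <- mutate(y, a = as.character(a))

m <- left_join(x_chr, y_chr, by = "a")
class(m$a)
# [1] "character"
```

Rows of `x` with no match in `y` (here "a", "b" and "c") get `NA` in `b`, as with any left join.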
dfrankow
MrFlick
  • Sounds good. Does that mean the code is good to go, or are there tweaks I need to make to get rid of the warning? – Christopher Yee May 26 '15 at 20:40
  • Well, you can make sure the factors have the same levels in each data set prior to the join. – MrFlick May 26 '15 at 20:41
  • They shouldn't have the same number of factors. I am basically joining a procedure table of about 20,000 codes to a data set that may not have all those procedure codes present. – Christopher Yee May 26 '15 at 20:43
  • @EgorUfimtsev, instead of editing you should perhaps ask the answerer for clarification by a comment (concerning whether `left_join(mutate(x,` is correct, or it should read `left_join(mutate(y,` instead)... – aschipfl Aug 23 '16 at 18:30
  • Just reconvert to a factor once you have carried out the join. As @MrFlick said, it is a warning not an error. – Seanosapien Aug 24 '17 at 15:47
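
The suggestion to reconvert after the join could look roughly like the sketch below. It reuses the example data from the answer above; `factor()` is applied to the joined key, which works whether the join returned a character vector or a factor.

```r
library(dplyr)

# Same illustrative data as in the answer; factors forced explicitly
x <- data.frame(a = letters[1:7], stringsAsFactors = TRUE)
y <- data.frame(a = letters[4:10], stringsAsFactors = TRUE)

# Join first (this may emit the coercion warning, depending on the dplyr version),
# then rebuild the key as a factor from the values actually present
m <- left_join(x, y, by = "a") %>%
  mutate(a = factor(a))

class(m$a)
# [1] "factor"
levels(m$a)
# [1] "a" "b" "c" "d" "e" "f" "g"
```

Note that `factor()` keeps only the levels that occur in the result, so levels that exist only in `y` are not carried over.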
5

This warning also appears if the joining columns in the two tables have the same levels in a different order:

library(dplyr)
library(forcats)  # provides fct_relevel()

# tibble() replaces the deprecated data_frame()
tb1 <- tibble(a = c("a","b","c")) %>% mutate(a = as.factor(a))
# Change the level order of tb2's column a
tb2 <- tb1 %>% mutate(a = fct_relevel(a, "c"))

# Check both are still factors
tb1$a %>% class()
# [1] "factor"
tb2$a %>% class()
# [1] "factor"

# Check level order
tb1$a %>% levels()
# [1] "a" "b" "c"
tb2$a %>% levels()
# [1] "c" "a" "b"

# Try joining
tb1 %>% left_join(tb2)
# Joining, by = "a"
# Column `a` joining factors with different levels, coercing to character vector
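
To tell whether two key columns differ only in level order, and to align them before joining, something like the following may help. It is a sketch using base R's `factor()` rather than forcats, and the name `tb2_aligned` is made up for illustration.

```r
library(dplyr)

tb1 <- tibble(a = factor(c("a", "b", "c")))
tb2 <- tibble(a = factor(c("a", "b", "c"), levels = c("c", "a", "b")))

# Same set of levels, but in a different order
same_order <- identical(levels(tb1$a), levels(tb2$a))  # FALSE
same_set   <- setequal(levels(tb1$a), levels(tb2$a))   # TRUE

# Rebuild tb2's key using tb1's level order
tb2_aligned <- tb2 %>% mutate(a = factor(a, levels = levels(tb1$a)))

joined <- left_join(tb1, tb2_aligned, by = "a")
class(joined$a)
# [1] "factor"
```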
EcologyTom
Jiaxiang
  • This was helpful. It seems the warning message is irrelevant in this case, the level orders shouldn't make a difference to the joining. – zola25 Dec 14 '18 at 14:28
  • @zola25 Yes, the order is irrelevant; this is just one example of a case that triggers this kind of warning. – Jiaxiang Dec 18 '18 at 05:25
2

When the data comes from a database, don't forget stringsAsFactors=FALSE in the query call; in many cases that avoids this warning. (That was my case.)

sqlExecute(my_database_channel, data=myparam, stringsAsFactors=FALSE )
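
If the data has already been loaded with factor columns, the same effect can be had afterwards by converting every factor column to character. This is a base R sketch; the data frame `codes` and its columns are hypothetical stand-ins for a query result.

```r
# Hypothetical query result that came back with factor columns
codes <- data.frame(
  code  = c("A01", "B02", "C03"),
  label = c("first", "second", "third"),
  n     = 1:3,
  stringsAsFactors = TRUE
)

# Convert factor columns to character; leave other columns untouched
codes[] <- lapply(codes, function(col) {
  if (is.factor(col)) as.character(col) else col
})

sapply(codes, class)
#        code       label           n
# "character" "character"   "integer"
```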
phili_b