Here is a fairly minimal reproducible example. The real dataset is larger and has many factors, so listing the factors manually is not practical. I also apply more involved transformations to the data, for which I want to keep using dplyr.
library(dplyr)
a = data.frame(f=factor(c("a", "b")), g=c("a", "a"))
b = data.frame(f=factor(c("a", "c")), g=c("a", "a"))
a = a %>% group_by(g) %>% mutate(n=1)
b = b %>% group_by(g) %>% mutate(n=2)
rbind(a,b)
This produces:
# A tibble: 4 x 3
# Groups: g [1]
f g n
<chr> <fctr> <dbl>
1 a a 1
2 b a 1
3 a a 2
4 c a 2
Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
2: In bind_rows_(x, .id) :
binding character and factor vector, coercing into character vector
3: In bind_rows_(x, .id) :
binding character and factor vector, coercing into character vector
These warnings are annoying, and would actually disappear if I did not use group_by:
> a = data.frame(f=factor(c("a", "b")), g=c("a", "a"))
> b = data.frame(f=factor(c("a", "c")), g=c("a", "a"))
> a = a %>% mutate(n=1)
> b = b %>% mutate(n=2)
> rbind(a,b)
f g n
1 a a 1
2 b a 1
3 a a 2
4 c a 2
Explicitly converting to data.frame just before rbind also works:
> rbind(data.frame(a),data.frame(b))
f g n
1 a a 1
2 b a 1
3 a a 2
4 c a 2
Is there an easy way with base R or dplyr rbind/bind_rows to automatically merge those factors and their levels instead of converting them to character (which makes little sense to me), while still using dplyr for data transformations?
I found https://stackoverflow.com/a/30468468/388803 which proposes a solution to merge the factors manually, but this is very verbose.
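For illustration, a more compact form of that manual merge might look like the sketch below. The helper name unify_factor_levels is hypothetical (it is not part of dplyr or base R), and g is declared as a factor explicitly so the example does not depend on the stringsAsFactors default. It assumes both data frames use the same column names for the factors to be merged.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): for every column that is a
# factor in BOTH data frames, rebuild it on the union of the two level sets.
unify_factor_levels <- function(x, y) {
  shared <- intersect(names(x), names(y))
  for (col in shared) {
    if (is.factor(x[[col]]) && is.factor(y[[col]])) {
      lv <- union(levels(x[[col]]), levels(y[[col]]))
      x[[col]] <- factor(x[[col]], levels = lv)
      y[[col]] <- factor(y[[col]], levels = lv)
    }
  }
  list(x, y)
}

# Same data as in the example above, with g made a factor explicitly.
a <- data.frame(f = factor(c("a", "b")), g = factor(c("a", "a")))
b <- data.frame(f = factor(c("a", "c")), g = factor(c("a", "a")))
a <- a %>% group_by(g) %>% mutate(n = 1)
b <- b %>% group_by(g) %>% mutate(n = 2)

# bind_rows() accepts a list of data frames; the factor levels now match,
# so there is nothing for it to coerce.
combined <- bind_rows(unify_factor_levels(a, b))
```

With the levels aligned beforehand, combined$f should stay a factor whose levels are the union of both inputs. This still has to be called once per pair of data frames, so it shortens the manual approach rather than replacing it.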
My actual use case is loading two .csv files with read.table, doing some data transformations, and then merging the data, as they are complementary.
My current workaround is to call data.frame(data) at the end of the data transformations.
I wonder why dplyr/tibble does not merge factors automatically, as it seems safe in such a situation. Could this be improved?