The title is a little garbled, but I'm not sure how else to describe it. I'm coming from Stata, so I'm still getting the hang of factors.
Basically, I want to be able to assign factor levels and labels, and have any values I miss assigned to a default level/label.
Take the following:
    library(dplyr)

    dt <- as.data.frame(mtcars)  # load demo data
    dt$carb[4:6] <- NA           # set some rows to NA for example

    dt <- dt %>%
      mutate(
        carb_f = factor(carb,
                        levels = c(1, 2, 3, 4),
                        labels = c("One", "Two", "Three", "Four"))
      )

    table(dt$carb, dt$carb_f, exclude = NULL)
which yields the following:
             One Two Three Four <NA>
    1      5   0     0    0    0
    2      0   9     0    0    0
    3      0   0     3    0    0
    4      0   0     0   10    0
    6      0   0     0    0    1
    8      0   0     0    0    1
    <NA>   0   0     0    0    3
The unstated 6 and 8 are set to NA in the resultant factor carb_f. Although this is expected behaviour, I want to be able to request something like this:
    dt <- dt %>%
      mutate(
        carb_f = factor(carb,
                        levels = c(1, 2, 3, 4),
                        labels = c("One", "Two", "Three", "Four"),
                        non-na(10, "Unk"))  # obvious pseudocode
      )
to yield this:
             One Two Three Four Unk <NA>
    1      5   0     0    0   0    0
    2      0   9     0    0   0    0
    3      0   0     3    0   0    0
    4      0   0     0   10   0    0
    6      0   0     0    0   1    0
    8      0   0     0    0   1    0
    <NA>   0   0     0    0   0    3
...where the unstated 6 and 8 are assigned to a default level/label of 10 and Unk, but the true NA values remain NA.
Is there a way of handling this without explicitly referencing 6 and 8?
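
For reference, here is a minimal sketch of one possible workaround, not necessarily the idiomatic answer: recode every non-NA value that falls outside the stated levels to the default code before calling factor(). The names known, default_code and carb_code are hypothetical helpers introduced only for this illustration, and the sketch assumes the default code (10) does not clash with a real value in the data.

```r
library(dplyr)

dt <- as.data.frame(mtcars)  # load demo data
dt$carb[4:6] <- NA           # set some rows to NA for example

known        <- c(1, 2, 3, 4)  # hypothetical helper: the levels stated explicitly
default_code <- 10             # hypothetical helper: code for everything else

dt <- dt %>%
  mutate(
    # Keep NA and known values as they are; send any other value to the default code.
    carb_code = ifelse(is.na(carb) | carb %in% known, carb, default_code),
    carb_f = factor(carb_code,
                    levels = c(known, default_code),
                    labels = c("One", "Two", "Three", "Four", "Unk"))
  )

table(dt$carb, dt$carb_f, exclude = NULL)
```

The is.na() check is what keeps the true NA values out of the Unk level: without it, NA %in% known is FALSE and those rows would be recoded to 10 as well. This avoids naming 6 and 8, but it does need an intermediate column, which is what I was hoping factor() could do for me directly.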