I would like to convert the factors in my data frame to numerical values that I choose myself instead of the default integer codes. Does anyone know how to do this? For example, let's say my variable `race` is encoded as 1 when black and as 2 when white; when I convert it with `as.numeric(race)`, I would like to get 0 for black and 1 for white.
If you need specific numbers, try the `dplyr::recode` function. You can't change the numbers that factor uses with `as.numeric` (they have to start at 1 and go up by 1 in order of the levels.) It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 26 '20 at 05:42
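MrFlick's point about the underlying codes can be checked directly. The sketch below uses hypothetical sample data for `race` and, as a package-free alternative to `dplyr::recode` (a swapped-in base R technique, not dplyr's API), maps each label to a chosen number through a named lookup vector.

```r
# Hypothetical sample data standing in for the question's `race` variable
race <- factor(c("black", "white", "white", "black"))

# Levels are sorted alphabetically by default, so "black" is level 1
# and "white" is level 2; as.integer() exposes those internal codes
levels(race)
as.integer(race)

# Named lookup vector: each name is a label, each value the number to use
race_codes <- c(black = 0, white = 1)
race_num <- unname(race_codes[as.character(race)])
race_num
```

Indexing by `as.character(race)` matches on the labels rather than the integer codes, so the mapping stays correct even if the level order changes.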
Just subtract `1` – Clemsang Aug 26 '20 at 06:33
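Subtracting 1 works only when the factor has exactly two levels in the order black, white. A minimal sketch with hypothetical sample data, pinning the level order explicitly so the result does not depend on alphabetical sorting:

```r
# Hypothetical sample data; `levels` fixes the order so that
# "black" gets code 1 and "white" gets code 2
race <- factor(c("black", "white", "white", "black"),
               levels = c("black", "white"))

# Shift codes 1 and 2 down by one to get 0 and 1
race01 <- as.numeric(race) - 1
race01
```

If the levels were ordered white, black instead, the same subtraction would silently produce the opposite coding.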
Great question! I do a fair bit of statistical consulting and often find myself cleaning datasets that require variable recodes.
Here is a function I created that handles both vector and data frame inputs:
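```r
# Recode values of a vector or data frame.
# Each named argument maps a new label (the name) to the old values it replaces.
Recode <- function(var, ...) {
  x <- list(...)
  # Work on character copies so factors are matched by label, not by integer code
  if (!is.null(dim(var))) {
    temp <- apply(var, MARGIN = 2, as.character)  # data frame -> character matrix
  } else {
    temp <- as.character(var)
  }
  for (i in seq_along(x)) {
    label <- names(x)[i]
    old_values <- x[[i]]
    temp[temp %in% old_values] <- label
  }
  temp
}
```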
Here is an example of a vector input:
```r
library(magrittr)  # provides the %>% pipe
Colour <- c("red", "red", "blue", "blue", "green", "yellow", "white", "black", "yellow")
Recode(Colour, "1" = c("red", "blue", "green"), "2" = c("yellow", "white", "black")) %>% as.numeric
```
```
[1] 1 1 1 1 1 2 2 2 2
```
Here is an example of a data frame input:
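```r
dat <- data.frame(
  Day = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  Fruit = c("apple", "apricot", "avocado", "banana", "bell pepper", "bilberry", "blackberry")
)
dat
```
```
        Day       Fruit
1    Monday       apple
2   Tuesday     apricot
3 Wednesday     avocado
4  Thursday      banana
5    Friday bell pepper
6  Saturday    bilberry
7    Sunday  blackberry
```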
```r
Recode(dat, "1" = c("Monday", "Tuesday", "banana", "bell pepper"), "2" = "Friday")
```
```
     Day         Fruit
[1,] "1"         "apple"
[2,] "1"         "apricot"
[3,] "Wednesday" "avocado"
[4,] "Thursday"  "1"
[5,] "2"         "1"
[6,] "Saturday"  "bilberry"
[7,] "Sunday"    "blackberry"
```
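As the `[1,]` row labels above show, a data frame input comes back as a character matrix, because `apply()` simplifies its result. For the question's case, a sketch that recodes only one column and keeps the data frame intact (the data frame `df` and its columns are hypothetical; `Recode()` is copied from this answer so the block runs on its own):

```r
# Copy of the Recode() function from this answer
Recode <- function(var, ...) {
  x <- list(...)
  if (!is.null(dim(var))) {
    temp <- apply(var, MARGIN = 2, as.character)
  } else {
    temp <- as.character(var)
  }
  for (i in seq_along(x)) {
    temp[temp %in% x[[i]]] <- names(x)[i]
  }
  temp
}

# Hypothetical data frame holding the question's race variable as a factor
df <- data.frame(id = 1:4,
                 race = factor(c("black", "white", "white", "black")))

# Recode a single column (a vector input), so df stays a data frame
df$race_num <- as.numeric(Recode(df$race, "0" = "black", "1" = "white"))
df$race_num
```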
One perk of this function is that any number of recode arguments can be passed to it, e.g.:
Recode(dat, 'a' = ....., 'b' = ....., 'c' = ....., 'd' = .....)
Furthermore, as the previous example shows, the recodes don't have to be one-to-one: several old values can map to the same new label.
I hope this helps! There are probably better solutions, but this has worked for me at least.

statnewb