
I would like to convert the factors in my data frame to numeric values that I choose myself, instead of the encoded labels. Does anyone know how to do this? For example, say my variable race is encoded as 1 for black and 2 for white; when I convert it with as.numeric(race), I would like to get 0 for black and 1 for white.

Lola1993
  • If you need specific numbers, try the `dplyr::recode` function. You can't change the numbers that factor uses with `as.numeric` (they have to start at 1 and go up by 1 in order of the levels.) It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 26 '20 at 05:42
  • Just subtract `1` – Clemsang Aug 26 '20 at 06:33
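
Both comments can be illustrated with base R alone. This is a minimal sketch, not a definitive answer: the `race` vector and the `codes` lookup are made up for the example, assuming the factor's labels are "black" and "white".

```r
# Hypothetical data mirroring the question (assumed labels: "black", "white")
race <- factor(c("black", "white", "white", "black"),
               levels = c("black", "white"))

# as.numeric() on a factor returns the internal codes, which always start at 1
internal_codes <- as.numeric(race)

# Option 1: shift the internal codes so they start at 0
race_shifted <- as.numeric(race) - 1

# Option 2: look the labels up in a named vector of values you pick yourself
codes <- c(black = 0, white = 1)   # assumed mapping, change as needed
race_lookup <- unname(codes[as.character(race)])
```

Option 2 does not depend on the order of the levels, so it also works when the numbers you want are not consecutive.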

1 Answer


Great question! I do a fair bit of statistical consulting and often find myself cleaning datasets that require variables to be recoded.

Here is a function I wrote that handles both vector and data frame inputs:

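# Each named argument maps a set of old values to one new label,
# e.g. Recode(x, "1" = c("a", "b")) turns every "a" and "b" into "1".
Recode <- function(var, ...) {
  x <- list(...)
  # Work on character data; a data frame becomes a character matrix
  if (!is.null(dim(var))) {
    temp <- apply(var, MARGIN = 2, as.character)
  } else {
    temp <- as.character(var)
  }

  # Recodes are applied in order, so a later group can overwrite an earlier label
  for (i in seq_along(x)) {
    label <- names(x)[i]
    levels <- x[[i]]
    temp[temp %in% levels] <- label
  }

  temp
}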

Here is an example of a vector input:

Colour <- c("red", "red", "blue", "blue", "green", "yellow", "white", "black", "yellow")

as.numeric(Recode(Colour, "1" = c("red", "blue", "green"), "2" = c("yellow", "white", "black")))
[1] 1 1 1 1 1 2 2 2 2

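Here is an example of a data frame input:

dat <- data.frame(
  Day = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  Fruit = c("apple", "apricot", "avocado", "banana", "bell pepper", "bilberry", "blackberry")
)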
dat
        Day       Fruit
1    Monday       apple
2   Tuesday     apricot
3 Wednesday     avocado
4  Thursday      banana
5    Friday bell pepper
6  Saturday    bilberry
7    Sunday  blackberry

Recode(dat, "1" = c("Monday", "Tuesday", "banana", "bell pepper"), "2" = "Friday")
     Day         Fruit       
[1,] "1"         "apple"     
[2,] "1"         "apricot"   
[3,] "Wednesday" "avocado"   
[4,] "Thursday"  "1"         
[5,] "2"         "1"         
[6,] "Saturday"  "bilberry"  
[7,] "Sunday"    "blackberry"

One perk of this function is that it accepts any number of recode arguments, e.g.:

Recode(dat,  'a' = ....., 'b' = ....., 'c' = ....., 'd' = .....)

Furthermore, as the previous examples show, the recodes don't have to be one-to-one.
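
For a single factor, base R can do a similar many-to-one mapping by assigning a named list to `levels()`. The sketch below reuses the `Colour` vector from above; the names `colour_f` and `group_num` are introduced only for illustration.

```r
Colour <- c("red", "red", "blue", "blue", "green",
            "yellow", "white", "black", "yellow")

colour_f <- factor(Colour)

# Each list name is a new level; its value lists the old levels it absorbs.
# Old levels not mentioned in the list would become NA.
levels(colour_f) <- list("1" = c("red", "blue", "green"),
                         "2" = c("yellow", "white", "black"))

# Go through character so the chosen labels, not the internal codes, are converted
group_num <- as.numeric(as.character(colour_f))
```

Here `group_num` holds 1 for the first colour group and 2 for the second, matching the vector example above.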

I hope this helps! There are probably better solutions, but this one has worked for me.

statnewb