
I have a range of string values in a factor which I'd like to re-code. Within the levels, there's a long range of factor levels ("601", "602",...,"689") that I want to re-code to a single numeric value 5001.

I tried dplyr's mutate in combination with case_when, as illustrated below. This code works for single values, but I don't know how to re-code a range of string values without writing one line per value.

basecensusdata <- basecensusdata %>%
  # name the new column with `=`; inside mutate() columns can be referenced directly
  mutate(educval = case_when(
    P12 == "000" ~ 0,
    P12 == "010" ~ 100))

I'd like to re-code the range ("601" to "689") into a single numeric value in a new variable (say new_var). How can this be done?

MDEWITT
Hi @marktacderas, welcome to StackOverflow! If you can post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), that will help others try to answer as well. Can you run `dput(head(df, 10))` (where `df` is the name of your data frame, replacing 10 with the number of rows you want to include)? Also, showing what you want your expected output to look like would be helpful. – Russ Thomas Sep 01 '19 at 14:59

3 Answers


You could create a range of values to compare against and replace every match with the number you want. Consider an example where you want to replace the values 3 to 5 with 5001.

df <- data.frame(a = factor(1:10), b = letters[1:10])
df$new_var <- as.character(df$a)
df$new_var[df$a %in% 3:5] <- 5001

df
#    a b new_var
#1   1 a       1
#2   2 b       2
#3   3 c    5001
#4   4 d    5001
#5   5 e    5001
#6   6 f       6
#7   7 g       7
#8   8 h       8
#9   9 i       9
#10 10 j      10
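
Applied to string codes like those in the question, the same idea might look like the sketch below. The data frame and its values are made up for illustration, the column name P12 is borrowed from the question, and leaving unmatched codes as NA is an assumption rather than something the question asks for.

```r
# Toy data standing in for the real table (hypothetical values)
df <- data.frame(P12 = factor(c("000", "601", "650", "689", "700")))

# Compare as character so the factor levels are matched by their labels
in_range <- as.character(df$P12) %in% as.character(601:689)

# Numeric result: 5001 for codes in the range, NA for everything else (assumed)
df$new_var <- ifelse(in_range, 5001, NA_real_)
```

Because the comparison is done on character strings, codes with leading zeros such as "010" are never accidentally matched by the numeric range.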
Ronak Shah

Since it is a factor column, we can change the levels directly:

df$new_var <- df$Col
levels(df$new_var)[levels(df$new_var) %in% as.character(601:689)] <- "5001"
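
As a small self-contained illustration of what this does, here is a sketch on a toy factor (the codes are hypothetical). Assigning the same label to several levels merges them into one, and a final conversion through character yields a numeric column if that is what is needed.

```r
# Toy factor with hypothetical codes
codes <- factor(c("000", "601", "650", "689", "700"))

new_var <- codes
# Every level in "601".."689" gets the same label, so those levels are merged
levels(new_var)[levels(new_var) %in% as.character(601:689)] <- "5001"

# Optional: convert the re-coded factor to numeric via its labels
new_num <- as.numeric(as.character(new_var))
```

Note that converting with as.numeric(new_var) directly would return the internal level indices rather than the labels, which is why the sketch goes through as.character first.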
akrun

Thanks for the suggestions. I was actually able to find an answer before I got to read these. Here's my solution:

First, I made a numeric proxy variable for the codes:

df$factor2_num <- as.numeric(as.character(df$factor))

Then, in my case_when statement, I put the following:

if((...case_when... (df$factor2_num >= 601) & (df$factor2_num <= 689) ~ 5953 ...

This worked perfectly, and it's in line with the other solutions here. Thanks!
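
For readers who want a complete version of this approach, a minimal sketch follows. It is not the exact code used above: the data are made up, the column name P12 and the target value 5001 are taken from the question, and the fallback to NA for unmatched codes is an assumption.

```r
library(dplyr)

# Toy data standing in for the real census table (hypothetical values)
df <- data.frame(P12 = factor(c("000", "010", "601", "645", "689", "700")))

df <- df %>%
  mutate(
    # numeric proxy for the string codes
    P12_num = as.numeric(as.character(P12)),
    new_var = case_when(
      P12 == "000" ~ 0,
      P12 == "010" ~ 100,
      between(P12_num, 601, 689) ~ 5001,  # the whole range in one line
      TRUE ~ NA_real_                     # assumed fallback for other codes
    )
  )
```

dplyr's between() is inclusive on both ends, so it is equivalent to the pair of >= and <= comparisons.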