I'm trying to create a function to recode variables into a new variable, based on the coding provided in one variable, and the skip pattern in another.

I've created a trivial example:

data <- data.frame(A=1:4, B=c(1,1,1,2))

My function is as follows:

recode_4scale <- function (var, name, skip, df){
  df$name <- df$var #generate new variable
  df[which(df$skip==2),"name"] <- 5 #replace with 5 if skip pattern
  df[is.na(df$var),"name"] <- 6 #replace with 6 if missing
  df$name <- df$name == 3 | df$name==4 #code as true if 3 or 4
  df$name <- as.factor(df$name)
  return (df)
}
data1<-recode_4scale(A, new, B, data)

I get: Warning message: In is.na(df$var) : is.na() applied to non-(list or vector) of type 'NULL'

What I expect to get by running it line by line:

data$new <- data$A
data[which(data$B==2),"new"] <- 5
data[is.na(data$A),"new"] <- 6
data$new <- data$new == 3 | data$new == 4
data$new <- as.factor(data$new)
data$new
[1] FALSE FALSE TRUE  FALSE
Levels: FALSE TRUE

I believe the problem is in how I pass the column names in, since I can't get even the simplest function of this kind to work.

Any idea what's going wrong here? (I know this is not the best way to write this in general. I'm a new employee fixing old code, and I will improve it once I get it running.)

jdhd

1 Answer

object$variable does not substitute the value of variable. Instead, $ looks for a column literally named "variable" (the string itself, not whatever the variable holds) in your object. To look up a column by a name stored in a variable, use [[ ]] with a character string instead. The following will work:

data <- data.frame(A=1:4, B=c(1,1,1,2))
variable <- "A"
data[[variable]]  # Same as data[["A"]] or data$A
# [1] 1 2 3 4
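
To see the difference directly, here is a minimal sketch using the same example data. The name var is only an illustration standing in for the function argument in the question:

```r
data <- data.frame(A = 1:4, B = c(1, 1, 1, 2))
var <- "A"  # illustrative: plays the role of the function argument

# `$` ignores the value of `var` and looks for a column literally named "var",
# which does not exist in `data`, so the result is NULL.
dollar_result <- data$var
is.null(dollar_result)
# [1] TRUE

# `[[` evaluates `var` first, so it returns column "A".
bracket_result <- data[[var]]
identical(bracket_result, data$A)
# [1] TRUE
```

This is why is.na(df$var) inside the original function ends up being applied to NULL.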

So, your function should be:

recode_4scale <- function (var, name, skip, df){
  df[[name]] <- df[[var]] #generate new variable
  df[which(df[[skip]]==2), name] <- 5 #replace with 5 if skip pattern
  df[is.na(df[[var]]), name] <- 6 #replace with 6 if missing
  df[[name]] <- df[[name]] == 3 | df[[name]] == 4 #code as true if 3 or 4
  df[[name]] <- as.factor(df[[name]])
  return (df)
}
data1 <- recode_4scale("A", "new", "B", data)
data1
#   A B   new
# 1 1 1 FALSE
# 2 2 1 FALSE
# 3 3 1  TRUE
# 4 4 2 FALSE
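
If you would rather keep the unquoted call style from the question, one possible sketch is to capture the argument names with deparse(substitute(...)) and then index with [[ ]] as above. The function name recode_4scale_nse is hypothetical, and this is only one way to do it, not something the original code requires:

```r
# Hypothetical variant: accepts bare column names (A, new, B) instead of strings.
recode_4scale_nse <- function(var, name, skip, df) {
  # Turn the unevaluated arguments into character strings
  var  <- deparse(substitute(var))
  name <- deparse(substitute(name))
  skip <- deparse(substitute(skip))

  df[[name]] <- df[[var]]                            # generate new variable
  df[which(df[[skip]] == 2), name] <- 5              # replace with 5 if skip pattern
  df[is.na(df[[var]]), name] <- 6                    # replace with 6 if missing
  df[[name]] <- as.factor(df[[name]] == 3 | df[[name]] == 4)  # TRUE if 3 or 4
  df
}

data <- data.frame(A = 1:4, B = c(1, 1, 1, 2))
data1 <- recode_4scale_nse(A, new, B, data)
data1$new
# [1] FALSE FALSE TRUE  FALSE
# Levels: FALSE TRUE
```

The string-based version above is usually easier to call from other code (for example in a loop over column names), so treat this variant as a convenience for interactive use.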
Alexey Shiklomanov
  • thanks so much! this is exactly what I need. – jdhd Dec 18 '17 at 20:56
  • Great! If this answers your question, please upvote the answer and mark it as the answer (click the gray "check" mark). – Alexey Shiklomanov Dec 18 '17 at 20:57
  • Wait @Alexey Shiklomanov, one issue--I'm not getting the right value for row 4: it should be "FALSE" (because B=2). I think it's generating a new "name" variable instead of replacing data$new with 5, which is what happens when I run it manually. – jdhd Dec 18 '17 at 21:38
  • If that's what you're trying to do, that line of the function should probably be: `df[[name]] <- df[[skip]] == 3 | df[[skip]] == 4`, because you are comparing against the value of the `skip` column...? – Alexey Shiklomanov Dec 18 '17 at 21:42
  • What I'm really trying to do is add things that are missing (or should be but aren't) because of the skip pattern into the "false" column. I don't think that line gets me there unless I'm misunderstanding. – jdhd Dec 18 '17 at 21:46
  • Here's the step-by-step tracking if I do it manually:
      > data$new <- data$A
      > table (data$new)
      1 2 3 4
      1 1 1 1
      > data[which(data$B==2),"new"] <- 5
      > table (data$new)
      1 2 3 5
      1 1 1 1
      > data[is.na(data$A),"new"] <- 6
      > table (data$new)
      1 2 3 5
      1 1 1 1
      > data$new <- data$new == 3 | data$new == 4
      > table (data$new)
      FALSE TRUE
      3 1
      > data$new <- as.factor(data$new)
      > table (data$new)
      FALSE TRUE
      3 1
    – jdhd Dec 18 '17 at 21:47
  • I see. I think my revisions did what you're looking for. – Alexey Shiklomanov Dec 18 '17 at 21:51
  • Yes! that fixes it. – jdhd Dec 18 '17 at 21:54