I am trying to build a function as part of a larger function in R. Some of the pieces are working fine but others are not. Here is the piece of the code that is giving me issues.

This part of the function is designed to check whether each value of a variable in a data frame is missing, then generate a new variable that records whether that specific case is missing or present. I want the new variable to have the suffix .zero (q1 becomes q1.zero, q2 becomes q2.zero, etc.). I can generate the suffix without any issues, but creating the new variable is causing problems. Any insight would be greatly appreciated.

function1 <- function (x, data) {
  # new variable name
  temp <- paste (x, .zero, sep="", collapse = NULL)
  temp
  
  # is variable missing
  # I don't know if I should use this method or ifelse()
  data$temp [is.na (data$x)]<- 0
  data$temp [!is.na (data$x)]<- 1
 return (data$temp)
  }
Konrad Rudolph

1 Answer

You've got a few issues:

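  • .zero isn't defined; you want the quoted string ".zero".
  • You can't use $ with column names stored in strings. $ treats whatever follows it as the literal column name, so data$temp looks for a column called "temp". You need to use data[[temp]], not data$temp, and likewise data[[x]], not data$x. Here's the related FAQ if you want to read more.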
  • You probably want to return the whole modified data frame, not just the column you added (I'm assuming this since you passed the whole data frame in to the function).
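
The difference between $ and [[ is easy to see on a toy data frame. This is only a small sketch: the data frame df and the variable col are made up for illustration and are not from the question.

```r
# Hypothetical example data, not from the question
df <- data.frame(q1 = c(1, NA, 3))
col <- "q1"

# `$` looks for a column literally named "col", which does not exist
is.null(df$col)   # TRUE

# `[[` evaluates `col` first, then looks up the column named "q1"
df[[col]]         # 1 NA 3
```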

We can also make some simplifications: paste0() is a shortcut for paste(sep = ""), and as.integer(!is.na(data[[x]])) is a cleaner and more efficient way to create your values than two separate assignments.
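
Both simplifications can be checked at the console. The vectors below are invented for illustration.

```r
# paste0() gives the same result as paste() with sep = ""
x <- c("q1", "q2")
identical(paste0(x, ".zero"), paste(x, ".zero", sep = ""))   # TRUE

# !is.na() gives TRUE for present values; as.integer() maps TRUE/FALSE to 1/0
v <- c(4, NA, 7)
as.integer(!is.na(v))   # 1 0 1
```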

Putting this all together:

function1 <- function (x, data) {
  data[[paste0(x, ".zero")]] = as.integer(!is.na(data[[x]]))
  return(data)
}
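
A quick usage sketch, assuming you want indicator columns for several variables. The survey data frame and its values are hypothetical, and the function is repeated only so the example runs on its own. Note that the function returns a modified copy, so the result has to be assigned back.

```r
# Copy of the function above so the example is self-contained
function1 <- function(x, data) {
  data[[paste0(x, ".zero")]] <- as.integer(!is.na(data[[x]]))
  return(data)
}

# Hypothetical survey data
survey <- data.frame(q1 = c(1, NA, 3), q2 = c(NA, NA, 5))

# The input data frame is not modified in place, so reassign each time
for (q in c("q1", "q2")) {
  survey <- function1(q, survey)
}

names(survey)    # "q1" "q2" "q1.zero" "q2.zero"
survey$q1.zero   # 1 0 1
survey$q2.zero   # 0 0 1
```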

I'd add a little commentary: the .zero suffix is not particularly informative about whether or not a value is missing. A better suffix might be something like .present, where a 1 indicates the value is present and a 0 indicates it is not.

Similarly, function1 is an absolutely terrible name for a function. Use descriptive names. add_present_column would be a much better name. (It's often nice to give functions names that are verbs.)
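
With both naming suggestions applied, the function might look like the sketch below. The argument name column and the example data frame are my own choices for illustration.

```r
# Same logic as before, with a descriptive name and a .present suffix
add_present_column <- function(column, data) {
  data[[paste0(column, ".present")]] <- as.integer(!is.na(data[[column]]))
  return(data)
}

# Hypothetical example data
df <- data.frame(q1 = c(2, NA))
result <- add_present_column("q1", df)
result$q1.present   # 1 0
```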

Since I see Konrad editing the question, I'll also mention that return() isn't needed in R functions. The value of the last expression evaluated in the function is returned, and stylistically many would prefer that the last line of the function just be data, not return(data).
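
A small sketch of the two styles, plus one pitfall: if the assignment is the last expression, the function returns the assigned values (invisibly) rather than the data frame. The function names and data here are hypothetical.

```r
# Explicit return()
with_return <- function(x, data) {
  data[[paste0(x, ".present")]] <- as.integer(!is.na(data[[x]]))
  return(data)
}

# Implicit return: the last evaluated expression is the function's value
without_return <- function(x, data) {
  data[[paste0(x, ".present")]] <- as.integer(!is.na(data[[x]]))
  data
}

# Pitfall: ending on the assignment returns its right-hand side, not `data`
forgot_final_line <- function(x, data) {
  data[[paste0(x, ".present")]] <- as.integer(!is.na(data[[x]]))
}

df <- data.frame(q1 = c(2, NA))
identical(with_return("q1", df), without_return("q1", df))   # TRUE
is.data.frame(forgot_final_line("q1", df))                   # FALSE
```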

Gregor Thomas