I think this is the best way to describe what I want to do:

df$column <- ifelse(is.na(df$column) == TRUE, 0, 1)

But where column is dynamic. This is because I have about 45 columns, all with the same kind of content, and all I want to do is check each cell and replace it with a 1 if there's something in it, or a 0 if not. I have of course tried many different things, but since there seems to be no df[index][column] in R, I'm lost. I'd have expected something like this to work, but nope:

for (index in df) {
  for (column in names(df)) {
    df[[index]][[column]] <- ifelse(is.na(df[[index]][[column]]) == TRUE, 0, 1)
  }
}

I could do this quickly in other languages (or even Excel), but I'm just learning R and want to understand why something so simple seems to be so complicated in a language that's meant to work with data. Thanks!

Keith Collins

1 Answer

How about this:

df.new = as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, 1)))

lapply applies a function to each column of the data frame df. In this case, the function does the 0/1 replacement. lapply returns a list. Wrapping it in as.data.frame converts the list to a data frame (which is a special type of list).

In R you can often replace a loop with one of the *apply family of functions. In this case, lapply "loops" over the columns of the data frame. Many R functions are also "vectorized", meaning the function operates on every value in a vector at once. Here, ifelse does the replacement on an entire column of the data frame in a single call.
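
To make the two ideas above concrete, here is a minimal sketch on a small made-up data frame. The name df_demo and its columns are hypothetical and not from the question. It shows the lapply version next to a variant that relies only on vectorization: is.na on a data frame returns a logical matrix, which can be turned into 0/1 by multiplying by 1.

```r
# Hypothetical example data; names and values are made up for illustration.
df_demo <- data.frame(a = c(1, NA, 3),
                      b = c(NA, "x", "y"),
                      stringsAsFactors = FALSE)

# Column by column: lapply calls the function once per column,
# and ifelse recodes the whole column in one vectorized call.
via_lapply <- as.data.frame(lapply(df_demo, function(x) ifelse(is.na(x), 0, 1)))

# Whole data frame at once: is.na() gives a logical matrix,
# and multiplying by 1 converts TRUE/FALSE to 1/0.
via_matrix <- as.data.frame((!is.na(df_demo)) * 1)

print(via_lapply)
print(via_matrix)
```

Both results should hold the same 0/1 values. Which form reads better is largely a matter of taste; the lapply form generalizes more easily when the per-column function is more involved.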

eipi10
  • Perfect, thanks! This is exactly what I needed. And just wondering, is there any way to check which column or row you're on while inside the function? Or would the standard practice be to first select the data you want to manipulate and use lapply on that? Thanks again. – Keith Collins May 07 '15 at 14:15
  • Ah, got it: df[2:44] = as.data.frame(lapply(df[2:44], function(x) ifelse(is.na(x), 0, 1))) – Keith Collins May 07 '15 at 15:08
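
As a rough sketch of the approach in the comments above, the columns to recode can also be selected by name, and looping over those names is one way to know which column is being processed. The data frame df_demo and the column names id, q1 and q2 are hypothetical. Note that assigning a list into df_demo[cols] replaces those columns directly, so the as.data.frame wrapper is optional in this form.

```r
# Hypothetical data: an id column to leave alone plus two columns to recode.
df_demo <- data.frame(id = 1:3,
                      q1 = c("yes", NA, "no"),
                      q2 = c(NA, NA, 4),
                      stringsAsFactors = FALSE)

cols <- c("q1", "q2")  # columns to recode, chosen by name

# lapply returns a list with one element per selected column;
# assigning it to df_demo[cols] overwrites just those columns.
df_demo[cols] <- lapply(df_demo[cols], function(x) ifelse(is.na(x), 0, 1))

# Looping over the names makes the current column name available as `col`.
for (col in cols) {
  message(col, ": ", sum(df_demo[[col]]), " non-missing value(s)")
}
```

Selecting by name rather than by position (2:44) is a design choice: it keeps working if columns are later added or reordered.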