Is there a quick way to run multiple arithmetic operations across data frame variables while ignoring cases with NAs? I've put a simple example below.

I could add intermediate variables or `if` statements, but that seems too convoluted.

d1<-c(2,2,2,2)
d2<-c(1,1,1,1)
d3<-c(1,1,NA,NA)

df<-data.frame(d1,d2,d3)
df
  d1 d2 d3
1  2  1  1
2  2  1  1
3  2  1 NA
4  2  1 NA

df$d4<-d1*((d2) + (d3))
df
  d1 d2 d3 d4
1  2  1  1  4
2  2  1  1  4
3  2  1 NA NA
4  2  1 NA NA

What I'd like to get is this:

df2<-data.frame(d1,d2,d3,d4=c(4,4,2,2))
df2
  d1 d2 d3 d4
1  2  1  1  4
2  2  1  1  4
3  2  1 NA  2
4  2  1 NA  2

I could replace all values with 0s yet that could also be misleading.

EDIT:

I've tried converting NAs to 0s but it does not work and I don't understand why.

df<-data.frame(d1,d2,d3)
df
df[is.na(df)] <- 0
df
  d1 d2 d3
1  2  1  1
2  2  1  1
3  2  1  0
4  2  1  0
df$d4<-d1*((d2) + (d3))
df
  d1 d2 d3 d4
1  2  1  1  4
2  2  1  1  4
3  2  1  0 NA
4  2  1  0 NA
val
  • Could do `d1 * rowSums(df[-1], na.rm = TRUE)` – David Arenburg Jan 25 '17 at 07:44
  • This is not the logic of NA. It seems you want NA treated as 0, so change the NA values to 0 (possibly in a copy of your data frame). – jogo Jan 25 '17 at 07:46
  • @DavidArenburg: i thought of this but it is not general enough if my other variables (the ones in rowSums) are getting multiplied by additional factors or constants. – val Jan 25 '17 at 07:46
  • what do you mean by "ignoring cases with NAs"? do you want the results for those row to be NA as well? – chinsoon12 Jan 25 '17 at 07:55
  • Regarding your edit: `df$d4<- with(df, d1*((d2) + (d3)))` – Roland Jan 25 '17 at 08:00
  • @jogo: I've tried your suggestion but something is amiss.... – val Jan 25 '17 at 08:00
  • Regarding your last edit: there is a difference between `d1` and `df$d1`. You modified the latter, while the former stayed unchanged. – nicola Jan 25 '17 at 08:00
  • "I could replace all values with 0s yet that could also be misleading." Yet, that's what you are doing for the calculation of `d4`. If you don't want to mislead, you should accept the `NA` values in `d4`. – Roland Jan 25 '17 at 08:01
  • if someone puts an answer i'll check it. this seems so simple now I wonder if I should take it down. thanks all. – val Jan 25 '17 at 08:04
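
The point made in the comments above (the standalone vector `d1` and the column `df$d1` are separate objects, and `with()` makes the expression use the columns) can be illustrated with a minimal sketch. It reuses the question's data; the `anyNA()` checks are only there for illustration.

```r
# Standalone vectors and the data frame built from them are separate objects.
d1 <- c(2, 2, 2, 2)
d2 <- c(1, 1, 1, 1)
d3 <- c(1, 1, NA, NA)
df <- data.frame(d1, d2, d3)

# Replacing NAs in the data frame leaves the standalone vector d3 untouched.
df[is.na(df)] <- 0
anyNA(d3)     # checks the workspace vector, which was never modified
anyNA(df$d3)  # checks the column, which was zero-filled

# with() evaluates the expression using the columns of df,
# not the workspace vectors of the same name.
df$d4 <- with(df, d1 * (d2 + d3))
df
```

Writing `df$d1 * (df$d2 + df$d3)` would presumably work just as well; `with()` only saves the repeated `df$` prefix.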

1 Answer

If you want to change all NAs to 0 you can do:

df<-data.frame(d1=c(2,2,2,2), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
df.new <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, x)))

or (thanks to Sotos!):

df[is.na(df)] <- 0  

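But be careful: both approaches work cleanly only when every column of the data frame is numeric. With factor or character columns you might face problems (for example, `ifelse()` does not preserve the factor class). Here is a solution that touches only the numeric columns: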

df <- data.frame(d1=c(2,2,2,2), dx=c("A", "bb", "C", "DD"), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
numCols <- sapply(df, is.numeric)

df[, numCols][is.na(df[, numCols])] <- 0
df
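
If overwriting the NAs in the data frame feels misleading, one possible alternative is to zero-fill only inside the calculation. The sketch below is an assumption about what the asker wants, and `na_to_zero` is a hypothetical helper name, not a base R function; it is built on base R's `replace()`.

```r
# Hypothetical helper: return a copy of x with NAs replaced by 0.
# The original vector or column is left unchanged.
na_to_zero <- function(x) replace(x, is.na(x), 0)

df <- data.frame(d1 = c(2, 2, 2, 2),
                 dx = c("A", "bb", "C", "DD"),
                 d2 = c(1, 1, 1, 1),
                 d3 = c(1, 1, NA, NA))

# Only d3 is zero-filled, and only inside this one expression,
# so df$d3 keeps its NAs and the non-numeric column dx is never touched.
df$d4 <- with(df, d1 * (d2 + na_to_zero(d3)))
df
```

Because the helper is applied per variable, it should also carry over to expressions where individual terms are multiplied by further factors or constants.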
jogo
  • why not simply `df[is.na(df)] <- 0`? – Sotos Jan 25 '17 at 08:07
  • yes - simplification works best as my df is mixture of classes and using @jogo's answer unfortunately messes up non-numeric (as she/he states). – val Jan 25 '17 at 08:15
  • @val You can work only on the numeric columns (use the indices of the columns). – jogo Jan 25 '17 at 08:17