How to do multiple operations, ignoring NAs, in R

Question

Is there a quick way to run multiple arithmetic operations across data frame variables while ignoring cases with NAs? I've put a simple example below.

It seems I could add intermediary variables or 'if' statements but that seems too convoluted.

d1<-c(2,2,2,2)
d2<-c(1,1,1,1)
d3<-c(1,1,NA,NA)

df<-data.frame(d1,d2,d3)
df
  d1 d2 d3
1  2  1  1
2  2  1  1
3  2  1 NA
4  2  1 NA

df$d4<-d1*((d2) + (d3))
df
  d1 d2 d3 d4
1  2  1  1  4
2  2  1  1  4
3  2  1 NA NA
4  2  1 NA NA

What I'd like to get is this:

df2<-data.frame(d1,d2,d3,d4=c(4,4,2,2))
df2
  d1 d2 d3 d4
1  2  1  1  4
2  2  1  1  4
3  2  1 NA  2
4  2  1 NA  2

I could replace all the NAs with 0s, but that could also be misleading.

EDIT:

I've tried converting NAs to 0s but it does not work and I don't understand why.

df<-data.frame(d1,d2,d3)
df
df[is.na(df)] <- 0
df
  d1 d2 d3
1  2  1  1
2  2  1  1
3  2  1  0
4  2  1  0
df$d4<-d1*((d2) + (d3))
df
  d1 d2 d3 d4
1  2  1  1  4
2  2  1  1  4
3  2  1  0 NA
4  2  1  0 NA

This is not the logic of NA. It seems you want NA treated as 0, so change the NA values to 0 (possibly in a copy of your dataframe). — jogo, Jan 25 '17 at 07:46
@DavidArenburg: i thought of this but it is not general enough if my other variables (the ones in rowSums) are getting multiplied by additional factors or constants. — val, Jan 25 '17 at 07:46
what do you mean by "ignoring cases with NAs"? do you want the results for those rows to be NA as well? — chinsoon12, Jan 25 '17 at 07:55
@jogo: I've tried your suggestion but something is amiss.... — val, Jan 25 '17 at 08:00
Regarding your last edit: there is a difference between `d1` and `df$d1`. You modified the latter, while the former stayed unchanged. — nicola, Jan 25 '17 at 08:00
"I could replace all values with 0s yet that could also be misleading." Yet, that's what you are doing for the calculation of `d4`. If you don't want to mislead, you should accept the `NA` values in `d4`. — Roland, Jan 25 '17 at 08:01
if someone puts an answer i'll check it. this seems so simple now I wonder if I should take it down. thanks all. — val, Jan 25 '17 at 08:04
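
nicola's point can be seen by running the calculation on the columns of the data frame. This is a minimal sketch that reuses the vectors from the question; `with()` is base R and evaluates the expression using the columns of `df` rather than the global vectors `d1`, `d2` and `d3`:

```r
d1 <- c(2, 2, 2, 2)
d2 <- c(1, 1, 1, 1)
d3 <- c(1, 1, NA, NA)
df <- data.frame(d1, d2, d3)

# Replace the NAs in the data frame; the global vector d3 is not affected.
df[is.na(df)] <- 0
anyNA(d3)      # TRUE: the standalone vector still holds its NAs
anyNA(df$d3)   # FALSE: only the column inside df was changed

# Evaluate the formula on the columns of df, not on the global vectors.
df$d4 <- with(df, d1 * (d2 + d3))
df$d4          # 4 4 2 2
```

Writing `df$d1 * (df$d2 + df$d3)` would give the same result; `with()` only saves the repeated `df$`.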

jogo · Accepted Answer · 2018-11-19T08:20:02.513

If you want to change all NAs to 0 you can do:

df<-data.frame(d1=c(2,2,2,2), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
df.new <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, x)))

or (thanks to Sotos!):

df[is.na(df)] <- 0

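But be careful: this works cleanly only for data frames in which every column is numeric. With factor or character columns you can run into problems (for example, `ifelse()` drops a factor's attributes and returns its integer codes). Here is a solution for data frames that also contain non-numeric columns: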

df <- data.frame(d1=c(2,2,2,2), dx=c("A", "bb", "C", "DD"), d2=c(1,1,1,1), d3=c(1,1,NA,NA))
numCols <- sapply(df, is.numeric)

df[, numCols][is.na(df[, numCols])] <- 0
df
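
If overwriting the NAs in the data frame itself is undesirable (the "misleading" concern raised in the question), one option is to treat NA as 0 only inside the calculation. This is a sketch and not part of the original answer; `na_to_zero` is a hypothetical helper name introduced here for illustration:

```r
df <- data.frame(d1 = c(2, 2, 2, 2), d2 = c(1, 1, 1, 1), d3 = c(1, 1, NA, NA))

# Hypothetical helper: returns a copy of x with its NAs replaced by 0.
na_to_zero <- function(x) replace(x, is.na(x), 0)

# d3 keeps its NAs in df; they count as 0 only while d4 is computed.
df$d4 <- with(df, d1 * (d2 + na_to_zero(d3)))
df
```

Because the helper wraps individual terms, it should still apply when those terms are multiplied by other factors or constants.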

edited Nov 19 '18 at 08:20

answered Jan 25 '17 at 08:04

jogo

why not simply `df[is.na(df)] <- 0`? – Sotos Jan 25 '17 at 08:07
yes - simplification works best as my df is mixture of classes and using @jogo's answer unfortunately messes up non-numeric (as she/he states). – val Jan 25 '17 at 08:15
@val You can work only on the numeric columns (use the indices of the columns). – jogo Jan 25 '17 at 08:17
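
The problem val describes, and the fix jogo suggests, can be reproduced with a small sketch. It assumes the non-numeric column is a factor (`stringsAsFactors = TRUE` is set explicitly, since the default differs between R versions); the object names `bad`, `fixed` and `num_idx` are illustrative only:

```r
df <- data.frame(d1 = c(2, 2, 2, 2), dx = c("A", "bb", "C", "DD"),
                 d3 = c(1, 1, NA, NA), stringsAsFactors = TRUE)

# ifelse() drops attributes, so the factor dx comes back as integer codes.
bad <- as.data.frame(lapply(df, function(x) ifelse(is.na(x), 0, x)))
class(bad$dx)    # "integer"

# Restrict the replacement to the numeric columns, selected by index.
num_idx <- which(sapply(df, is.numeric))
fixed <- df
fixed[num_idx] <- lapply(fixed[num_idx], function(x) replace(x, is.na(x), 0))
class(fixed$dx)  # "factor"
fixed$d3         # 1 1 0 0
```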
