How to calculate the difference between two columns when they contain numerical and NA values?

Question

I have a data frame called a.

It has 23 columns and around 400 rows. The last two columns, C1 and C2, hold concentrations.

They contain numerical values as well as NA values (rows where both columns are NA have already been removed).

I need to calculate the absolute difference between C1 and C2 (i.e. only the magnitude of the difference, with no negative or positive sign).

I normally use the code below to calculate differences, but I do not know how to adapt it so that it skips an NA in either column and returns absolute values:

a$diff <- (a$C1 - a$C2)

Replacing NA with another value, even zero, is not an option. I need to see the trend in the difference between C1 and C2 wherever it can be computed. Also, the concentration values may themselves be zero.

I was thinking of some kind of conditional statement using !is.na, but I do not know where or how to insert it: that is, first check that neither C1 nor C2 holds NA, and only then do the calculation.
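A minimal sketch of the conditional described above, assuming the columns are named C1 and C2 as stated; the toy values and the helper name `both_present` are hypothetical. It computes the absolute difference only where neither value is NA, leaves NA everywhere else, and never replaces an NA with zero (a real zero concentration is still used):

```r
# Toy data standing in for the real 23-column data frame (hypothetical values)
a <- data.frame(C1 = c(1, 2, NA, 3, 0),
                C2 = c(NA, 2, 3, 5, 4))

# TRUE only for rows where neither concentration is missing
both_present <- !is.na(a$C1) & !is.na(a$C2)

# Keep abs(C1 - C2) where both_present is TRUE; all other rows get NA
a$diff <- ifelse(both_present, abs(a$C1 - a$C2), NA)
a
```

Since `abs(a$C1 - a$C2)` is itself NA wherever either input is NA, the explicit test mainly documents the intent rather than changing the result.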

Do you want to drop NA values, set them to 0, or return the non-NA value if an NA exists? These are very different things. Also, `abs()` returns absolute values. Perhaps this helps: https://stackoverflow.com/questions/45311490/is-it-possible-to-skip-na-values-in-operator — dandrews, Mar 28 '23 at 23:52
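To illustrate the point about `abs()`: base R arithmetic propagates NA, so wrapping the original subtraction in `abs()` already yields NA for any row with a missing value, without substituting anything. A minimal sketch with made-up vectors:

```r
# Hypothetical concentrations; note the zero in C1 is a real value, not a gap
C1 <- c(5, 2, NA, 0)
C2 <- c(3, 7, 4, NA)

# Subtraction involving NA gives NA, and abs() leaves NA as NA
d <- abs(C1 - C2)
d  # 2 5 NA NA

# How many differences could actually be computed
sum(!is.na(d))
```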
Replacing NA with another value, even zero, is not an option. I need to see the trend in the difference between C1 and C2 where it is applicable. Also, concentration values may themselves be zero. I was thinking of some kind of conditional statement using !is.na, but I do not know where or how to insert it. — creusac, Mar 29 '23 at 00:20

score 0 · Answer 1 · answered Mar 29 '23 at 00:48

This became too long for a comment:

It sounds to me like, if you don't have data for C1 or C2, you want to omit that row. If so, you can do the calculation as you have it and just add the call to abs(), then run a <- na.omit(a). This keeps only rows with no NAs in any column. Since you have 23 columns, if you are only interested in these two you should subset your data first, so that an NA in another column doesn't cause issues for you.

a <- data.frame(C1=c(1,2,NA,3,4),
                C2=c(NA,2,3,4,5),
                other1=rep(NA,5),
                other2=rep(1,5))
a

a$diff <- abs(a$C1-a$C2)
a

na.omit(a) # returns 0 rows: other1 is all NA, so every row is dropped!

a <- a[c('C1','C2','diff')]
# subset(a, select = c('C1','C2','diff')) # using subset if you prefer
a

na.omit(a)
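If you would rather keep all 23 columns instead of subsetting, one alternative (not part of the answer above) is base R's `complete.cases()` applied only to C1 and C2, which drops a row only when one of those two columns is NA. A minimal sketch reusing the toy data from this answer; the names `keep` and `a_complete` are illustrative:

```r
a <- data.frame(C1 = c(1, 2, NA, 3, 4),
                C2 = c(NA, 2, 3, 4, 5),
                other1 = rep(NA, 5),
                other2 = rep(1, 5))

a$diff <- abs(a$C1 - a$C2)

# TRUE for rows where both C1 and C2 are present; other columns are ignored
keep <- complete.cases(a[c("C1", "C2")])

# Rows 2, 4 and 5 survive, and every column (including other1) is retained
a_complete <- a[keep, ]
a_complete
```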
