2

I need to sum the columns of a data.frame under the following rule: if more than one observation in a column is missing, the column's sum should be NA; if at most one observation is missing, the column should be summed with the missing value ignored.

Say I have some data like this:

dfn <- data.frame(
a  = c(3, 3, 0, 3),
b  = c(1, NA, 0, NA),
c  = c(0, 3, NA, 1))

dfn
  a  b  c
1 3  1  0
2 3 NA  3
3 0  0 NA
4 3 NA  1

I then apply my rule and sum the columns with fewer than two missing values, so that I get something like this:

  a  b  c
1 3  1  0
2 3 NA  3
3 0  0 NA
4 3 NA  1
5 9 NA  4

I've played around with colSums(dfn, na.rm = FALSE) and colSums(dfn, na.rm = TRUE). In my real data there are more than three columns and more than four rows. I imagine I could count the missing values in each column and use that count as the rule?

Ricardo Oliveros-Ramos
Eric Fail

2 Answers

5

I don't think you can do this with colSums alone, but you can add to its result using ifelse:

colSums(dfn,na.rm=TRUE) + ifelse(colSums(is.na(dfn)) > 1, NA, 0)
 a  b  c 
 9 NA  4 
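
The same idea can be wrapped in a small function so that the allowed number of missing values becomes a parameter. This is only a sketch: the function name `col_sums_max_na` and its argument `max_na` are hypothetical and not part of base R, and it assumes every column is numeric.

```r
# Hypothetical helper: sum each column, but return NA for any column
# that has more than `max_na` missing values.
col_sums_max_na <- function(df, max_na = 1) {
  n_missing <- colSums(is.na(df))      # number of NAs in each column
  sums <- colSums(df, na.rm = TRUE)    # column sums ignoring NAs
  sums[n_missing > max_na] <- NA       # discard sums of columns over the limit
  sums
}

dfn <- data.frame(
  a = c(3, 3, 0, 3),
  b = c(1, NA, 0, NA),
  c = c(0, 3, NA, 1))

result <- col_sums_max_na(dfn)
result

# Append the sums as an extra row, as in the question
rbind(dfn, result)
```

With `max_na = 1` this should reproduce the result above; a different threshold only requires changing the argument.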
flodel
James
  • Works like a charm, I wasn't aware of the open `+ ifelse`. Thanks a lot! – Eric Fail Jan 18 '13 at 18:21
  • @EricFail In this context `ifelse` produces another vector of the same size as the result from `colSums`. You are just adding two vectors together. – James Jan 18 '13 at 18:24
  • I see, I keep getting impressed by how freely the functions in R can be combined. Thank you! – Eric Fail Jan 18 '13 at 18:29
1

There's nothing wrong with @James' answer, but here's a slightly cleaner way:

colSums(apply(dfn, 2, function(col) replace(col, match(NA, col), 0)))
# a  b  c 
# 9 NA  4 

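match(NA, col) returns the index of the first NA in col (or NA if the column has no missing values, in which case replace leaves the column unchanged). replace sets that one element to 0 and returns the new column, and apply returns a matrix of all the new columns. Any column with a second NA still contains a missing value, so colSums returns NA for it.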
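
To make the mechanism concrete, here is a sketch that walks through the three cases on plain vectors. The names `col_a`, `col_b` and `col_c` are just illustrative copies of the columns from the question.

```r
col_a <- c(3, 3, 0, 3)     # no missing values
col_b <- c(1, NA, 0, NA)   # two missing values
col_c <- c(0, 3, NA, 1)    # one missing value

# Two NAs: only the first is replaced, so the sum is still NA
first_na_b <- match(NA, col_b)              # 2
patched_b  <- replace(col_b, first_na_b, 0) # 1 0 0 NA
sum_b      <- sum(patched_b)                # NA

# One NA: it is replaced by 0, so the sum is defined
sum_c <- sum(replace(col_c, match(NA, col_c), 0))  # 4

# No NA: match() returns NA, and replace() with an NA index changes nothing
sum_a <- sum(replace(col_a, match(NA, col_a), 0))  # 9
```

Note that this approach hard-codes the "at most one missing value" rule, since it only ever fills in the first NA of each column.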

Matthew Plourde