Hopefully this is not too dumb a question, but as an R beginner I have a serious problem with tapply. Let's say I run:

factors <- as.factor( c("a", "b", "c", "a", "b", "c", "a", "b", "c") )
values  <- c( 1, 2, 3, 4, 5, NA, 7, NA, NA )
tapply(
  values,
  factors,
  function(x){
    if( sum(is.na(x)) == 1 ){
      x[ is.na(x) ] <- 0
    }
    return(x)
  }
)

The result is

$a
[1] 1 4 7

$b
[1] 2 5 0

$c
[1]  3 NA NA

However, what I need is to get a vector back which preserves the original order of values, i.e.:

c( 1,2,3,4,5,NA,7,0,NA )

Many thanks in advance.

Beasterfield
  • This was my first question on Stack Overflow and I am very impressed by the fast help I got. Many thanks to all. – Beasterfield May 24 '11 at 00:50
  • It's because your question is clear and contains all the relevant information and data to work with. – Marek May 24 '11 at 08:13

4 Answers

In that case you should use the ave function:

> ave(values, factors, FUN=function(x) {
+     if( sum(is.na(x)) == 1 ){
+       x[ is.na(x) ] <- 0
+     }
+     return(x)
+   }
+ )
[1]  1  2  3  4  5 NA  7  0 NA
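
For reference, here is a minimal sketch of the same call as a plain script (without console prompts), with the replacement rule pulled out into a named helper. The name `fill_single_na` is made up for illustration and is not part of the original answer; `ave()` itself is base R and returns a vector of the same length as its input, in the original order.

```r
factors <- as.factor(c("a", "b", "c", "a", "b", "c", "a", "b", "c"))
values  <- c(1, 2, 3, 4, 5, NA, 7, NA, NA)

# Hypothetical helper: replace the NA with 0 only when the group
# contains exactly one NA; otherwise return the group unchanged.
fill_single_na <- function(x) {
  if (sum(is.na(x)) == 1) {
    x[is.na(x)] <- 0
  }
  x
}

# ave() applies the helper per group and puts each result back
# at the positions its group occupied in `values`.
result <- ave(values, factors, FUN = fill_single_na)
print(result)
```

Because the helper returns a vector as long as the group it receives, the result lines up element by element with `values`.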
IRTFM

A simple for loop also does this:

fun <- function(x){
   if(sum(is.na(x)) == 1){x[is.na(x)] <- 0}
   return(x)
}

for (i in unique(factors)){
   values[i == factors] <- fun(values[i == factors])
}
joran
  • I also thought about that. But aren't all these apply functions many times faster than iterating over the data manually with loops? Especially since computational demand is an issue for my data. – Beasterfield May 23 '11 at 23:19
  • Not always; tapply and apply are just syntactic sugar. To see the source of tapply(), type `tapply` in the console. I guess the point is really that the number of iterations is (usually) going to be small compared to the length of the data in each. – mdsumner May 23 '11 at 23:23
  • I agree with mdsumner, although in this case I believe that DWin's answer using `ave()` is considerably faster than an explicit for loop. – joran May 23 '11 at 23:33
  • It might be interesting to test. Sometimes you get surprised. – IRTFM May 23 '11 at 23:42
  • @DWin - I did; `ave()` is almost 5 times faster than my `for` loop (on a vector of length 300000, still with only 3 levels). – joran May 23 '11 at 23:48
  • @mdsumner : that's a bit too sharp there. They're more than syntactic sugar, and can give quite a speedup in some cases. See the answers here : http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – Joris Meys May 24 '11 at 14:16
  • The sugar is sweet, but tapply and apply are not faster - they are R for loops under the hood - by "not always" I meant "at least lapply can be faster" but I didn't want to commit, hence the reference to just these two functions. – mdsumner May 25 '11 at 00:42
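
The comparison discussed in the comments above can be sketched roughly as follows. This is not joran's actual benchmark: the data, the seed, the number of NAs, and the name `loop_version` are assumptions chosen for illustration, and timings will vary by machine and R version, so none are shown.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Assumed test data: 300000 values in 3 groups, with some NAs sprinkled in
n <- 300000
f <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x <- rnorm(n)
x[sample(n, 100)] <- NA

fun <- function(x) {
  if (sum(is.na(x)) == 1) x[is.na(x)] <- 0
  x
}

# Explicit loop over the factor levels, in the spirit of the for-loop answer
loop_version <- function(x, f) {
  for (lev in levels(f)) {
    idx <- f == lev
    x[idx] <- fun(x[idx])
  }
  x
}

time_loop <- system.time(res_loop <- loop_version(x, f))
time_ave  <- system.time(res_ave  <- ave(x, f, FUN = fun))

print(time_loop)
print(time_ave)

# Both approaches should produce the same vector
print(identical(res_loop, res_ave))
```

A single `system.time()` call is a coarse measurement; repeating each call several times gives a steadier picture.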

An option is to use the replacement method for split():

## create a copy to store the result after replacement
res <- values

## use split's replacement method to split, apply, and recombine
split(res, factors) <- lapply(split(res, factors),
  function(x){
    if( sum(is.na(x)) == 1 ){
      x[ is.na(x) ] <- 0
    }
    return(x)
  }
)
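
A related sketch, not part of the original answer: the per-group list that `tapply()` returns in the question can be put back into the original order with base R's `unsplit()`. This assumes the function applied to each group returns a vector of the same length as the group it receives.

```r
factors <- as.factor(c("a", "b", "c", "a", "b", "c", "a", "b", "c"))
values  <- c(1, 2, 3, 4, 5, NA, 7, NA, NA)

fun <- function(x) {
  if (sum(is.na(x)) == 1) {
    x[is.na(x)] <- 0
  }
  x
}

# tapply() gives one modified vector per factor level ...
by_group <- tapply(values, factors, fun)

# ... and unsplit() reverses the split, restoring the original positions
res2 <- unsplit(by_group, factors)
print(res2)
```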
mdsumner

In case others find this question by searching for how to disable alphabetical sorting of the groups, you can do this:

> v=1:4
> group=c("b","b","a","a")
> tapply(v,group,sum)
a b
7 3
> tapply(v,factor(group,unique(group)),sum)
b a
3 7
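
A short sketch of why this works: `tapply()` reports groups in the order of the factor's levels, and `factor()` sorts the levels by default. Passing `unique(group)` as the `levels` argument keeps them in order of first appearance instead. The variable names `f_default`, `f_ordered`, and `sums` are chosen here for illustration.

```r
v <- 1:4
group <- c("b", "b", "a", "a")

# Default: levels are sorted
f_default <- factor(group)
print(levels(f_default))

# Explicit levels: order of first appearance in the data
f_ordered <- factor(group, levels = unique(group))
print(levels(f_ordered))

# tapply() follows the level order of the grouping factor
sums <- tapply(v, f_ordered, sum)
print(sums)
```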
nisetama