How to use tapply and preserve order of values

Question

Hopefully this is not too dumb a question, but as an R beginner I am having a serious problem with tapply. Let's say I have:

factors <- as.factor( c("a", "b", "c", "a", "b", "c", "a", "b", "c") )
values  <- c( 1, 2, 3, 4, 5, NA, 7, NA, NA )
tapply(
  values,
  factors,
  function(x){
    if( sum(is.na(x)) == 1 ){
      x[ is.na(x) ] <- 0
    }
    return(x)
  }
)

The result is

$a
[1] 1 4 7

$b
[1] 2 5 0

$c
[1]  3 NA NA

However, what I need is to get a vector back which preserves the original order of values, i.e.:

c( 1,2,3,4,5,NA,7,0,NA )

Many thanks in advance.

This was my first question on Stack Overflow and I am very impressed by how quickly I got help. Many thanks to all. – Beasterfield, May 24 '11 at 00:50
It's because your question is clear and contains all the relevant information and data to work with. – Marek, May 24 '11 at 08:13

score 7 · Accepted Answer · answered May 23 '11 at 23:05

In that case you should use the ave function:

> ave(values, factors, FUN=function(x) {
+     if( sum(is.na(x)) == 1 ){
+       x[ is.na(x) ] <- 0
+     }
+     return(x)
+   }
+ )
[1]  1  2  3  4  5 NA  7  0 NA
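As a minimal sketch of the same call with the function pulled out into a named helper (the name `fill_single_na` is made up for illustration): `ave()` is declared as `ave(x, ..., FUN = mean)`, so every unnamed argument after `x` is treated as a grouping variable. That is why the function has to be passed as `FUN =`.

```r
factors <- factor(c("a", "b", "c", "a", "b", "c", "a", "b", "c"))
values  <- c(1, 2, 3, 4, 5, NA, 7, NA, NA)

# Hypothetical helper: replace NA with 0 only when the group
# contains exactly one NA; otherwise return the group unchanged.
fill_single_na <- function(x) {
  if (sum(is.na(x)) == 1) {
    x[is.na(x)] <- 0
  }
  x
}

# FUN must be named, because unnamed arguments after `values`
# are collected by `...` as grouping factors.
result <- ave(values, factors, FUN = fill_single_na)
print(result)
```

The result has the same length and order as `values`, because `ave()` writes each group's output back into the positions that group came from.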

IRTFM

Yeah. The ave function is pretty cool. You just need to remember to explicitly use... FUN= – IRTFM May 23 '11 at 23:41
True, took me a couple of minutes to figure out. But still you really made me happy with your answer. – Beasterfield May 24 '11 at 00:50

score 1 · Answer 2 · answered May 23 '11 at 23:05

A plain for loop handles this easily:

fun <- function(x){
   if(sum(is.na(x)) == 1){
      x[is.na(x)] <- 0
   }
   return(x)
}

for (i in unique(factors)){
   values[i == factors] <- fun(values[i == factors])
}

joran


I also thought about that. But aren't all these apply functions many times faster than iterating over the data manually with loops? Especially since computational demand is an issue for my data. – Beasterfield May 23 '11 at 23:19
Not always; tapply and apply are just syntactic sugar. To look at the source for tapply(), type `tapply` in the console. I guess the point is really that the number of iterations is going to be small (usually) compared to the length of the data in each. – mdsumner May 23 '11 at 23:23
I agree with mdsumner, although in this case I believe that DWin's answer using `ave()` is considerably faster than an explicit for loop. – joran May 23 '11 at 23:33
It might be interesting to test. Sometimes you get surprised. – IRTFM May 23 '11 at 23:42
@DWin - I did; `ave()` is almost 5 times faster than my `for` loop, (on a 300000 length vector, still with only 3 levels). – joran May 23 '11 at 23:48
@mdsumner : that's a bit too sharp there. They're more than syntactic sugar, and can give quite a speedup in some cases. See the answers here : http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – Joris Meys May 24 '11 at 14:16
The sugar is sweet, but tapply and apply are not faster - they are R for loops under the hood - by "not always" I meant "at least lapply can be faster" but I didn't want to commit, hence the reference to just these two functions. – mdsumner May 25 '11 at 00:42
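For anyone who wants to repeat the comparison discussed in these comments, here is a rough timing sketch. The test data (seed, number of NAs, random values) and the wrapper name `loop_version` are assumptions made up for illustration, not the data joran used, and timings will differ between machines and R versions.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 300000                      # vector length mentioned in the comments
factors <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
values  <- rnorm(n)
values[sample(n, 1000)] <- NA    # scatter some NAs (assumed amount)

fun <- function(x) {
  if (sum(is.na(x)) == 1) x[is.na(x)] <- 0
  x
}

# Hypothetical wrapper around the for-loop answer so it can be timed
loop_version <- function(values, factors) {
  for (i in levels(factors)) {
    idx <- factors == i
    values[idx] <- fun(values[idx])
  }
  values
}

t_ave  <- system.time(res_ave  <- ave(values, factors, FUN = fun))
t_loop <- system.time(res_loop <- loop_version(values, factors))

print(t_ave)
print(t_loop)

# Both approaches should return the same vector
stopifnot(identical(res_ave, res_loop))
```

With only three levels, both versions run `fun` three times, so any difference comes from how the groups are extracted and written back rather than from the looping itself.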

score 0 · Answer 3 · answered May 23 '11 at 23:11

An option is to use the replacement method for split():

## create a copy to store the result after replacement
res <- values

## use split's replacement method to split, apply, and recombine
split(res, factors) <- lapply(split(res, factors),
  function(x){
    if( sum(is.na(x)) == 1 ){
      x[ is.na(x) ] <- 0
    }
    return(x)
  }
)
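A small sketch to check what the replacement form produces on the question's data, next to base R's `unsplit()`, which does the recombination in one call. The helper name `fill_single_na` is made up for illustration.

```r
factors <- factor(c("a", "b", "c", "a", "b", "c", "a", "b", "c"))
values  <- c(1, 2, 3, 4, 5, NA, 7, NA, NA)

# Hypothetical helper, same logic as the anonymous function in the answer
fill_single_na <- function(x) {
  if (sum(is.na(x)) == 1) x[is.na(x)] <- 0
  x
}

# Replacement form: results are written back into the original positions
res <- values
split(res, factors) <- lapply(split(res, factors), fill_single_na)

# Equivalent alternative: unsplit() reverses split() for the same factor
res2 <- unsplit(lapply(split(values, factors), fill_single_na), factors)

print(res)
print(res2)
```

Both forms rely on the processed groups keeping their original lengths, which holds here because the function only replaces values in place.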

score 0 · Answer 4 · answered Aug 06 '23 at 05:07

In case others find this question while searching for how to disable alphabetical sorting of the groups, you can do this:

> v=1:4
> group=c("b","b","a","a")
> tapply(v,group,sum)
a b
7 3
> tapply(v,factor(group,unique(group)),sum)
b a
3 7
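A short sketch of why this works, using the same data: `factor()` sorts its levels by default, and `tapply()` reports groups in level order. Passing `levels = unique(group)` sets the levels in order of first appearance instead. The variable names `ordered_group` and `sums` are made up for illustration.

```r
v     <- 1:4
group <- c("b", "b", "a", "a")

# Default: levels are sorted, so "a" comes before "b"
default_levels <- levels(factor(group))

# Levels in order of first appearance in the data
ordered_group <- factor(group, levels = unique(group))
appearance_levels <- levels(ordered_group)

# tapply() follows the level order of the grouping factor
sums <- tapply(v, ordered_group, sum)

print(default_levels)
print(appearance_levels)
print(sums)
```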

nisetama
