
Could someone please point out how to apply multiple functions to the same column using tapply (or any other method, e.g. plyr) so that the results come out in distinct columns? For example, if I have a data frame like

User  MoneySpent
Joe       20
Ron       10
Joe       30
...

For each user, I want the result to contain the sum of MoneySpent and the number of occurrences.

I used a function like this:

f <- function(x) c(sum(x), length(x))
tapply(df$MoneySpent, df$User, f)

But this does not split the result into columns; instead it gives something like

Joe    100, 5   # sum = 100, number of occurrences = 5, but both values are juxtaposed in one entry
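To see why the values end up juxtaposed, here is a small sketch using only the three sample rows shown above (the data frame `df` is rebuilt here as an assumption). Because `f` returns a length-2 vector, tapply() cannot simplify the result to a plain vector, so it returns a list with one two-element vector per user.

```r
# Rebuild the three sample rows from the question (assumed data)
df <- data.frame(User = c("Joe", "Ron", "Joe"),
                 MoneySpent = c(20, 10, 30))

# The summary function from the question: sum and count
f <- function(x) c(sum(x), length(x))

res <- tapply(df$MoneySpent, df$User, f)

# Each element holds both statistics in a single vector,
# which is why they print side by side rather than in columns
str(res)
res[["Joe"]]  # sum 50 and count 2 in one vector
```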

Thanks in advance,

Raj

Asked by xbsd; edited by Jaap

2 Answers


You can certainly do stuff like this using ddply from the plyr package:

library(plyr)
dat <- data.frame(x = rep(letters[1:3], 3), y = 1:9)

ddply(dat, .(x), summarise, total = sum(y), count = NROW(piece))
  x total count
1 a    12     3
2 b    15     3
3 c    18     3

You can keep listing more summary functions, beyond just two, if you like. Note I'm being a little tricky here in calling NROW on an internal variable in ddply called piece. You could have just done something like length(y) instead. (And probably should; referencing the internal variable piece isn't guaranteed to work in future versions, I think. Do as I say, not as I do and just use length().)
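For comparison, the same per-group summary can be sketched in base R without plyr, which also avoids relying on the internal `piece` variable. This is a swapped-in technique (`split()` plus `vapply()`) applied to the `dat` example above; the helper name `summ` is hypothetical, and the extra `mean` entry is only there to show that more summaries can be listed.

```r
# Same toy data as the ddply example
dat <- data.frame(x = rep(letters[1:3], 3), y = 1:9)

# Hypothetical helper: one named entry per summary statistic
summ <- function(v) c(total = sum(v), count = length(v), mean = mean(v))

# split() gives one vector of y per level of x; vapply() applies summ
# to each and returns a matrix with statistics in rows, groups in columns
m <- vapply(split(dat$y, dat$x), summ, numeric(3))

# Transpose so each group is a row, then put the grouping column back
out <- data.frame(x = colnames(m), t(m), row.names = NULL)
out
```

Using `vapply()` with a fixed `FUN.VALUE` makes R check that every group returns exactly three numbers, so a summary function that misbehaves fails loudly instead of silently producing a list.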

Answered by joran
  • Do you know of any resource where I can read more about plyr? The pdf on CRAN has very limited information on usage. – xbsd Sep 13 '11 at 18:32
  • @xbsd - arguably one of the most comprehensive sources of info will be on SO, either under the `plyr` tag or simply searching for `plyr` with the R tag: http://stackoverflow.com/search?q=[r]+plyr – Chase Sep 13 '11 at 19:29
  • @xbsd The plyr website is here: http://plyr.had.co.nz/, which links to this article you should read: http://www.jstatsoft.org/v40/i01/paper – Andrie Sep 13 '11 at 19:32

ddply() is conceptually the clearest, but sometimes tapply is preferable for speed reasons, in which case the following works (with f as defined in the question):

do.call( rbind, tapply(df$MoneySpent, df$User, f) )
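A self-contained sketch of that approach on the question's sample rows (the three-row `df` is assumed). Giving names to the elements that `f` returns turns them into column headers after `rbind`; wrapping the matrix in `data.frame()` at the end is optional.

```r
# The question's sample rows (assumed data)
df <- data.frame(User = c("Joe", "Ron", "Joe"),
                 MoneySpent = c(20, 10, 30))

# Named return values become the column names after rbind
f <- function(x) c(total = sum(x), count = length(x))

# tapply() returns a list with one length-2 vector per user;
# do.call(rbind, ...) stacks those vectors as the rows of a matrix
m <- do.call(rbind, tapply(df$MoneySpent, df$User, f))
m

# Optional: convert to a data frame with an explicit User column
res <- data.frame(User = rownames(m), m, row.names = NULL)
res
```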
Answered by petrelharp; edited by joran