Stata has a very nice command, egen, which makes it easy to compute statistics over groups of observations. For instance, it can compute the max, the mean and the min for each group and add them as variables to the detailed data set. The Stata command is one line of code:

by group : egen max = max(x)

I've never found an equivalent command in R. summarise in the dplyr package makes it easy to compute statistics for each group, but then I have to run a loop to attach the statistic to each observation:

library("dplyr")
N  <- 1000
tf  <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))
table(tf$group)
# one row per group, holding that group's maximum
mtf  <- summarise(group_by(tf, group), max = max(x))
tf$max  <- NA
# copy each group's maximum back onto its observations
for (i in seq_len(nrow(mtf))) {
  tf$max[tf$group == mtf$group[i]]  <- mtf$max[i]
}

Does anyone have a better solution?

Jaap
PAC
  • There are a number of alternatives. Your question shows a lack of research (you didn't even study the dplyr package vignette). -1 – Roland Jun 11 '14 at 11:23
  • I have no bias against `egen` (I wrote some of the functions) but even from a Stata viewpoint it is just a handy collection of stuff for creating variables. There's no central idea that maps onto anything that would be a central idea in R. Even the convenience of producing summary statistics by group is not in fact part of the definition or role of `egen` but just something possible with some of its components. I won't speak for R but I suspect some of its packages are also a bit miscellaneous. – Nick Cox Jun 11 '14 at 11:44
  • I agree with you, but it is still really useful. – PAC Jun 11 '14 at 12:35

1 Answer

Here are a few approaches:

dplyr

library(dplyr)

tf %>% group_by(group) %>% mutate(max = max(x))
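
The question also mentions the mean and the min. As a minimal sketch, assuming dplyr is installed, one mutate call can add all three at once. The seed and the column names x_max, x_mean and x_min are illustrative choices, not from the original post:

```r
library(dplyr)

# Toy data shaped like the question's `tf` (the seed is arbitrary)
set.seed(1)
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

# mutate() keeps every row; each statistic is repeated within its group
tf_stats <- tf %>%
  group_by(group) %>%
  mutate(x_max = max(x), x_mean = mean(x), x_min = min(x)) %>%
  ungroup()  # drop the grouping so later steps are not silently grouped

head(tf_stats)
```

Unlike summarise, which returns one row per group, mutate returns one row per observation, so no loop or join is needed.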

ave

This uses only base R:

transform(tf, max = ave(x, group, FUN = max))
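
The same idea extends to several statistics. Below is a rough sketch using only base R, with a sanity check against tapply. The seed and the names tf2 and group_max are assumptions made for illustration:

```r
# Toy data shaped like the question's `tf` (the seed is arbitrary)
set.seed(1)
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

# ave() returns a vector as long as x, with FUN applied within each group
tf2 <- transform(tf,
  max  = ave(x, group, FUN = max),
  mean = ave(x, group, FUN = mean),
  min  = ave(x, group, FUN = min)
)

# Sanity check: each row's max equals its group's max as computed by tapply()
group_max <- tapply(tf$x, tf$group, max)
stopifnot(all(tf2$max == group_max[as.character(tf2$group)]))
```

Without a FUN argument, ave computes the group mean, so the mean column could also be written as ave(x, group).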

data.table

library(data.table)

dt <- data.table(tf)
dt[, max:=max(x), by=group]
G. Grothendieck