I have a data frame with the columns id and year (and value as well, but that is not relevant to this question):

id    year
1     2006
1     2007
1     2008
2     2007
2     2008
2     2009
2     2010

I'd like to add a new column called minyear, which holds the minimum year for each id, repeated on every row of that id:

id    year    minyear
1     2006    2006
1     2007    2006
1     2008    2006
2     2007    2007
2     2008    2007
2     2009    2007
2     2010    2007

In SQL, I'd do something like `SELECT ID, year, min(year) AS minyear FROM df GROUP BY id`. Is there an R-y equivalent that does this efficiently?

Roland
  • See `?ave` - it can compute a summary stat (by default the `ave`rage), and assign it back to every row in a group. – thelatemail Jul 07 '16 at 00:04
  • The dplyr is kind of like the SQL: `library(dplyr) ; df %>% group_by(id) %>% mutate(minyear = min(year))` – alistaire Jul 07 '16 at 00:06
  • Same in `data.table`: `require(data.table); setDT(dat); dat[, .(year, minyear = min(year)), by = id]`. – m-dz Jul 07 '16 at 00:13
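For reference, the `ave` suggestion from the first comment could look roughly like the sketch below. It uses base R only and rebuilds the sample data from the question; the data frame name `df` is taken from the SQL snippet, and the `value` column is left out because the question does not show it.

```r
# Sample data copied from the question (the value column is omitted)
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 2),
  year = c(2006, 2007, 2008, 2007, 2008, 2009, 2010)
)

# ave() applies FUN within each group defined by df$id and returns
# a vector of the same length as df$year, so it can be assigned
# straight back as a new column.
df$minyear <- ave(df$year, df$id, FUN = min)

print(df)
```

Unlike a grouped summary, this keeps every original row and every other column of the data frame, which seems to be what the question is after.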
