How to create a new variable that takes the first observation of a different variable?

Question

Suppose I have a data frame with three variables as below. How do I create a new variable that for each group takes the first observation of x?

group  year   x     
1      2000   3    
1      2001   4    
2      2000   1    
2      2001   3    
3      2000   5     
3      2001   2

I want to to create something like this:

group  year   x    y 
1      2000   3    3
1      2001   4    3
2      2000   1    1
2      2001   3    1
3      2000   5    5 
3      2001   2    5

Just for your reference. I would recommend this quick read.https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — jacobsg, Oct 12 '17 at 19:49

Ben Bolker · Answer 1 · 2017-10-12T21:47:44.727

Set up data for example:

dd <- data.frame(group=rep(1:3,each=2),
                year=rep(2000:2001,3),
                x=c(3,4,1,3,5,2))

In base R, use ave(). By default this finds the group average (rather than the first value), but we can use the FUN argument to ask it to select the first value instead.

dd$y <- ave(dd$x, dd$group, FUN=function(x) x[1])
## or
dd <- transform(dd,y=ave(x, group, FUN=function(x) x[1])

(alternatively could use FUN=function(x) head(x,1))

In tidyverse,

library(dplyr)
dd <- dd %>%
        group_by(group) %>%
        mutate(y=first(x))

@lmo points out another alternative in comments:

library(data.table)
setDT(dd)[, y := first(x), by=group]

You can find nearly endless discussion of the relative merits of these three major approaches (base R vs tidyverse vs data.table) elsewhere (on StackOverflow and on the interwebs generally).

Could throw in `data.table` for good measure: `setDT(dd)[, y := first(x), by=group]`. — lmo, Oct 12 '17 at 20:04

score 1 · Answer 2 · answered Oct 12 '17 at 19:52

1

Using package plyr:

df <- data.frame(group=c(1,1,2,2,3,3), 
                 year=c(2000,2001,2000,2001,2000,2001), 
                 x=c(3,4,1,3,5,2))

library(plyr)
ddply(df, .(group), transform, y=x[1])

answered Oct 12 '17 at 19:52

user3640617

1,546
13
21

score 0 · Answer 3 · answered Oct 12 '17 at 19:56

A simple version in base R

### Your data
df = read.table(text="group  year   x     
1      2000   3    
1      2001   4    
2      2000   1    
2      2001   3    
3      2000   5     
3      2001   2",
header=TRUE)

df$y = aggregate(as.numeric(row.names(df)), list(df$group), min)$x[df$group]
df
  group year x y
1     1 2000 3 1
2     1 2001 4 1
3     2 2000 1 3
4     2 2001 3 3
5     3 2000 5 5
6     3 2001 2 5

score 0 · Answer 4 · answered Oct 12 '17 at 20:03

0

Here's yet another way, using base R:

dd <- data.frame(group = rep(1:3, each = 2),
                 year = rep(2000:2001, 3),
                 x = c(3, 4, 1, 3, 5, 2))

transform(dd, y = unsplit(tapply(x, group, function(x) x[1]), group))

answered Oct 12 '17 at 20:03

Patrick Perry

1,422
8
17

How to create a new variable that takes the first observation of a different variable?

4 Answers4