2

Suppose I have a data frame with three variables as below. How do I create a new variable that for each group takes the first observation of x?

group  year   x     
1      2000   3    
1      2001   4    
2      2000   1    
2      2001   3    
3      2000   5     
3      2001   2    

I want to to create something like this:

group  year   x    y 
1      2000   3    3
1      2001   4    3
2      2000   1    1
2      2001   3    1
3      2000   5    5 
3      2001   2    5
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 1
    Just for your reference. I would recommend this quick read.https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – jacobsg Oct 12 '17 at 19:49

4 Answers4

3

Set up data for example:

dd <- data.frame(group=rep(1:3,each=2),
                year=rep(2000:2001,3),
                x=c(3,4,1,3,5,2))

In base R, use ave(). By default this finds the group average (rather than the first value), but we can use the FUN argument to ask it to select the first value instead.

dd$y <- ave(dd$x, dd$group, FUN=function(x) x[1])
## or
dd <- transform(dd,y=ave(x, group, FUN=function(x) x[1])

(alternatively could use FUN=function(x) head(x,1))

In tidyverse,

library(dplyr)
dd <- dd %>%
        group_by(group) %>%
        mutate(y=first(x))

@lmo points out another alternative in comments:

library(data.table)
setDT(dd)[, y := first(x), by=group]

You can find nearly endless discussion of the relative merits of these three major approaches (base R vs tidyverse vs data.table) elsewhere (on StackOverflow and on the interwebs generally).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 2
    Could throw in `data.table` for good measure: `setDT(dd)[, y := first(x), by=group]`. – lmo Oct 12 '17 at 20:04
1

Using package plyr:

df <- data.frame(group=c(1,1,2,2,3,3), 
                 year=c(2000,2001,2000,2001,2000,2001), 
                 x=c(3,4,1,3,5,2))

library(plyr)
ddply(df, .(group), transform, y=x[1])
user3640617
  • 1,546
  • 13
  • 21
0

A simple version in base R

### Your data
df = read.table(text="group  year   x     
1      2000   3    
1      2001   4    
2      2000   1    
2      2001   3    
3      2000   5     
3      2001   2",
header=TRUE)

df$y = aggregate(as.numeric(row.names(df)), list(df$group), min)$x[df$group]
df
  group year x y
1     1 2000 3 1
2     1 2001 4 1
3     2 2000 1 3
4     2 2001 3 3
5     3 2000 5 5
6     3 2001 2 5
G5W
  • 36,531
  • 10
  • 47
  • 80
0

Here's yet another way, using base R:

dd <- data.frame(group = rep(1:3, each = 2),
                 year = rep(2000:2001, 3),
                 x = c(3, 4, 1, 3, 5, 2))

transform(dd, y = unsplit(tapply(x, group, function(x) x[1]), group))
Patrick Perry
  • 1,422
  • 8
  • 17