Creating a new variable in R

Question

The data set contains four columns (id, x1, x2, and y1). Note that some ids have multiple records.

Here is the data:

id <- c(1,  1,  1,  1,  2,  3,  3,  3,  4,  4,  4,  4,  5,  5, 5, 6)    
x1 <- c("a","b","c","a","a","a","c","c","b", "e", "w", "r", "b", "c", "w", "r")
x2 <- c(0.12,   0.76,   0.08,   0.11,   0.80,   0.24,   0.19,   0.07,   0.70,   0.64,   0.97,   0.04,   0.40,   0.67,   0.25, 0.01)
y1 <- c(1132,   1464,   454,    1479,   167,    335,    280,    391,    973,    1343,   777,    1333,   293,    694,    76, 114)
mdat <- data.frame(id, x1, x2, y1)

I want to create a new column (let's call it y2). y2 is defined as

y2(i) = y1(i-1) within the same id. Note that for an id with only one record (and for the first record of any id), y2 = NA.

Here is the desired output:

id  x1  x2       y1     y2
1   a   0.12    1132    
1   b   0.76    1464    1132
1   c   0.08    454     1464
1   a   0.11    1479    454
2   a   0.8     167 
3   a   0.24    335 
3   c   0.19    280     335
3   c   0.07    391     280
4   b   0.7     973 
4   e   0.64    1343    973
4   w   0.97    777     1343
4   r   0.04    1333    777
5   b   0.4     293 
5   c   0.67    694     293
5   w   0.25    76      694
6   r   0.01    114

Before you flag the question to close it... let me know what's wrong. Thanks! — user9292, Aug 30 '16 at 20:04
`mdat$y2 <- ave(mdat$y1, mdat$id, FUN = function(x){c(NA, x[-length(x)])})` or `ave(mdat$y1, mdat$id, FUN = dplyr::lag)` or `ave(mdat$y1, mdat$id, FUN = data.table::shift)` or translate to your favorite grammar. — alistaire, Aug 30 '16 at 20:23

Jilber Urbina · Accepted Answer · 2016-08-30T20:29:31.927

Here's an alternative you may want to consider, using the lag function from the dplyr package (called as dplyr::lag so that it is not confused with stats::lag):

> mdat$y2 <- unlist(tapply(mdat$y1, mdat$id, dplyr::lag, 1))
> mdat
   id x1   x2   y1   y2
1   1  a 0.12 1132   NA
2   1  b 0.76 1464 1132
3   1  c 0.08  454 1464
4   1  a 0.11 1479  454
5   2  a 0.80  167   NA
6   3  a 0.24  335   NA
7   3  c 0.19  280  335
8   3  c 0.07  391  280
9   4  b 0.70  973   NA
10  4  e 0.64 1343  973
11  4  w 0.97  777 1343
12  4  r 0.04 1333  777
13  5  b 0.40  293   NA
14  5  c 0.67  694  293
15  5  w 0.25   76  694
16  6  r 0.01  114   NA
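
One caveat worth keeping in mind: unlist(tapply(...)) returns the values grouped by id rather than in the original row order, so the assignment lines up here only because mdat is already sorted by id. As a minimal base R sketch of the same idea that does not depend on sorting, ave() can be used instead, along the lines of the one-liner in the comments above. The helper name lag1 is made up for illustration, and only the id and y1 columns are rebuilt here:

```r
# Rebuild the two relevant columns from the question.
id <- c(1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6)
y1 <- c(1132, 1464, 454, 1479, 167, 335, 280, 391,
        973, 1343, 777, 1333, 293, 694, 76, 114)
mdat <- data.frame(id, y1)

# lag1 is a hypothetical helper: shift a vector down by one position,
# dropping the last value and padding the front with NA.
lag1 <- function(x) c(NA, x[-length(x)])

# ave() applies lag1 within each id and returns the values
# in the original row order of mdat.
mdat$y2 <- ave(mdat$y1, mdat$id, FUN = lag1)

print(mdat)
```

Ids with a single record (2 and 6 here) get NA, because dropping the only element leaves nothing behind the NA padding.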

score 1 · Answer 2 · answered Aug 30 '16 at 20:05

A solution with dplyr

library(dplyr)

mdat %>%
  group_by(id) %>%
  mutate(y2 = y1 - c(NA,y1[-length(y1)]))

answered Aug 30 '16 at 20:05

Mark Peterson


dplyr has a `lag` function.... – David Arenburg Aug 30 '16 at 20:25
You also need to chop out the `y1 - ` or the results won't match the desired. – alistaire Aug 30 '16 at 20:38
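
Putting the two comments above together, a sketch of what the dplyr version might look like once the `y1 - ` is dropped and `lag` is used directly. This is an illustration of the commenters' suggestions rather than the answerer's own code; only the id and y1 columns are rebuilt, and the name result is just a placeholder:

```r
library(dplyr)

# Rebuild the two relevant columns from the question.
id <- c(1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6)
y1 <- c(1132, 1464, 454, 1479, 167, 335, 280, 391,
        973, 1343, 777, 1333, 293, 694, 76, 114)
mdat <- data.frame(id, y1)

result <- mdat %>%
  group_by(id) %>%          # work within each id separately
  mutate(y2 = lag(y1)) %>%  # previous y1 in the group; NA for the first row
  ungroup()                 # drop the grouping so later steps are not affected

print(result)
```

With the subtraction left in, y2 would hold the difference between consecutive y1 values rather than the previous value itself, which is why the results would not match the desired output.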
