
I have the following unbalanced dataset:

id Year A
1  1    5
1  2    6
2  1    11
2  2    12
2  3    13
3  2    1
3  3    3

I would like to create a variable lagA that truly takes into account the year and id of each observation and does not just shift the column down:

id Year A   lagA
1  1    5   NA
1  2    6   5
2  1    11  NA
2  2    12  11
2  3    13  12
3  2    1   NA
3  3    3   1

Any ideas? I tried making sure that the data frame is of class pdata.frame, but when I use the function lag(A,1) it merely shifts the column down, which produces inconsistent results.

Economist_Ayahuasca

1 Answer


We need to group by 'id' and then take the lag within each group:

library(dplyr)
df1 %>%
     arrange(id, Year) %>% # in case the rows are not already ordered by 'Year'
     group_by(id) %>%
     mutate(lagA = lag(A))
# A tibble: 7 x 4
# Groups:   id [3]
#     id  Year     A  lagA
#  <int> <int> <int> <int>
#1     1     1     5    NA
#2     1     2     6     5
#3     2     1    11    NA
#4     2     2    12    11
#5     2     3    13    12
#6     3     2     1    NA
#7     3     3     3     1
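
Note that a grouped `lag()` returns the previous row within each id, which matches the previous year only when no years are missing inside a group. As a minimal sketch, assuming gaps in Year could occur, one option is to join the data to a copy of itself with Year shifted forward by one. The `df1` construction below is reconstructed from the question, and the names `prev` and `df1_lagged` are illustrative only:

```r
library(dplyr)

# Reconstruction of the question's data (assumed integer columns)
df1 <- data.frame(
  id   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Year = c(1L, 2L, 1L, 2L, 3L, 2L, 3L),
  A    = c(5L, 6L, 11L, 12L, 13L, 1L, 3L)
)

# Copy of the data in which each value of A is relabelled with the
# following year, so it lines up with the row it should be the lag of
prev <- df1 %>%
  transmute(id, Year = Year + 1L, lagA = A)

# Rows without a matching (id, Year - 1) observation get NA
df1_lagged <- df1 %>%
  left_join(prev, by = c("id", "Year")) %>%
  arrange(id, Year)
```

On this dataset the result agrees with the grouped `lag()` above, because every id has consecutive years; the two would differ only if a year were skipped within an id.
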
akrun
  • I tried to avoid loading an external package, but the following with `stats::lag` doesn't work: `ave(df1$A, df1$id, FUN = stats::lag)`. – Rui Barradas Jul 09 '18 at 08:40
  • @RuiBarradas You can prepend `NA` and drop the last observation of each group inside `ave`: `with(df1, ave(A, id, FUN = function(x) c(NA, x[-length(x)])))` – akrun Jul 09 '18 at 08:41
  • Yes, but the above is simpler. – Rui Barradas Jul 09 '18 at 08:42
  • @RuiBarradas It is not clear why `stats::lag` returns the values unchanged: `stats::lag(1:5)# [1] 1 2 3 4 5`. Probably it needs a time series object. – akrun Jul 09 '18 at 08:43
  • The only difference is the `tsp` attribute. Try, for instance, `stats::lag(1:5, k = 2)`. – Rui Barradas Jul 09 '18 at 08:51
  • I was trying it on a ts object: `stats::lag(ts(1:5, start = c(1971,1)))` – akrun Jul 09 '18 at 08:52
  • Works great, but it seems that the result is returned as a tibble rather than saved in the data frame. Any idea how to incorporate the new variable into the old dataset? – Economist_Ayahuasca Jul 09 '18 at 09:19
  • @AndresAzqueta You can convert it to a data.frame by piping into `%>% as.data.frame` at the end. – akrun Jul 09 '18 at 09:23
  • Last question: the new variable does not seem to get added to df1. Any idea why? – Economist_Ayahuasca Jul 09 '18 at 09:39
  • @AndresAzqueta You need to assign the result back to the object, i.e. `df1 <- df1 %>% arrange %>%`, or use the `%<>%` operator from `magrittr` for in-place assignment. – akrun Jul 09 '18 at 09:41