Is there an R function that can help lag a variable by one year by grouping in country when some years are missing?

Question

I have searched through the forums and have not found exactly the answer to my question. I have a data set from the World Bank

library(wbstats)
Gini <- wb(indicator = c("SI.POV.GINI"),
                     startdate = 2005, enddate = 2020)
Gini <- Gini[,c("iso3c", "date", "value")]
names(Gini)
names(Gini)<-c("iso3c", "date", "Gini")
#Change date to numeric
class(Gini$date)
Gini$date<-as.numeric(Gini$date)

#Tibble:
# A tibble: 1,012 x 3
   iso3c  date  Gini
   <chr> <dbl> <dbl>
 1 ALB    2017  33.2
 2 ALB    2016  33.7
 3 ALB    2015  32.9
 4 ALB    2014  34.6
 5 ALB    2012  29  
 6 ALB    2008  30  
 7 ALB    2005  30.6
 8 DZA    2011  27.6
 9 AGO    2018  51.3
10 AGO    2008  42.7
# … with 1,002 more rows

Then I try to lag this estimate by one year

#Lag Gini
lg <- function(x)c(NA, x[1:(length(x)-1)])

Lagged.Gini<-ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))

tibble(Lagged.Gini)

# A tibble: 1,032 x 4
   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
 2 AGO    2008  42.7       51.3
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
 7 ALB    2012  29         34.6
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA

Unfortunately, my problem is that when years are missing the lag does not recognize that year is missing and just puts the most recent year as the lag. Ex: country "ALB"'s Gini estimate is not lagged by one year in 2012 it lags to the next year which is 2008.

I would want the final data to look the same but how I edited below -- and ideally to be able to lag for multiple years:

# A tibble: 1,032 x 4

   iso3c  date  Gini Gini.lag.1
   <chr> <dbl> <dbl>      <dbl>
 1 AGO    2018  51.3       NA  
   AGO   2017   NA        51.3
 2 AGO    2008  42.7       NA
   AGO    2007  NA        42.7
 3 ALB    2017  33.2       NA  
 4 ALB    2016  33.7       33.2
 5 ALB    2015  32.9       33.7
 6 ALB    2014  34.6       32.9
   ALB    2013   NA         29
 7 ALB    2012  29         NA
 8 ALB    2008  30         29  
 9 ALB    2005  30.6       30  
10 ARE    2014  32.5       NA

r2evans · Accepted Answer · 2020-08-19T18:11:11.297

pseudospin's answer is great for base R. Since you're using tibbles, here's a tidyverse version with the same effect:

Gini <- readr::read_table("
iso3c  date  Gini
ALB    2017  33.2
ALB    2016  33.7
ALB    2015  32.9
ALB    2014  34.6
ALB    2012  29  
ALB    2008  30  
ALB    2005  30.6
DZA    2011  27.6
AGO    2018  51.3
AGO    2008  42.7")

library(dplyr)
Gini %>%
  transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
  full_join(Gini, ., by = c("iso3c", "date")) %>%
  arrange(iso3c, desc(date))
# # A tibble: 17 x 4
#    iso3c  date  Gini Gini.lag.1
#    <chr> <dbl> <dbl>      <dbl>
#  1 AGO    2018  51.3       NA  
#  2 AGO    2017  NA         51.3
#  3 AGO    2008  42.7       NA  
#  4 AGO    2007  NA         42.7
#  5 ALB    2017  33.2       NA  
#  6 ALB    2016  33.7       33.2
#  7 ALB    2015  32.9       33.7
#  8 ALB    2014  34.6       32.9
#  9 ALB    2013  NA         34.6
# 10 ALB    2012  29         NA  
# 11 ALB    2011  NA         29  
# 12 ALB    2008  30         NA  
# 13 ALB    2007  NA         30  
# 14 ALB    2005  30.6       NA  
# 15 ALB    2004  NA         30.6
# 16 DZA    2011  27.6       NA  
# 17 DZA    2010  NA         27.6

If you need to do this n times (one more lag each time), you can extend it programmatically this way:

Ginilags <- lapply(1:3, function(lg) {
  z <- transmute(Gini, iso3c, date = date - lg, Gini)
  names(z)[3] <- paste0("Gini.lag.", lg)
  z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
       c(list(Gini), Ginilags)) %>%
  arrange(iso3c, desc(date))
# # A tibble: 28 x 6
#    iso3c  date  Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
#    <chr> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
#  1 AGO    2018  51.3       NA         NA         NA  
#  2 AGO    2017  NA         51.3       NA         NA  
#  3 AGO    2016  NA         NA         51.3       NA  
#  4 AGO    2015  NA         NA         NA         51.3
#  5 AGO    2008  42.7       NA         NA         NA  
#  6 AGO    2007  NA         42.7       NA         NA  
#  7 AGO    2006  NA         NA         42.7       NA  
#  8 AGO    2005  NA         NA         NA         42.7
#  9 ALB    2017  33.2       NA         NA         NA  
# 10 ALB    2016  33.7       33.2       NA         NA  
# # ... with 18 more rows

Wow @r2evans! Thank you for this help! I just ran and it worked exactly how I needed! I have a question for you - If I then wanted to add another variable for gini that lagged by two years would the only difference in the code be "date - 2" — Mary Claire Evans, Aug 19 '20 at 18:05
Probably, yes. If you're only going with lags 1 and 2, then that approach is likely the easiest and most straight-forward. If you think you'll be going to 3 or more, then it's a different problem *to do elegantly/efficiently* (though the premise will remain the same). — r2evans, Aug 19 '20 at 18:06
Thank you! I was going to say - simple but not so elegant would work for me-- so this is beyond helpful! :) — Mary Claire Evans, Aug 19 '20 at 18:13

pseudospin · Answer 2 · 2020-08-19T17:47:14.513

1

You could create a copy of the original table, but with the date having one year subtracted off. Then just join the two together on the iso3c and date columns to get the final result as you want it.

Like this

Gini_lagged <- data.frame(
  iso3c = Gini$iso3c, 
  date = Gini$date-1, 
  Gini.lag.1 = Gini$Gini)
merge(Gini,Gini_lagged,all=TRUE)

edited Aug 19 '20 at 17:47

answered Aug 19 '20 at 17:39

pseudospin

2,737
1
4
19

That worked! my code was: #Lag Gini Lagged.Gini <- Gini Lagged.Gini$date <- Lagged.Gini$date - 1 Lagged.Gini$Gini.lag1 <- Lagged.Gini$Gini – Mary Claire Evans Aug 19 '20 at 17:53
Thank you @pseudospin! – Mary Claire Evans Aug 19 '20 at 17:53
There was an issue with merging - I went from 157 countries to only 69... so there must have been something that was deleted when merging. – Mary Claire Evans Aug 19 '20 at 17:58
I just tried it and see 157 before and after the merge. If I do the merge without `all=TRUE` then I only get 69. Must be something to do with that. – pseudospin Aug 19 '20 at 18:02
Thank you - that was my issue! I didn't know about the all=True... This is very helpful to know! – Mary Claire Evans Aug 19 '20 at 18:09

score 1 · Answer 3 · answered Aug 19 '20 at 17:45

1

Using dplyr & tidyr from tidyverse, you can do a rowwise mutate to find years which match the year in the current row minus 1.

library(tidyverse)

Gini %>%
     rowwise() %>%
     mutate(Gini.lag.1 = list(Gini$Gini[date-1 == Gini$date])) %>%
     unnest(c(Gini.lag.1), keep_empty = T)

answered Aug 19 '20 at 17:45

stlba

667
3
13

Does this really work for you? I get an error because `Gini$` is ambiguous (to `dplyr`) for the column name. You can fix the ambiguity by using `.env$Gini$Gini` and `.env$Gini$date` (respectively); for context on why, see https://stackoverflow.com/q/63399011/3358272. (Further, though, doing things `rowwise` when avoidable can get very slow.) – r2evans Aug 19 '20 at 17:52
1

Ah, I'd accidentally named the column "gini" instead when I created the table and just changed it in my answer so that might have stopped me getting the error – stlba Aug 19 '20 at 18:04

Is there an R function that can help lag a variable by one year by grouping in country when some years are missing?

3 Answers3