I have the following data:
library(data.table)
df <- fread(
"A B C D E F iso year
0 A NA 1 NA NA NLD 2009
1 Y NA 2 NA NA NLD 2010
0 Q NA 3 NA NA AUS 2011
1 NA NA 4 NA NA AUS 2012
0 0 NA 7 NA NA NLD 2011
0 NA 1 NA 1 NA ECU 2009
1 NA 0 NA 2 0 ECU 2010
1 NA 0 NA 2 0 ECU 2011
0 NA 0 NA 3 2 BRA 2011
1 NA 0 NA 4 NA BRA 2012",
header = TRUE
)
I want to extrapolate for each country, the entry of the last year in the dataset,up to a particular year. For example, I want to use the last year for every country in df
(NLD=2011, AUS=2012, BRA=2012, ECU=2011), to create entries up to 2014. In other words, I would like to make duplicates of those entries, but with the years changed, of the last available year up to 2014.
Desired output:
library(data.table)
df <- fread(
"A B C D E F iso year
0 A NA 1 NA NA NLD 2009
1 Y NA 2 NA NA NLD 2010
0 Q NA 3 NA NA AUS 2011
1 NA NA 4 NA NA AUS 2012
1 NA NA 4 NA NA AUS 2013
1 NA NA 4 NA NA AUS 2014
0 0 NA 7 NA NA NLD 2011
0 0 NA 7 NA NA NLD 2012
0 0 NA 7 NA NA NLD 2013
0 0 NA 7 NA NA NLD 2014
0 NA 1 NA 1 NA ECU 2009
1 NA 0 NA 2 0 ECU 2010
0 NA 0 NA 3 0 BRA 2011
1 NA 0 NA 4 0 BRA 2011
1 NA 0 NA 2 0 ECU 2011
1 NA 0 NA 2 0 ECU 2012
1 NA 0 NA 2 0 ECU 2013
1 NA 0 NA 2 0 ECU 2014
0 NA 0 NA 3 2 BRA 2011
1 NA 0 NA 4 NA BRA 2012
1 NA 0 NA 4 NA BRA 2013
1 NA 0 NA 4 NA BRA 2014",
header = TRUE
)
But I do not really know where to start. What would be the easiest way to do this?
EDIT: When testing on the actual data, I noticed I overlooked one aspect of the dataset when phrasing the question. In the acutal dataset there is one more unique aspects (column A) in the row apart from the year.
library(data.table)
df <- fread(
"A B C D E F iso year
1 A NA 1 NA NA NLD 2009
1 Y NA 2 NA NA NLD 2010
1 Q NA 3 NA NA AUS 2011
1 NA NA 4 NA NA AUS 2012
1 0 NA 7 NA NA NLD 2011
1 NA 1 NA 1 NA ECU 2009
1 NA 0 NA 2 0 ECU 2010
1 NA 0 NA 2 0 ECU 2011
1 NA 0 NA 3 2 BRA 2011
1 NA 0 NA 4 NA BRA 2012
0 A NA 1 NA NA NLD 2009
0 Y NA 2 NA NA NLD 2010
0 Q NA 3 NA NA AUS 2011
0 NA NA 4 NA NA AUS 2012
0 0 NA 7 NA NA NLD 2011
0 NA 1 NA 1 NA ECU 2009
0 NA 0 NA 2 0 ECU 2010
0 NA 0 NA 2 0 ECU 2011
0 NA 0 NA 3 2 BRA 2011
0 NA 0 NA 4 NA BRA 2012",
header = TRUE
)
I have tried to adapt Ronak's result in this respect:
df <- df %>%
group_by(iso, A) %>%
slice(c(1:n(), rep(n(), 2014 - last(year)))) %>%
mutate(year = first(year):2014)
###
df <- df %>%
#If not arranged by year
arrange(iso, year, A)
group_by(iso) %>%
complete(year = seq(min(year), 2014)) %>%
fill(B:F)
The top code works, but on my actual data gives the error:
Error: Column `year` must be length 18 (the group size) or one, not 21
The bottom option gives the following error when applied to the example:
Error in group_by(iso) : object 'iso' not found