I have merged two dataframes, call them A and B. One has values for important variables for every year with some missing data which I will deal with separately. The second has values for only certain years (election years). This is cross national panel data, with a country-year unit of observation so it is important to do distinguish both country and year in any operation. After merging, the non-election years have NA values for the data from the second dataframe as expected. These NAs need to be filled in with the data from the previous election for that specific country, until the next election in that country. I do not want to fill in any NAs for data from dataframe A.
(For those who might have theoretical questions, the data from B is about the governing party, so it is theoretically sound to fill the data in this manner.)
If I subset the data by country, I can do this easily with the tidy::fill function by selecting only the columns containing data from B. For the full dataframe with all countries, I can't do this because in some cases it will fill in the beginning years for one country with values from the previous country in the dataframe.
Here is a minimal example of the data arrangement (bearing in mind there are actually 190 countries and 9282 observations in the real data):
country <- c("Austria","Austria","Austria","Austria","Austria",
"Belgium","Belgium","Belgium","Belgium","Belgium")
year <- c("1999","2000","2001","2002","2003",
"1999","2000","2001","2002","2003")
a1 <- c(5,4,NA,4,3,6,2,9,NA,7)
a2 <- c(45,53,57,51,33,37,12,48,55,41)
b1 <- c(NA,"A",NA,NA,NA,NA,NA,"B",NA,"C")
b2 <- c(NA,7,NA,NA,NA,NA,NA,5,NA,7)
df <- data.frame(country,year,a1,a2,b1,b2)
country | year | a1 | a2 | b1 | b2 |
---|---|---|---|---|---|
Austria | 1999 | 5 | 45 | NA | NA |
Austria | 2000 | 4 | 53 | A | 7 |
Austria | 2001 | NA | 57 | NA | NA |
Austria | 2002 | 4 | 51 | NA | NA |
Austria | 2003 | 3 | 33 | NA | NA |
Belgium | 1999 | 6 | 37 | NA | NA |
Belgium | 2000 | 2 | 12 | NA | NA |
Belgium | 2001 | 9 | 48 | B | 5 |
Belgium | 2002 | NA | 55 | NA | NA |
Belgium | 2003 | 7 | 41 | C | 7 |
Here is what I want to produce:
country | year | a1 | a2 | b1 | b2 |
---|---|---|---|---|---|
Austria | 1999 | 5 | 45 | NA | NA |
Austria | 2000 | 4 | 53 | A | 7 |
Austria | 2001 | NA | 57 | A | 7 |
Austria | 2002 | 4 | 51 | A | 7 |
Austria | 2003 | 3 | 33 | A | 7 |
Belgium | 1999 | 6 | 37 | NA | NA |
Belgium | 2000 | 2 | 12 | NA | NA |
Belgium | 2001 | 9 | 48 | B | 5 |
Belgium | 2002 | NA | 55 | B | 5 |
Belgium | 2003 | 7 | 41 | C | 7 |
Simply using tidy::fill will result, in the example, in incorrect values for Belgium for 1999 and 2000, as it will fill in the values from Austria.