I have an unbalanced panel with the following structure:

   Cntry  Year  Gini 
   AU     1980  NA
   AU     1981  NA
   AU     ...   NA
   AU     1985  0.409
   AU     1986  0.406
   AU     1989  0.41
   AU     ....
   AU     2001  0.45
   AU     2002  NA
   AU     2003  NA 

The other countries show a similar pattern. As the Gini will be part of my dependent variable definition, I would like to interpolate the NAs so that I have Gini information for the years in which I have observations on the controls.

What I tried first was to use the zoo package and the na.spline function to interpolate:

range_completed$gini_priY=na.spline(range_completed$gini_priY)

However, this replaces all the values in the Gini variable, including the observed ones (for example, the 0.409 in 1985).

How can I solve this? Thank you!

2 Answers

You can introduce an intermediate variable for your estimate and then fill only the missing rows, like this:

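library(data.table)
library(zoo)  # provides na.spline
setDT(range_completed)
# Store the spline estimate in a separate column
range_completed[, gini_priY_estimate := na.spline(gini_priY)]
# Fill only the rows where the original value is missing
range_completed[is.na(gini_priY), gini_priY := gini_priY_estimate]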
Bulat
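Since the data are a panel, it may be safer to interpolate within each country rather than across the whole column, so that one country's series does not bleed into the next. The sketch below is only an illustration under assumptions: the column names `Cntry`, `Year` and `Gini` are taken from the sample table in the question, the numbers are made up, and `na.approx` (linear interpolation) is swapped in for `na.spline` because a spline can overshoot and extrapolate outside the plausible range of a Gini coefficient.

```r
library(data.table)
library(zoo)

# Toy panel with made-up values: two countries, gaps in Gini
panel <- data.table(
  Cntry = rep(c("AU", "BE"), each = 6),
  Year  = rep(1984:1989, times = 2),
  Gini  = c(NA, 0.409, 0.406, NA, NA, 0.410,
            0.300, NA, 0.320, NA, NA, NA)
)
setorder(panel, Cntry, Year)

# Interpolate linearly within each country, using Year as the x-axis.
# na.rm = FALSE keeps leading/trailing NAs in place, so the result has the
# same length as the input and nothing is extrapolated past the observed range.
panel[, Gini_filled := na.approx(Gini, x = Year, na.rm = FALSE), by = Cntry]
```

Observed values are left untouched, interior gaps are filled, and years before the first or after the last observation of a country stay NA. Whether those edge years should be filled at all is a modelling decision rather than a coding one.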
  • Thank you @Bulat, this works almost perfectly. In the first command, when I estimate the spline, I get values that make no sense for a Gini measure, including negative ones. After substituting into the original variable, all the values make sense, but I still have some NAs because of the out-of-range values in the variable generated by range_completed[, gini_priY_estimate := na.spline(gini_priY)]. Any idea? Thanks – Luca Giangregorio Aug 19 '19 at 07:55
  • Can you update your question with a reproducible example? See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and I can check. – Bulat Aug 19 '19 at 08:14
  • My bad @Bulat, I didn't realize Gini was written with a decimal comma in the original csv and was therefore read as a factor variable. Now, as numeric, it works perfectly. Thank you! – Luca Giangregorio Aug 19 '19 at 08:47
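For readers who hit the same decimal-comma problem, a minimal sketch of the conversion follows. The vector `gini_raw` is a hypothetical stand-in for the column as it might look after import; it is not taken from the original data.

```r
# Hypothetical raw column after reading a comma-decimal CSV as a factor
gini_raw <- factor(c(NA, "0,409", "0,406", NA, "0,41"))

# as.numeric() on a factor returns the internal level codes, not the values,
# so convert to character first and swap the decimal comma for a point.
gini_num <- as.numeric(sub(",", ".", as.character(gini_raw), fixed = TRUE))
```

Alternatively, the column can be parsed correctly at import time, for instance with `read.csv2()` or by passing `dec = ","` to `read.csv()`.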

I think what you need is the is.na function

range_completed[is.na(gini_priY), gini_priY :=gini_priY_estimate]

to identify the NA cases, or use the na_interpolation function from the imputeTS package

wp78de
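A minimal sketch of the `na_interpolation` route, assuming the imputeTS package is installed. The vector is made up for illustration, and `option = "linear"` is one possible choice; the function also accepts `"spline"` and `"stine"`.

```r
library(imputeTS)

# Made-up series with an interior gap of two years
gini <- c(0.409, 0.406, NA, NA, 0.410)

# Fill the missing values by linear interpolation; observed values are kept
gini_filled <- na_interpolation(gini, option = "linear")
```

On a panel this would still need to be applied country by country, for example inside a grouped data.table or dplyr call.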