subset datatable to replace NAs with latest non-NA value

Question

I m trying to use the solution here: Replacing NAs with latest non-NA value

to a large data table, but can't manage to work out.

My datatable childs is at the link here. I want to create a column pop1 that takes value from pop when value is missing. The tricky bit is to pass the right group, which I am unable to.

if I subset first:

child_sub<- childs[ sector_number=='3.B.2.1.1' & meastype== 'NRATE']

and then

childs1 <- child_sub[, pop1:=pop[1], .( cumsum(!is.na(value)) )]

it works, in the sense that it gives pop1 for the year 2019 the value of pop of 2018, that is 1591.251. however, if I try:

childs1 <- childs[, pop1:=pop[1], .( party, sector_number, meastype, cumsum(!is.na(value)) )], it doesn't assign the correct pop (i.e.the pop from the same meastype, previous year), and I don't understant why...

I also tried with:

childs1 <- childs[is.na(value), pop1:=setnafill(pop, type = "locf")]

which throws error straight away.

any help appreciated

you should provide example code to reproduce, then I would provide code that answers your question — jangorecki, Mar 11 '21 at 20:04
Hi thanks. The data are with the link, where it says ' here' — efz, Mar 11 '21 at 20:31

score 0 · Answer 1 · answered Mar 16 '21 at 08:23

I post the solution in case someone will be reading this in the future. I realised that the solution proposed here Replacing NAs with latest non-NA value, and that I wanted to replicate, does't work when there are two consecutive NA values. In that case there is no pop[1] and returns zero. The solution is to sort before the line childs[, pop1:=pop[1], .( sector_number, meastype, cumsum(!is.na(value)) )]:

setkey(childs, sector_number, meastype)
childs[, pop1:=pop[1], .(sector_number, meastype, cumsum(!is.na(value)) )]

with this, it returns the desired outcome.

subset datatable to replace NAs with latest non-NA value

1 Answers1