-1

I m trying to use the solution here: Replacing NAs with latest non-NA value

to a large data table, but can't manage to work out.

My datatable childs is at the link here. I want to create a column pop1 that takes value from pop when value is missing. The tricky bit is to pass the right group, which I am unable to.

if I subset first:

child_sub<- childs[ sector_number=='3.B.2.1.1' & meastype== 'NRATE']

and then

childs1 <- child_sub[, pop1:=pop[1], .( cumsum(!is.na(value)) )]

it works, in the sense that it gives pop1 for the year 2019 the value of pop of 2018, that is 1591.251. however, if I try:

childs1 <- childs[, pop1:=pop[1], .( party, sector_number, meastype, cumsum(!is.na(value)) )], it doesn't assign the correct pop (i.e.the pop from the same meastype, previous year), and I don't understant why...

I also tried with:

childs1 <- childs[is.na(value), pop1:=setnafill(pop, type = "locf")]

which throws error straight away.

any help appreciated

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
efz
  • 425
  • 4
  • 9

1 Answers1

0

I post the solution in case someone will be reading this in the future. I realised that the solution proposed here Replacing NAs with latest non-NA value, and that I wanted to replicate, does't work when there are two consecutive NA values. In that case there is no pop[1] and returns zero. The solution is to sort before the line childs[, pop1:=pop[1], .( sector_number, meastype, cumsum(!is.na(value)) )]:

setkey(childs, sector_number, meastype)
childs[, pop1:=pop[1], .(sector_number, meastype, cumsum(!is.na(value)) )]

with this, it returns the desired outcome.

efz
  • 425
  • 4
  • 9