-1

I'd like to build up on a question I asked earlier (Adding row for missing value in data.table). Actualy I don't need to add so many zero, I only need the first missing one to be 0.

So this time I would like to insert a 0 only if the last entry for the same ('ida', 'idb') is not already 0. We suppose there is no 0 in the original data.table.

So an example would be :

 ida       idb         value     date
  A         3          26600  2004-12-31
  A         3          19600  2005-03-31
  A         3          18200  2005-06-30
  A         4           9560  2004-12-31
  B         1           2600  2004-12-31
  B         1            100  2005-03-31
  B         1           8200  2005-06-30
  B         1           9560  2007-12-31
  B         9           1423  2004-12-31
  B         9           1235  2005-06-30
  C         2           8700  2005-12-31

Gives :

 ida       idb         value     date
  A         3          26600  2004-12-31
  A         3          19600  2005-03-31
  A         3          18200  2005-06-30
  A         4           9560  2004-12-31
  A         4              0  2005-03-31
  B         1           2600  2004-12-31
  B         1            100  2005-03-31
  B         1           8200  2005-06-30
  B         1           9560  2007-12-31
  B         9           1423  2004-12-31
  B         0              0  2005-03-31
  B         9           1235  2005-06-30
  B         0              0  2007-12-31
  C         2           8700  2005-12-31
Community
  • 1
  • 1
Kevin P
  • 273
  • 1
  • 3
  • 13
  • It's not clear how is this question differs from the previous one. – David Arenburg Mar 25 '15 at 22:49
  • In the previous one there would be more 0 – Kevin P Mar 25 '15 at 22:56
  • So what's the rule here? Why `B` would have additional two `0` while `A` will have only one additional? – David Arenburg Mar 25 '15 at 22:57
  • There is only one 0 for A because if we added a second one (at the date 2005-06-30) there would be two consecutive 0. B has two additional 0 because the first one is separated by 2005-06-30 from the second one. The rule is : insert a 0 only if no 0 was inserted right before. – Kevin P Mar 25 '15 at 23:01

1 Answers1

1

I found a way to do it. So we take the answer to my previous question :

setkey(dt, idb, date)
dt[, .SD[CJ(unique(idb), unique(date))], by = ida][is.na(value), value := 0]

And we add some more steps :

setkey(dt, idb, date)
dt[, .SD[CJ(unique(idb), unique(date))][is.na(value), value := 0][prev.value := c(0, value[-.N])][value > 0 | (value == 0 & prev.value > 0), ][, prev.value := NULL], by = ida]

So for each subset we replace NAs by 0s, and then add a new column corresponding to the previous values. Finally we keep only the rows != 0 and the first 0, and we delete the prev.value column.

Kevin P
  • 273
  • 1
  • 3
  • 13