4

My question is somehow related to Fastest way to add rows for missing values in a data.frame? but a bit tougher I think. And I can't figure out how to adapt this solution to my problem.

Here is what my data.table looks like :

                   ida       idb         value     date
   1:               A         2          26600  2004-12-31
   2:               A         3          19600  2005-03-31
   3:               B         3          18200  2005-06-30
   4:               B         4          1230   2005-09-30
   5:               C         2          8700   2005-12-31

The difference is that every 'ida' has his own dates and there is at least one row where 'ida' appears with each date but not necessarily for all 'idb'. I want to insert every missing ('ida','idb') couple missing with the corresponding date and 0 as a value.

Moreover, there is no periodicity for the dates.

How would you do this ?

Desired output :

                   ida       idb         value     date
   1:               A         2          26600  2004-12-31
   1:               A         2            0    2005-03-31
   2:               A         3          19600  2005-03-31
   2:               A         3            0    2004-12-31
   3:               B         3          18200  2005-06-30
   4:               B         3            0    2005-09-30
   5:               B         4          1230   2005-09-30
   4:               B         4            0    2005-06-30
   6:               C         2          8700   2005-12-31

The order doesn't matter. Every date missing is filled with a 0 value.

Community
  • 1
  • 1
Kevin P
  • 273
  • 1
  • 3
  • 13
  • So what's the desired output for this sample input? – MrFlick Mar 25 '15 at 17:34
  • does your example include one missing couple? – rmuc8 Mar 25 '15 at 17:35
  • use `seq` with `range(date)`, then turn the Date vector into one column `data.table`, setkey on both DTs on `date` field, join both, fill NA with 0. If you manage to solve your problem don't forget to put the code in answer and mark as accepted. – jangorecki Mar 25 '15 at 17:44
  • I added the desired output, sorry about that. @JanGorecki : I can't use seq since there is no periodicity with the date. – Kevin P Mar 25 '15 at 17:50

1 Answers1

6

You just do the same thing as in your linked question by each ida:

setkey(dt, idb, date)

dt[, .SD[CJ(unique(idb), unique(date))], by = ida][is.na(value), value := 0][]
#   ida idb value       date
#1:   A   2 26600 2004-12-31
#2:   A   2     0 2005-03-31
#3:   A   3     0 2004-12-31
#4:   A   3 19600 2005-03-31
#5:   C   2  8700 2005-12-31
#6:   B   3 18200 2005-06-30
#7:   B   3     0 2005-09-30
#8:   B   4     0 2005-06-30
#9:   B   4  1230 2005-09-30
eddi
  • 49,088
  • 6
  • 104
  • 155