I have a data set with the following columns:
id class year n
25 A63 2006 3
25 F16 2006 1
39 0901 2001 1
39 0903 2001 3
39 0903 2003 2
39 1901 2003 1
...
There are about 100k distinct ids and more than 300 classes, and the years range from 1998 to 2007.
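What I want to do is fill the time gaps with n = 0, by id and class: once a class has appeared for an id, that id/class pair should get a row for every later year up to 2007.
Then I want to calculate the running sum of n and the number of classes. In the table below, sum is the cumulative n for each id and class, Qc is the number of classes with n > 0 for that id in that year, and Qs is the cumulative number of distinct classes for that id so far.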
For example, the six rows above should expand to the following table:
id class year n sum Qc Qs
25 A63 2006 3 3 2 2
25 F16 2006 1 1 2 2
25 A63 2007 0 3 0 2
25 F16 2007 0 1 0 2
39 0901 2001 1 1 2 2
39 0903 2001 3 3 2 2
39 0901 2002 0 1 0 2
39 0903 2002 0 3 0 2
39 0901 2003 0 1 2 3
39 0903 2003 2 5 2 3
39 1901 2003 1 1 2 3
39 0901 2004 0 1 0 3
39 0903 2004 0 5 0 3
39 1901 2004 0 1 0 3
...
39 0901 2007 0 1 0 3
39 0903 2007 0 5 0 3
39 1901 2007 0 1 0 3
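The fill rule and the sum column implied by the table can be sketched as follows. This is a minimal Python sketch (standard library only, not R/data.table), assuming 2007 is the last year, that classes are kept as strings so leading zeros such as 0901 survive, and that duplicate (id, class, year) rows should be added together; `expand_with_cumsum` is a hypothetical helper name.

```python
from collections import defaultdict


def expand_with_cumsum(rows, last_year=2007):
    """Zero-fill each (id, class) pair from its first year to last_year.

    rows: iterable of (id, cls, year, n) tuples.
    Returns (id, cls, year, n, running_sum) tuples.
    """
    # n per (id, class), keyed by year; duplicate rows are added together.
    observed = defaultdict(dict)
    for id_, cls, year, n in rows:
        by_year = observed[(id_, cls)]
        by_year[year] = by_year.get(year, 0) + n

    out = []
    for (id_, cls), by_year in observed.items():
        running = 0
        # Start at the pair's first observed year; earlier years get no rows.
        for year in range(min(by_year), last_year + 1):
            n = by_year.get(year, 0)  # gap years are filled with n = 0
            running += n
            out.append((id_, cls, year, n, running))

    # Same order as the example table: id, then year, then class.
    out.sort(key=lambda r: (r[0], r[2], r[1]))
    return out


sample = [
    (25, "A63", 2006, 3),
    (25, "F16", 2006, 1),
    (39, "0901", 2001, 1),
    (39, "0903", 2001, 3),
    (39, "0903", 2003, 2),
    (39, "1901", 2003, 1),
]

for row in expand_with_cumsum(sample):
    print(*row)
```

Each (id, class) pair is visited once and extended year by year, so the work grows with the number of output rows rather than with repeated scans of the whole table.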
I can solve this with an ugly for loop, but it takes about an hour to get the result. Is there a better way to do it, e.g. by vectorizing or by using data.table?
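On the speed question: Qc and Qs depend only on id and year, so one option is to compute them once per id-year (at most about 100k ids × 10 years) and then match them to the expanded rows on (id, year), instead of recomputing them for every row inside a loop. The sketch below is plain Python (standard library, not R), uses the hypothetical name `class_counts`, and assumes, as in the sample, that every input row has n > 0. In data.table the same idea would be expressed as grouped operations by id (and by id and class for sum); that translation is not shown here.

```python
from collections import defaultdict


def class_counts(rows, last_year=2007):
    """Return {(id, year): (Qc, Qs)} for each year from the id's first year.

    Assumes, as in the sample, that every input row has n > 0, so a row
    means "this class occurred for this id in this year".
    """
    classes_in_year = defaultdict(set)  # (id, year) -> classes present
    first_year = {}                     # id -> earliest year seen
    for id_, cls, year, _n in rows:
        classes_in_year[(id_, year)].add(cls)
        first_year[id_] = min(year, first_year.get(id_, year))

    result = {}
    for id_, start in first_year.items():
        seen = set()  # distinct classes for this id up to the current year
        for year in range(start, last_year + 1):
            current = classes_in_year.get((id_, year), set())
            seen |= current
            result[(id_, year)] = (len(current), len(seen))
    return result


sample = [
    (25, "A63", 2006, 3),
    (25, "F16", 2006, 1),
    (39, "0901", 2001, 1),
    (39, "0903", 2001, 3),
    (39, "0903", 2003, 2),
    (39, "1901", 2003, 1),
]

for (id_, year), (qc, qs) in sorted(class_counts(sample).items()):
    print(id_, year, qc, qs)
```

This makes one pass over the input rows plus one pass over the id-year grid, so the cost scales with the number of rows plus the number of id-years.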