I have a data.table in R that was fetched from a database that looks like this:
date,identifier,description,location,value1,value2
2014-03-01,1,foo,1,100,200
2014-03-01,1,foo,2,200,300
2014-04-01,1,foo,1,100,200
2014-04-01,1,foo,2,100,200
2014-05-01,1,foo,1,100,200
2014-05-01,1,foo,2,100,200
2014-03-01,2,bar,1,100,200
2014-04-01,2,bar,1,100,200
2014-05-01,2,bar,1,100,200
2014-03-01,3,baz,1,100,200
2014-03-01,3,baz,2,200,300
2014-04-01,3,baz,1,100,200
2014-04-01,3,baz,2,100,200
2014-05-01,3,baz,1,100,200
2014-05-01,3,baz,2,100,200
2014-05-01,4,quux,2,100,200
<SNIP>
In order to do some calculations on the data, I'd like to massage it so that each combination of date, identifier, description and location has a row in the table with NA as value1 and value2. I know the range of date and all potential values for location.
I'm new to both R and data.table and my mind is mush at this point. The result I'd like to come up with for the above sample table is:
date,identifier,description,location,value1,value2
2014-03-01,1,foo,1,100,200
2014-03-01,1,foo,2,200,300
2014-04-01,1,foo,1,100,200
2014-04-01,1,foo,2,100,200
2014-05-01,1,foo,1,100,200
2014-05-01,1,foo,2,100,200
2014-03-01,2,bar,1,100,200
2014-03-01,2,bar,2,NA,NA
2014-04-01,2,bar,1,100,200
2014-04-01,2,bar,2,NA,NA
2014-05-01,2,bar,1,100,200
2014-05-01,2,bar,2,NA,NA
2014-03-01,3,baz,1,100,200
2014-03-01,3,baz,2,200,300
2014-04-01,3,baz,1,100,200
2014-04-01,3,baz,2,100,200
2014-05-01,3,baz,1,100,200
2014-05-01,3,baz,2,100,200
2014-03-01,4,quux,1,NA,NA
2014-03-01,4,quux,2,NA,NA
2014-04-01,4,quux,1,NA,NA
2014-04-01,4,quux,2,NA,NA
2014-05-01,4,quux,1,NA,NA
2014-05-01,4,quux,2,100,200
The data in the database is sparse in that a given identifier/description/location combination could have any number of entries or none at all for each date. I want to get to for a given date range (e.g., 2014-03-01 through 2014-05-01) each identifier/description and location has a row in the table.
This seems like something there'd be an interesting data.table trick to do, but I'm blanking.
Edit: I did this on a smaller scale for one identifier/description by merging in another datatable but I'm not sure how to do this with the added complexity of multiple identifier/descriptions and locations.
Thanks very much for your responses.
Here is dput output of the original data that can be readily copied into R:
structure(list(date = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
.Label = c("2014-03-01", "2014-04-01", "2014-05-01"), class = "factor"),
identifier = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L),
description = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 4L),
.Label = c("bar", "baz", "foo", "quux"), class = "factor"),
location = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L),
value1 = c(100L, 200L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 200L, 100L, 100L, 100L, 100L, 100L),
value2 = c(200L, 300L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 300L, 200L, 200L, 200L, 200L, 200L)),
.Names = c("date", "identifier", "description", "location", "value1", "value2"),
row.names = c(NA, -16L),
class = c("data.table", "data.frame"))