I have a dataset of portfolio holdings:
# Input test data
portolios <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "C", "C", "A"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Partial", "Full"), shares = c(100L, 100L, 130L, 50L, 75L, 80L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
portfolioid secid reportdate report_type shares
1: 1 A 2010-03-31 Full 100
2: 1 B 2010-03-31 Full 100
3: 1 A 2010-06-30 Full 130
4: 1 C 2010-06-30 Full 50
5: 1 C 2010-07-15 Partial 75
6: 1 A 2010-08-31 Full 80
I need to impute the following missing records:
7: 1 B 2010-06-30 Full 0
8: 1 C 2010-08-31 Full 0
The business issue is that sales of positions (shares = 0) are sometimes not reported for Full report_type, so the missing SecID have to be imputed based on the prior report.
Ultimately I am seeking to calculate the change in shares for each SecID from the prior report for each portfolioID such that my dataset looks like this:
changes <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "B", "C", "C", "A", "C"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Full", "Partial", "Full", "Full"), shares = c(100L, 100L, 130L, 0L, 50L, 75L, 80L, 0L), change = c(100L, 100L, 30L, -100L, 50L, 25L, -50L, -75L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares", "change"), row.names = c(NA, -8L), class = c("data.table", "data.frame"))
portfolioid secid reportdate report_type shares change
1: 1 A 2010-03-31 Full 100 100
2: 1 B 2010-03-31 Full 100 100
3: 1 A 2010-06-30 Full 130 30
4: 1 B 2010-06-30 Full 0 -100
5: 1 C 2010-06-30 Full 50 50
6: 1 C 2010-07-15 Partial 75 25
7: 1 A 2010-08-31 Full 80 -50
8: 1 C 2010-08-31 Full 0 -75
I am stuck with how to create the i for the outer join portfolios[i]. My issue is that I don't want to use i <- CJ(reportdate, secid)
because it will produce too many records that are unnecessary since not every secid exists at every ReportDate and not correctly represent the data that needs filling.
I think I need a rolling cross-join between reportdate,reportdate[-1,secid]
I want to roll forward the secid and set shares := 0 when secid is missing in a Full report but it existed in prior report (either Partial or Full). I believe I would do this with option roll=1 but I am not sure where or how to implement.
I think my problem is similar to
How to Calculate a rolling statistic in R using data.table on unevenly spaced data
I am sure I am missing something basic in my understanding or a trick with CJ() that can create the requisite i