So I've got a data frame like below:
+-----------+---------------+-------+------------+
| Policy_NO | Creation_Date | Limit | Limit_Date |
+-----------+---------------+-------+------------+
| A00001 | 8/31/2015 | 1000 | 8/31/2015 |
| A00001 | 8/31/2015 | 2000 | 9/30/2015 |
| A00001 | 8/31/2015 | 5000 | 10/22/2015 |
| A00001 | 8/31/2015 | 500 | 11/17/2015 |
| A00003 | 9/21/2015 | 3000 | 1/1/2016 |
+-----------+---------------+-------+------------+
And what I want is to have an 'End_Limit_Date' which should either be the date of the next limit with the same policy number -1 or exactly one year from the Creation date.
For example, the table above should be:
+------------------------------------------------+----------------+
| Policy_NO Creation_Date Limit Limit_Date | End_Limit_Date |
+------------------------------------------------+----------------+
| A00001 8/31/2015 1000 8/31/2015 | 9/29/2015 |
| A00001 8/31/2015 2000 9/30/2015 | 10/21/2015 |
| A00001 8/31/2015 5000 10/22/2015 | 11/16/2015 |
| A00001 8/31/2015 500 11/17/2015 | 8/31/2016 |
| A00003 9/21/2015 3000 1/1/2016 | 9/21/2016 |
+------------------------------------------------+----------------+
Now in sql server, this can be achieved by using an outer apply like below:
SELECT t.Policy_NO,
t.Creation_Date,
t.Limit,
t.Limit_Date,
End_Limit_Date = ISNULL(
DATEADD(DAY, -1, t2.Limit_Date),
DATEADD(YEAR, 1, t.Creation_Date))
FROM dbo.T
OUTER APPLY
( SELECT TOP 1 t2.Limit_Date
FROM dbo.T AS t2
WHERE t2.Policy_NO = t.Policy_NO
AND t2.Limit_Date > t.Limit_Date
ORDER BY t2.Limit_Date
) AS t2;
but I was wondering if there's a way to do this in R with data frames? I've looked into using the sqldf package (https://github.com/ggrothendieck/sqldf#FAQ) , and I don't think it's supported. I do know that R itself has several apply functions (https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/), and I was wondering if I can achieve the same result using this?
Input data:
dput(df)
structure(list(Policy_NO = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("A00001",
"A00003"), class = "factor"), Creation_Date = structure(c(16678,
16678, 16678, 16678, 16699), class = "Date"), Limit = c(1000L,
2000L, 5000L, 500L, 3000L), Limit_Date = structure(c(16678, 16708,
16730, 16756, 16801), class = "Date")), .Names = c("Policy_NO",
"Creation_Date", "Limit", "Limit_Date"), row.names = c(NA, -5L
), class = "data.frame")