I have a table of Customer_ID, showing Payments by Year. The first (of many) customer appears like this:
ID Payment Year
112 0 2004
112 0 2005
112 0 2006
112 9592 2007
112 12332 2008
112 9234 2011
112 5400 2012
112 7392 2014
112 8321 2015
Note that some years are missing. I need to create 10 new columns, showing the Payments in the previous 10 years, for each row. The resulting table should look like this:
ID Payment Year T-1 T-2 T-3 T-4 T-5 T-6 T-7 T-8 T-9 T-10
112 0 2004 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
112 0 2005 0 NULL NULL NULL NULL NULL NULL NULL NULL NULL
112 0 2006 0 0 NULL NULL NULL NULL NULL NULL NULL NULL
112 952 2007 0 0 0 NULL NULL NULL NULL NULL NULL NULL
112 1232 2008 952 0 0 0 NULL NULL NULL NULL NULL NULL
112 924 2011 NULL NULL 1232 952 0 0 0 NULL NULL NULL
112 500 2012 924 NULL NULL 1232 952 0 0 0 NULL NULL
112 392 2014 NULL 500 924 NULL NULL 1232 952 0 0 0
112 821 2015 392 NULL 500 924 NULL NULL 1232 952 0 0
(I know this is duplicating data - it is being prepared for a predictive model, in which previous payments (and other info) will be used to predict the current year's payment)
In SQL I would left join the table to itself, joining on ID and Year=(Year-1) etc... but I can't figure out how to do this in R.
I've also thought about using dplyr to group by ID, then mutate the new columns using lag, before ungrouping. But my tables are very large and I think this would be too slow. Ideally I would like to use data.table instead, but can't figure out how.
Any help much appreciated.