I have a data table of three columns id, dtstart, dtend. For example:
id start end
1 01/01/2015 31/01/2015
1 02/02/2015 28/02/2015
1 01/07/2016 31/07/2016
1 01/08/2016 31/08/2016
2 01/03/2015 31/03/2015
2 01/04/2015 30/04/2015
2 01/02/2016 28/02/2016
2 01/03/2016 31/03/2016
...
I need to create another data table grouped by id with the same columns but the new start date is the minimum date in the original start date and the new end date is the maximum date in the original dtend.
When there is a break of more then one day between an end date and the next start date then it should be grouped separately.
For example for the above the new table would be:
id start end
1 01/01/2015 28/02/2015
1 01/07/2016 31/08/2016
2 01/03/2015 30/04/2016
2 01/02/2016 31/03/2016
...
Do I need a for loop or is there a more efficient way (data table grouping for example)? The table is over 20 million rows with 100k+ unique ids.
Cheers Andrew