I have a huge panel data, more than 10 GB. The data looks like the following:
ID Start_time Factor End_time
1001 1611 0.12 1861
1001 1612 -0.01 1862
...
1001 1861 0.57 2111
1001 1862 0.06 2112
...
1002 1200 0.01 1450
1002 1201 0.52 1451
...
1002 1450 -0.21 1700
1002 1451 0.30 1701
...
The data is sorted by ID and Start_time value. I want to calculate the sum of Factor that for each ID from Start_time to the corresponding End_time.
An example of output:
ID Start_time Factor End_time Cumulative_factor
1001 1611 0.12 1861 0.12+(-0.01)+...+0.57
1001 1612 -0.01 1862 -0.01+...+0.57+0.06
...
1001 1861 0.57 2111 0.57+0.06+...
1001 1862 0.06 2112 0.06+...
...
1002 1200 0.01 1450 0.01+0.52+...+(-0.21)
1002 1201 0.52 1451 0.52+...+(-0.21)+0.30
...
1002 1450 -0.21 1700 -0.21+0.30+...
1002 1451 0.30 1701 0.30+...
...
Since I have more than 10 million observation, Is there a efficient way to calculate it?