I have a data frame for each individual illustrated as rows. I want to add a column that will calculate the portion of the year the employee was employed for each Year. I do not want to add a column for each year but rather have a new variable Year that will contain the result of my formula. the first element of this variable would be the first year of employment. I have 25,000,000 rows, so looping is not an option. Let assume a simple example as below:
age Sex DateHire DateLeft
1 45 M 6/30/1980 12/31/2006
2 37 F 6/25/1994 N/A
3 42 F 12/30/2015 12/30/2017
I want my output to look like this.
Year Exposure Expected_Death
1 3 0.0046
2 3 0.0058
3 3 0.0071
4 2 0.008
Exposure is the number of employees for a give Year where Year1 refers to 1st year of employment. Expected_Death <- Exposure * Probability_Death
For each individual, I want to calculate the Exposure for the next 100 years after their employment. I also want to read the Probability_Death from an Excel File.
For each individual I wanted to define a variable Exposure and Probability_Death as such
age Sex DateHire DateLeft Exposure Prob_Death
1 45 M 6/30/1980 12/31/2006 1 1 1 1 1 ... 0.0035 0.0041 ...
2 37 F 6/25/1994 N/A 1 1 1 1 1 ... 0.0009 0.0014 ...
3 42 F 12/30/2015 12/30/2017 1 1 1 0 0 ... 0.0002 0.0003 ...
My Exposure is calculate from the date of hire and DateLeft. How can I define such calculated variables so my Expected Death for all years will be calculated simply by multiplying Exposure by Prob_Death ? thanks