2

I have a data frame for each individual illustrated as rows. I want to add a column that will calculate the portion of the year the employee was employed for each Year. I do not want to add a column for each year but rather have a new variable Year that will contain the result of my formula. the first element of this variable would be the first year of employment. I have 25,000,000 rows, so looping is not an option. Let assume a simple example as below:

  age Sex   DateHire   DateLeft
1  45   M  6/30/1980 12/31/2006
2  37   F  6/25/1994        N/A
3  42   F 12/30/2015 12/30/2017

I want my output to look like this.

Year Exposure   Expected_Death
 1    3         0.0046
 2    3         0.0058
 3    3         0.0071
 4    2         0.008 

Exposure is the number of employees for a give Year where Year1 refers to 1st year of employment. Expected_Death <- Exposure * Probability_Death

For each individual, I want to calculate the Exposure for the next 100 years after their employment. I also want to read the Probability_Death from an Excel File.

For each individual I wanted to define a variable Exposure and Probability_Death as such

  age Sex   DateHire   DateLeft       Exposure       Prob_Death
    1  45   M  6/30/1980 12/31/2006   1 1 1 1 1 ...  0.0035 0.0041 ...
    2  37   F  6/25/1994        N/A   1 1 1 1 1 ... 0.0009 0.0014 ...
    3  42   F 12/30/2015 12/30/2017   1 1 1 0 0 ... 0.0002 0.0003 ...

My Exposure is calculate from the date of hire and DateLeft. How can I define such calculated variables so my Expected Death for all years will be calculated simply by multiplying Exposure by Prob_Death ? thanks

  • Thanks, but as a second phase, I want to add another variable that will be the probability of dying in each year and multiply this probability by the variable Year. This gives an expected number of deaths that I can compare with actual deaths. I think I need to have specific variables to accomplish this. – Michel Hebert Aug 01 '18 at 14:29
  • I suggest you add data for what your desired output looks like. Preferably a minimal set required for someone to produce code to answer your question. Also, you said you don't want to add a column for each year, do you want new rows to be added to your table ? – steveb Aug 01 '18 at 23:37
  • I modified my input data in order for my 3rd individual to be hired in 2015. My output would look like this: Year Exposure Expected_Death 1 3 0.0046 2 3 0.0058 3 3 0.0071 4 2 0.0080 Exposure is the number of employees in each year. Year 1 is defined at 1st year of employment. Since Employee 3 started in 2015 and left in 2017, he is no longer there in Year 4 and my Exposure is down to 2. Expected_Death is my Exposure x Probability_Death. This probabability would be fetched in an excel file. – Michel Hebert Aug 02 '18 at 17:53
  • You should edit your question and put the expected output there. You should look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – steveb Aug 02 '18 at 17:55
  • my output is supposed to be a table 3x3 Year 1 has Exposure of 3 and Expected_Death of 0.0046 Year 2 has Exposure of 3 and Expected_Death of 0.0058 Year 3 has Exposure of 3 and Expected Death of 0.0072 Year 4 has Exposure of 2 and Expected_Death of 0.008 my table should continue for 100 years Expected Death <- Probability_of_Death * Exposure. – Michel Hebert Aug 02 '18 at 17:58
  • i modified my initial question to make it clearer. Thanks for comments. – Michel Hebert Aug 02 '18 at 18:26
  • It looks like you want the `Exposure` and `Prob_Death` columns to be contain a single space delimited string with your values, correct ? – steveb Aug 02 '18 at 20:52

0 Answers0