Find time difference between two columns based on unique id and add the absence days

Question

I am stuck on a problem where i want to find out the absent days for an employee where if he is not coming for 3 consecutive days it should add the 3 days in a new column (it might for for multiple days) the problem is start date and end date is there so i want to match if the employee is same, start date of the next absence is consecutive day it should add it , i am attaching the screenshot and table index here . Any help from Excel or R will be helpful. I have already tried Max if and Sumif. Only problem is I only want addition if he/she is absent for the consecutive days

ID  START_DATE  END_DATE    ABSENCE_DAYS
3   14-06-18    14-06-18    1
3   17-06-18    17-06-18    1
3   18-06-18    18-06-18    1
4   01-06-18    01-06-18    1
4   04-06-18    04-06-18    1
4   21-06-18    22-06-18    2
4   27-06-18    27-06-18    1
4   28-06-18    28-06-18    1
4   04-07-18    04-07-18    1
4   05-07-18    05-07-18    1
4   09-07-18    09-07-18    1
4   11-07-18    11-07-18    1
4   23-07-18    23-07-18    1
4   24-07-18    24-07-18    1
4   25-07-18    25-07-18    1
5   07-06-18    08-06-18    2
5   28-06-18    28-06-18    1
5   27-07-18    27-07-18    0.5
6   10-06-18    11-06-18    2
6   17-06-18    21-06-18    5
6   24-06-18    25-06-18    2
6   26-06-18    03-07-18    6
6   15-07-18    15-07-18    1
6   22-07-18    22-07-18    1

For Example Employee 4 has taken 3 consecutive leaves on 23,24 and 25th so in a new column where it says he was absent for 3 consecutive days.

updated

Desired output would look something like this , this is just a sample

There are several questions like this, here is but one : https://stackoverflow.com/q/50194816/4961700 — Solar Mike, Aug 02 '18 at 06:55
do you have a time stamp, or how do you calculate the 0.5 days? — Stephan, Aug 02 '18 at 06:59
@SolarMike thanks for your quick response but my problem is different from the question you have provided. — Hunaidkhan, Aug 02 '18 at 06:59
@Stephan right now we can ignore the time stamp and just add the absent days values. this is where i am stuck. thanks — Hunaidkhan, Aug 02 '18 at 07:00
Do I understand correctly that for ID 3 17.6.18 and 18.6.18 should be added together because they are consecutive? And I do not get the sense of the last two columns in your example? Isn't that two times the same? What is the sense of that? — Benjamin Schlegel, Aug 02 '18 at 07:08
@Stephan thanks but your result is giving output based on the absent days provided in the column what i want is if the end date of the 1st occurrence and start date of the next day is consecutive it should add the absent days , i hope it clarifies it — Hunaidkhan, Aug 02 '18 at 07:09
@BenjaminSchlegel some people take leave for 2 days or 10 days also so 2nd column is Leave start day , 3rd column is leave end date and for example id 4 has taken 3 consecutive leaves on 23,24 25 so it should add all the three days absent days. some people take consecutive leave for 5 to 10 days also so it should be 10 in the new column — Hunaidkhan, Aug 02 '18 at 07:11
@Hunaidkhan ok I get that, but I still do not get why ID 6 22-07-18 22-07-18 1 is dubblicated. Should be added to one or two days? — Benjamin Schlegel, Aug 02 '18 at 07:17
@BenjaminSchlegel you can ignore it right now he might have applied twice for the same day so we can remove that case — Hunaidkhan, Aug 02 '18 at 07:19
@Hunaidkhan I gave that Q & A as an example - i did point out that there are MANY like this - you could do some research... — Solar Mike, Aug 02 '18 at 07:27
@SolarMike thanks but after doing a lot of research i have to post this question, i hope you understand — Hunaidkhan, Aug 02 '18 at 07:28

A. Suliman · Answer 1 · 2018-08-02T07:47:54.120

1

Using dplyr::lag and dplyr::lead we can check END and START to see if they are consecutive

library(dplyr)
library(lubridate)
data %>% group_by(ID) %>% 
          mutate(Forward = dmy(START_DATE)-lag(dmy(END_DATE)), Backward = dmy(END_DATE)-lead(dmy(START_DATE)), 
                 Flag=ifelse(Forward==1 | Backward==-1, TRUE,FALSE), 
                 Total=sum(ABSENCE_DAYS[Flag],na.rm = T))

Data

data <- read.table(text="
              ID  START_DATE  END_DATE    ABSENCE_DAYS
               3   14-06-18    14-06-18    1
               3   17-06-18    17-06-18    1
               3   18-06-18    18-06-18    1
               4   01-06-18    01-06-18    1
               4   04-06-18    04-06-18    1
               4   21-06-18    22-06-18    2
               4   27-06-18    27-06-18    1
               4   28-06-18    28-06-18    1
               4   04-07-18    04-07-18    1
               4   05-07-18    05-07-18    1
               4   09-07-18    09-07-18    1
               4   11-07-18    11-07-18    1
               4   23-07-18    23-07-18    1
               4   24-07-18    24-07-18    1
               4   25-07-18    25-07-18    1
               5   07-06-18    08-06-18    2
               5   28-06-18    28-06-18    1
               5   27-07-18    27-07-18    0.5
               6   10-06-18    11-06-18    2
               6   17-06-18    21-06-18    5
               6   24-06-18    25-06-18    2
               6   26-06-18    03-07-18    6
               6   15-07-18    15-07-18    1
               6   22-07-18    22-07-18    1
               6   22-07-18    22-07-18    1
               ",header=T, stringsAsFactors = F)

edited Aug 02 '18 at 07:47

answered Aug 02 '18 at 07:38

A. Suliman

12,923
5
24
37

thanks only thing is count of absence day is not matching for example id 4 answer should be 3 instead i am getting 5 absence days – Hunaidkhan Aug 02 '18 at 08:53
@Hunaidkhan Sorry, but it is very difficult to guess without data. try to double check id 4 – A. Suliman Aug 02 '18 at 09:12
i have checked for all the fields sum of absence is not coming , here forward function takes all the values coming in order but not the successive dates. i want to calculate days only when end date of 1 day and start day of another day is consecutive , i hope you understood – Hunaidkhan Aug 02 '18 at 09:18
Yes forward and backward are helpers columns and return many values. Does Flag return TRUE and FALSE when forward and backward ==1 or -1. – A. Suliman Aug 02 '18 at 09:23
yes it returns true and false but what it does it , in the Total columns it is giving countif or you can say retuns the number of count of that unique id – Hunaidkhan Aug 02 '18 at 09:25
Thanks mate, made slight changes and it is done. Thanks for your prompt response – Hunaidkhan Aug 02 '18 at 09:38
@Hunaidkhan glad to hear that. you are most welcome. – A. Suliman Aug 02 '18 at 09:39

score 1 · Accepted Answer · answered Aug 02 '18 at 09:33

Here is a solution using data.table. You can check if the START_DATE of current row is one day after the END_DATE of the previous row and use cumsum to group these together. After that, it is just a simple sum of ABSENCE_DAYS once you can group them properly.

welfare[, TOTAL_ABSENCE := sum(ABSENCE_DAYS), 
    by=.(ID, cumsum(START_DATE != shift(END_DATE, fill=1L) + 1L))]

output:

    ID START_DATE   END_DATE ABSENCE_DAYS TOTAL
 1:  3 2018-06-14 2018-06-14          1.0   1.0
 2:  3 2018-06-17 2018-06-17          1.0   2.0
 3:  3 2018-06-18 2018-06-18          1.0   2.0
 4:  4 2018-06-01 2018-06-01          1.0   1.0
 5:  4 2018-06-04 2018-06-04          1.0   1.0
 6:  4 2018-06-21 2018-06-22          2.0   2.0
 7:  4 2018-06-27 2018-06-27          1.0   2.0
 8:  4 2018-06-28 2018-06-28          1.0   2.0
 9:  4 2018-07-04 2018-07-04          1.0   2.0
10:  4 2018-07-05 2018-07-05          1.0   2.0
11:  4 2018-07-09 2018-07-09          1.0   1.0
12:  4 2018-07-11 2018-07-11          1.0   1.0
13:  4 2018-07-23 2018-07-23          1.0   3.0
14:  4 2018-07-24 2018-07-24          1.0   3.0
15:  4 2018-07-25 2018-07-25          1.0   3.0
16:  5 2018-06-07 2018-06-08          2.0   2.0
17:  5 2018-06-28 2018-06-28          1.0   1.0
18:  5 2018-07-27 2018-07-27          0.5   0.5
19:  6 2018-06-10 2018-06-11          2.0   2.0
20:  6 2018-06-17 2018-06-21          5.0   5.0
21:  6 2018-06-24 2018-06-25          2.0   8.0
22:  6 2018-06-26 2018-07-03          6.0   8.0
23:  6 2018-07-15 2018-07-15          1.0   1.0
24:  6 2018-07-22 2018-07-22          1.0   1.0
    ID START_DATE   END_DATE ABSENCE_DAYS TOTAL

data:

library(data.table)
welfare <- fread(
"ID  START_DATE  END_DATE    ABSENCE_DAYS
3   14-06-18    14-06-18    1
3   17-06-18    17-06-18    1
3   18-06-18    18-06-18    1
4   01-06-18    01-06-18    1
4   04-06-18    04-06-18    1
4   21-06-18    22-06-18    2
4   27-06-18    27-06-18    1
4   28-06-18    28-06-18    1
4   04-07-18    04-07-18    1
4   05-07-18    05-07-18    1
4   09-07-18    09-07-18    1
4   11-07-18    11-07-18    1
4   23-07-18    23-07-18    1
4   24-07-18    24-07-18    1
4   25-07-18    25-07-18    1
5   07-06-18    08-06-18    2
5   28-06-18    28-06-18    1
5   27-07-18    27-07-18    0.5
6   10-06-18    11-06-18    2
6   17-06-18    21-06-18    5
6   24-06-18    25-06-18    2
6   26-06-18    03-07-18    6
6   15-07-18    15-07-18    1
6   22-07-18    22-07-18    1")    
cols <- c("START_DATE", "END_DATE")
welfare[, (cols) := lapply(.SD, as.Date, format="%d-%m-%y"), .SDcols=cols]

Find time difference between two columns based on unique id and add the absence days

updated

2 Answers2

Data