nested If statement on dates

Question

I have a dataframe df which is as follows.

Id     ProcessDate
10     2011-12-29 14:14:00
11     2011-12-29 14:16:00
12     2011-12-29 14:14:00
13     2011-12-29 14:20:00
14     2011-12-29 14:49:00
15     2011-12-29 14:51:00
16     2011-12-29 14:53:00
17     2011-12-29 15:11:00
18     2011-12-29 15:13:00 
19     2011-12-29 15:10:00
20     2011-12-29 15:21:00
21     2011-12-29 14:34:00
22     2011-12-29 15:26:00

I am trying to create a third column, Status, that contains one of three values (Before, Between, After) based on this condition.

 if  (df$ProcessDate < 2011-12-29 14:48:00)
 then  df$Status = "Before"
 else if (df$ProcessDate > 2011-12-29 14:48:00 & df$ProcessDate < 2011-12-29 15:16:00)
 then  df$Status = "Between"
 else  df$Status = "After"

The final dataframe should look like this.

Id     ProcessDate              Status
10     2011-12-29 14:14:00      Before
11     2011-12-29 14:16:00      Before
12     2011-12-29 14:14:00      Before
13     2011-12-29 14:20:00      Before
14     2011-12-29 14:49:00      Between
15     2011-12-29 14:51:00      Between       
16     2011-12-29 14:53:00      Between
17     2011-12-29 15:11:00      Between
18     2011-12-29 15:13:00      Between
19     2011-12-29 15:10:00      Between
20     2011-12-29 15:21:00      After
21     2011-12-29 14:34:00      After
22     2011-12-29 15:26:00      After

I tried a few things and they didn't work. Any help on this issue is much appreciated.

There are [a few examples](http://stackoverflow.com/questions/14202008/add-column-values-based-on-other-columns-in-data-frame-using-for-and-if) of [similar questions](http://stackoverflow.com/questions/7658316/create-new-column-based-on-4-values-in-another-column) - what have you tried and why didn't it work? — SymbolixAU, Sep 23 '16 at 04:44
@KimJenkins I think your second to last line should have the status 'Before', right? — DAXaholic, Sep 23 '16 at 05:28

DAXaholic · Answer 1 · 2016-09-23T05:28:02.540

This could be one possible solution

ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18,  19, 20, 21, 22)      
date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
df <- data.frame(Id = ids, 
                 ProcessDate = strptime(date, format = '%Y-%m-%d %H:%M:%S'))


date.status.before <- strptime('2011-12-29 14:48:00', format = '%Y-%m-%d %H:%M:%S')
date.status.after <- strptime('2011-12-29 15:16:00', format = '%Y-%m-%d %H:%M:%S')
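ProcessDateStatus <- function(process.date) {
  if (process.date < date.status.before)
    "Before"
  # anything reaching this branch is already >= date.status.before,
  # so a value exactly on the boundary counts as "Between"
  else if (process.date < date.status.after)
    "Between"
  else
    "After"
}
# sapply returns a character vector; lapply would create a list column
df$Status <- sapply(df$ProcessDate, ProcessDateStatus)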

resulting in

   Id         ProcessDate  Status
1  10 2011-12-29 14:14:00  Before
2  11 2011-12-29 14:16:00  Before
3  12 2011-12-29 14:14:00  Before
4  13 2011-12-29 14:20:00  Before
5  14 2011-12-29 14:49:00 Between
6  15 2011-12-29 14:51:00 Between
7  16 2011-12-29 14:53:00 Between
8  17 2011-12-29 15:11:00 Between
9  18 2011-12-29 15:13:00 Between
10 19 2011-12-29 15:10:00 Between
11 20 2011-12-29 15:21:00   After
12 21 2011-12-29 14:34:00  Before
13 22 2011-12-29 15:26:00   After

alistaire · Answer 2 · 2016-09-23T22:50:10.247

Subset assignment

A very simple way to do this in base R for this particular case is to set everything to 'Between', and then use subset assignment to alter the rows that should be something else:

df$ProcessDate <- as.POSIXct(df$ProcessDate)    # skip if already parsed to datetime

df$Status <- 'Between'
df$Status[df$ProcessDate < as.POSIXct('2011-12-29 14:48:00')] <- 'Before'
df$Status[df$ProcessDate >= as.POSIXct('2011-12-29 15:16:00')] <- 'After'

df
##    Id         ProcessDate  Status
## 1  10 2011-12-29 14:14:00  Before
## 2  11 2011-12-29 14:16:00  Before
## 3  12 2011-12-29 14:14:00  Before
## 4  13 2011-12-29 14:20:00  Before
## 5  14 2011-12-29 14:49:00 Between
## 6  15 2011-12-29 14:51:00 Between
## 7  16 2011-12-29 14:53:00 Between
## 8  17 2011-12-29 15:11:00 Between
## 9  18 2011-12-29 15:13:00 Between
## 10 19 2011-12-29 15:10:00 Between
## 11 20 2011-12-29 15:21:00   After
## 12 21 2011-12-29 14:34:00  Before
## 13 22 2011-12-29 15:26:00   After

`cut`

The purpose-built way to do it is to use cut, which has a cut.POSIXt method. It requires breakpoints before and after your data in addition to the ones you already want, but will make a nice factor for the categorical data.

df$Status <- cut(df$ProcessDate, 
                 breaks = c(min(df$ProcessDate), 
                          as.POSIXct(c('2011-12-29 14:48:00', '2011-12-29 15:16:00')), 
                          max(df$ProcessDate) + 1), 
                 labels = c('Before', 'Between', 'After'))

Nested `ifelse` calls

The most common and versatile base version would be nested ifelse calls, which can look ugly (especially if there are a lot of them), but evaluate quickly because ifelse is vectorized whereas if is not:

df$Status <- ifelse(df$ProcessDate < as.POSIXct('2011-12-29 14:48:00'), 
                    'Before', 
                    ifelse(df$ProcessDate < as.POSIXct('2011-12-29 15:16:00'), 
                           'Between', 
                           'After'))
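
To illustrate the difference between if and ifelse mentioned above, here is a minimal sketch on a made-up three-element vector `x` (not the question's data; the UTC timezone is also an assumption). Depending on the R version, `if` with a length-3 condition either raises an error (R 4.2.0 and later) or warns and uses only the first element, so the call is wrapped in `tryCatch` to capture either outcome.

```r
# Hypothetical sample data, for illustration only
x <- as.POSIXct(c('2011-12-29 14:14:00', '2011-12-29 14:49:00',
                  '2011-12-29 15:21:00'), tz = 'UTC')
lower <- as.POSIXct('2011-12-29 14:48:00', tz = 'UTC')
upper <- as.POSIXct('2011-12-29 15:16:00', tz = 'UTC')

# `if` expects a single TRUE/FALSE; a length-3 condition is an error in
# recent R versions and a warning (first element only) in older ones
if_outcome <- tryCatch(
  if (x < lower) 'Before' else 'Not before',
  warning = function(w) 'warning',
  error   = function(e) 'error'
)

# `ifelse` tests every element and returns a vector of the same length
status <- ifelse(x < lower, 'Before',
                 ifelse(x < upper, 'Between', 'After'))
```

Here `status` has one label per element of `x`, while `if_outcome` only records that the plain `if` could not handle the whole vector.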

dplyr

dplyr::case_when is a nice alternative to nested ifelse calls. It evaluates each condition successively and returns the corresponding value:

library(dplyr)

df %>% mutate(
    ProcessDate = as.POSIXct(ProcessDate),    # skip this line if already datetime
                       # if this is true,                   then return "Before"
    Status = case_when(ProcessDate < as.POSIXct('2011-12-29 14:48:00') ~ 'Before',
                       # for the rest, if this is true,          return "Between"
                       ProcessDate < as.POSIXct('2011-12-29 15:16:00') ~ 'Between',
                       # always true, so make the rest "After"
                       TRUE ~ 'After'))

All versions return the same thing, aside from cut, which returns a factor instead of a character vector.
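
As a quick sanity check of that claim, the sketch below runs the three base R versions on a shortened copy of the question's timestamps and compares them. The variable names (`dates`, `lower`, `upper`, `by_subset`, and so on) and the fixed UTC timezone are assumptions made for the example, not part of the original answer.

```r
# A shortened copy of the question's timestamps (UTC is an assumption)
dates <- as.POSIXct(c('2011-12-29 14:14:00', '2011-12-29 14:49:00',
                      '2011-12-29 15:13:00', '2011-12-29 15:21:00',
                      '2011-12-29 14:34:00'), tz = 'UTC')
lower <- as.POSIXct('2011-12-29 14:48:00', tz = 'UTC')
upper <- as.POSIXct('2011-12-29 15:16:00', tz = 'UTC')

# Subset assignment: start with 'Between', then overwrite the two tails
by_subset <- rep('Between', length(dates))
by_subset[dates < lower]  <- 'Before'
by_subset[dates >= upper] <- 'After'

# Nested ifelse
by_ifelse <- ifelse(dates < lower, 'Before',
                    ifelse(dates < upper, 'Between', 'After'))

# cut returns a factor with one level per interval between breaks
by_cut <- cut(dates,
              breaks = c(min(dates), lower, upper, max(dates) + 1),
              labels = c('Before', 'Between', 'After'))

# as.character() converts the factor so it can be compared directly
all_agree <- identical(by_subset, by_ifelse) &&
  identical(by_subset, as.character(by_cut))
```

If the factor is not wanted, wrapping the cut call in as.character() as above gives a plain character column like the other versions.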

score 4 · Answer 3 · answered Sep 23 '16 at 06:26

Try this:

library(data.table)
DT <- as.data.table(df)    # assumes df holds Id and a POSIXct ProcessDate column

left <- as.POSIXct("12/29/2011 14:48", format = "%m/%d/%Y %H:%M") 
right <- as.POSIXct("12/29/2011 15:16", format = "%m/%d/%Y %H:%M") 
DT[, Status := ifelse(ProcessDate < left, "before", 
            ifelse(ProcessDate > right, "after", "between"))]

It gives:

    Id         ProcessDate  Status
 1: 10 2011-12-29 14:14:00  before
 2: 11 2011-12-29 14:16:00  before
 3: 12 2011-12-29 14:14:00  before
 4: 13 2011-12-29 14:20:00  before
 5: 14 2011-12-29 14:49:00 between
 6: 15 2011-12-29 14:51:00 between
 7: 16 2011-12-29 14:53:00 between
 8: 17 2011-12-29 15:11:00 between
 9: 18 2011-12-29 15:13:00 between
10: 19 2011-12-29 15:10:00 between
11: 20 2011-12-29 15:21:00   after
12: 21 2011-12-29 15:34:00   after
13: 22 2011-12-29 15:26:00   after

This gives the same result as above, using the vectorized ifelse() with data.table.

score 0 · Accepted Answer · answered Sep 23 '16 at 05:27

One possible solution is to convert your times into epoch values and then compare them. This can be done with as.integer(as.POSIXct("Time")), as shown below.

df = NULL
df$ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18,  19, 20, 21, 22)      
df$date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
df = as.data.frame(df)
df$date = as.integer(as.POSIXct(df$date))

upper   = as.integer(as.POSIXct('2011-12-29 15:16:00'))
lower   = as.integer(as.POSIXct('2011-12-29 14:48:00'))

You will have the converted date column as below

> df
    ids       date
1   10 1325148240
2   11 1325148360
3   12 1325148240
4   13 1325148600
5   14 1325150340
6   15 1325150460
7   16 1325150580
8   17 1325151660
9   18 1325151780
10  19 1325151600
11  20 1325152260
12  21 1325149440
13  22 1325152560

Then you can simply perform a numeric comparison

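df$Status = NA_character_    # create the column before filling it row by row
for(i in seq_len(nrow(df))){
    if(df$date[i] < lower)
            df$Status[i] = "Before"
    # already >= lower here, so a value exactly equal to lower is "Between"
    else if(df$date[i] < upper)
            df$Status[i] = "Between"
    else
            df$Status[i] = "After"
}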

Resulting in the output

> df
    ids       date  Status
1   10 1325148240  Before
2   11 1325148360  Before
3   12 1325148240  Before
4   13 1325148600  Before
5   14 1325150340 Between
6   15 1325150460 Between
7   16 1325150580 Between
8   17 1325151660 Between
9   18 1325151780 Between
10  19 1325151600 Between
11  20 1325152260   After
12  21 1325149440  Before
13  22 1325152560   After

To preserve the format of the date you can perform the conversion from date and time to epoch inside the condition for each row. — 9Heads, Sep 23 '16 at 05:32
There's not really a point of converting to numeric; comparison operators (`<`, etc.) work on POSIX*t datetimes directly, so all you're doing is making the data harder to read. — alistaire, Sep 23 '16 at 14:43
As per my previous comment you can preserve the data format. The transformation need not be explicit. The transformation was done to show the concept of epoch while comparing the date. — 9Heads, Sep 23 '16 at 15:03
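
To make the point from the comments concrete, here is a minimal sketch comparing two made-up timestamps both directly and through their epoch values. The names `t1`, `t2`, `direct`, `via_epoch` and `gap_seconds`, and the UTC timezone, are assumptions for the example only.

```r
# Two made-up timestamps, for illustration only (UTC is an assumption)
t1 <- as.POSIXct('2011-12-29 14:34:00', tz = 'UTC')
t2 <- as.POSIXct('2011-12-29 14:48:00', tz = 'UTC')

# Comparison operators work on POSIXct values directly
direct <- t1 < t2

# A POSIXct value is stored as seconds since 1970-01-01 UTC,
# so comparing the underlying numbers gives the same answer
via_epoch <- as.integer(t1) < as.integer(t2)

# The epoch difference matches the 14-minute gap between the timestamps
gap_seconds <- as.integer(t2) - as.integer(t1)
```

Both comparisons agree, so the conversion mainly matters when the numeric value itself is needed.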
