
I have a dataframe df which is as follows.

Id     ProcessDate
10     2011-12-29 14:14:00
11     2011-12-29 14:16:00
12     2011-12-29 14:14:00
13     2011-12-29 14:20:00
14     2011-12-29 14:49:00
15     2011-12-29 14:51:00
16     2011-12-29 14:53:00
17     2011-12-29 15:11:00
18     2011-12-29 15:13:00 
19     2011-12-29 15:10:00
20     2011-12-29 15:21:00
21     2011-12-29 14:34:00
22     2011-12-29 15:26:00  

I am trying to create a third column, Status, that will contain one of three values {Before, Between, After} based on this condition.

 if  (df$ProcessDate < 2011-12-29 14:48:00)
 then  df$Status = "Before"
 else if (df$ProcessDate > 2011-12-29 14:48:00 & df$ProcessDate < 2011-12-29 15:16:00)
 then  df$Status = "Between"
 else  df$Status = "After"

The final dataframe should look like this.

Id     ProcessDate              Status
10     2011-12-29 14:14:00      Before
11     2011-12-29 14:16:00      Before
12     2011-12-29 14:14:00      Before
13     2011-12-29 14:20:00      Before
14     2011-12-29 14:49:00      Between
15     2011-12-29 14:51:00      Between       
16     2011-12-29 14:53:00      Between
17     2011-12-29 15:11:00      Between
18     2011-12-29 15:13:00      Between
19     2011-12-29 15:10:00      Between
20     2011-12-29 15:21:00      After
21     2011-12-29 14:34:00      After
22     2011-12-29 15:26:00      After

I tried a few things and they didn't work. Any help on this issue is much appreciated.

DAXaholic
Kim Jenkins
    There are [a few examples](http://stackoverflow.com/questions/14202008/add-column-values-based-on-other-columns-in-data-frame-using-for-and-if) of [similar questions](http://stackoverflow.com/questions/7658316/create-new-column-based-on-4-values-in-another-column) - what have you tried and why didn't it work? – SymbolixAU Sep 23 '16 at 04:44
    @KimJenkins I think your second to last line should have the status 'Before', right? – DAXaholic Sep 23 '16 at 05:28

4 Answers


This is one possible solution:

ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18,  19, 20, 21, 22)      
date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
df <- data.frame(Id = ids, 
                 ProcessDate = strptime(date, format = '%Y-%m-%d %H:%M:%S'))


date.status.before <- strptime('2011-12-29 14:48:00', format = '%Y-%m-%d %H:%M:%S')
date.status.after <- strptime('2011-12-29 15:16:00', format = '%Y-%m-%d %H:%M:%S')
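# Classify a single timestamp relative to the two cut-off times.
ProcessDateStatus <- function(process.date) {
  if (process.date < date.status.before) {
    "Before"
  } else if (process.date < date.status.after) {
    # The first test already failed, so only the upper bound needs checking.
    "Between"
  } else {
    "After"
  }
}
# vapply returns a character vector; lapply would store Status as a list column.
df$Status <- vapply(df$ProcessDate, ProcessDateStatus, character(1))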

resulting in

   Id         ProcessDate  Status
1  10 2011-12-29 14:14:00  Before
2  11 2011-12-29 14:16:00  Before
3  12 2011-12-29 14:14:00  Before
4  13 2011-12-29 14:20:00  Before
5  14 2011-12-29 14:49:00 Between
6  15 2011-12-29 14:51:00 Between
7  16 2011-12-29 14:53:00 Between
8  17 2011-12-29 15:11:00 Between
9  18 2011-12-29 15:13:00 Between
10 19 2011-12-29 15:10:00 Between
11 20 2011-12-29 15:21:00   After
12 21 2011-12-29 14:34:00  Before
13 22 2011-12-29 15:26:00   After
DAXaholic

Subset assignment

A very simple way to do this in base R for this particular case is to set everything to 'Between', and then use subset assignment to alter the rows that should be something else:

df$ProcessDate <- as.POSIXct(df$ProcessDate)    # skip if already parsed to datetime

df$Status <- 'Between'
df$Status[df$ProcessDate < as.POSIXct('2011-12-29 14:48:00')] <- 'Before'
df$Status[df$ProcessDate >= as.POSIXct('2011-12-29 15:16:00')] <- 'After'

df
##    Id         ProcessDate  Status
## 1  10 2011-12-29 14:14:00  Before
## 2  11 2011-12-29 14:16:00  Before
## 3  12 2011-12-29 14:14:00  Before
## 4  13 2011-12-29 14:20:00  Before
## 5  14 2011-12-29 14:49:00 Between
## 6  15 2011-12-29 14:51:00 Between
## 7  16 2011-12-29 14:53:00 Between
## 8  17 2011-12-29 15:11:00 Between
## 9  18 2011-12-29 15:13:00 Between
## 10 19 2011-12-29 15:10:00 Between
## 11 20 2011-12-29 15:21:00   After
## 12 21 2011-12-29 14:34:00  Before
## 13 22 2011-12-29 15:26:00   After

cut

The purpose-built way to do it is to use cut, which has a cut.POSIXt method. It requires breakpoints before and after your data in addition to the ones you already want, but will make a nice factor for the categorical data.

df$Status <- cut(df$ProcessDate, 
                 breaks = c(min(df$ProcessDate), 
                          as.POSIXct(c('2011-12-29 14:48:00', '2011-12-29 15:16:00')), 
                          max(df$ProcessDate) + 1), 
                 labels = c('Before', 'Between', 'After'))

Nested ifelse calls

The most common and versatile base version would be nested ifelse calls, which can look ugly (especially if there are a lot of them), but evaluate quickly because ifelse is vectorized whereas if is not:

df$Status <- ifelse(df$ProcessDate < as.POSIXct('2011-12-29 14:48:00'), 
                    'Before', 
                    ifelse(df$ProcessDate < as.POSIXct('2011-12-29 15:16:00'), 
                           'Between', 
                           'After'))

dplyr

dplyr::case_when is a nice alternative to nested ifelse calls. It evaluates each condition successively and returns the corresponding value:

library(dplyr)

df %>% mutate(
    ProcessDate = as.POSIXct(ProcessDate),    # skip this line if already datetime
                       # if this is true,                   then return "Before"
    Status = case_when(ProcessDate < as.POSIXct('2011-12-29 14:48:00') ~ 'Before',
                       # for the rest, if this is true,          return "Between"
                       ProcessDate < as.POSIXct('2011-12-29 15:16:00') ~ 'Between',
                       # always true, so make the rest "After"
                       TRUE ~ 'After'))

All versions return the same thing, aside from cut, which returns a factor instead of a character vector.
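
As a rough check of that claim, the sketch below rebuilds the question's data and compares three of the base R approaches. The names lower, upper, by_subset, by_ifelse and by_cut are introduced here for illustration only, and tz = "UTC" is an assumption made so the example does not depend on the local time zone.

```r
# Rebuild the question's data (tz = "UTC" is an assumption for reproducibility).
df <- data.frame(
  Id = 10:22,
  ProcessDate = as.POSIXct(
    c("2011-12-29 14:14:00", "2011-12-29 14:16:00", "2011-12-29 14:14:00",
      "2011-12-29 14:20:00", "2011-12-29 14:49:00", "2011-12-29 14:51:00",
      "2011-12-29 14:53:00", "2011-12-29 15:11:00", "2011-12-29 15:13:00",
      "2011-12-29 15:10:00", "2011-12-29 15:21:00", "2011-12-29 14:34:00",
      "2011-12-29 15:26:00"),
    tz = "UTC")
)

# Hypothetical helper names for the two cut-off times.
lower <- as.POSIXct("2011-12-29 14:48:00", tz = "UTC")
upper <- as.POSIXct("2011-12-29 15:16:00", tz = "UTC")

# Subset assignment: start with "Between" and overwrite the two tails.
by_subset <- rep("Between", nrow(df))
by_subset[df$ProcessDate < lower] <- "Before"
by_subset[df$ProcessDate >= upper] <- "After"

# Nested ifelse: vectorized, returns a character vector.
by_ifelse <- ifelse(df$ProcessDate < lower, "Before",
                    ifelse(df$ProcessDate < upper, "Between", "After"))

# cut: same grouping, but the result is a factor.
by_cut <- cut(df$ProcessDate,
              breaks = c(min(df$ProcessDate), lower, upper,
                         max(df$ProcessDate) + 1),
              labels = c("Before", "Between", "After"))

# Compare the approaches; the factor is converted to character first.
print(identical(by_subset, by_ifelse))
print(identical(by_subset, as.character(by_cut)))
print(class(by_cut))
```

If the factor from cut is not wanted, wrapping the call in as.character() should give the same kind of column as the other approaches.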

alistaire

Try this:

library(data.table)

# DT is assumed to be a data.table whose ProcessDate column is already POSIXct
left <- as.POSIXct("12/29/2011 14:48", format = "%m/%d/%Y %H:%M")
right <- as.POSIXct("12/29/2011 15:16", format = "%m/%d/%Y %H:%M")
DT[, Status := ifelse(ProcessDate < left, "before",
            ifelse(ProcessDate > right, "after", "between"))]

It gives:

    Id         ProcessDate  Status
 1: 10 2011-12-29 14:14:00  before
 2: 11 2011-12-29 14:16:00  before
 3: 12 2011-12-29 14:14:00  before
 4: 13 2011-12-29 14:20:00  before
 5: 14 2011-12-29 14:49:00 between
 6: 15 2011-12-29 14:51:00 between
 7: 16 2011-12-29 14:53:00 between
 8: 17 2011-12-29 15:11:00 between
 9: 18 2011-12-29 15:13:00 between
10: 19 2011-12-29 15:10:00 between
11: 20 2011-12-29 15:21:00   after
12: 21 2011-12-29 15:34:00   after
13: 22 2011-12-29 15:26:00   after

This gives the same result as above, using the vectorized ifelse() with data.table.

Sun Bee

One possible solution is to convert your times to epoch values and then compare them. This can be done using as.integer(as.POSIXct("Time")), as shown below.

df = NULL
df$ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18,  19, 20, 21, 22)      
df$date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
df = as.data.frame(df)
df$date = as.integer(as.POSIXct(df$date))

upper   = as.integer(as.POSIXct('2011-12-29 15:16:00'))
lower   = as.integer(as.POSIXct('2011-12-29 14:48:00'))

You will have the converted date column as below

> df
    ids       date
1   10 1325148240
2   11 1325148360
3   12 1325148240
4   13 1325148600
5   14 1325150340
6   15 1325150460
7   16 1325150580
8   17 1325151660
9   18 1325151780
10  19 1325151600
11  20 1325152260
12  21 1325149440
13  22 1325152560

Then you can simply perform a numeric comparison

df$Status = NA_character_    # pre-allocate the new column
for(i in seq_len(nrow(df))){
    if(df$date[i] < lower)
            df$Status[i] = "Before"
    else if(df$date[i] < upper)    # the lower bound was already checked above
            df$Status[i] = "Between"
    else
            df$Status[i] = "After"
}

Resulting in the output

> df
    ids       date  Status
1   10 1325148240  Before
2   11 1325148360  Before
3   12 1325148240  Before
4   13 1325148600  Before
5   14 1325150340 Between
6   15 1325150460 Between
7   16 1325150580 Between
8   17 1325151660 Between
9   18 1325151780 Between
10  19 1325151600 Between
11  20 1325152260   After
12  21 1325149440  Before
13  22 1325152560   After
9Heads
  • To preserve the format of the date you can perform the conversion from date and time to epoch inside the condition for each row. – 9Heads Sep 23 '16 at 05:32
    There's not really a point of converting to numeric; comparison operators (`<`, etc.) work on POSIX*t datetimes directly, so all you're doing is making the data harder to read. – alistaire Sep 23 '16 at 14:43
  • As per my previous comment you can preserve the data format. The transformation need not be explicit. The transformation was done to show the concept of epoch while comparing the date. – 9Heads Sep 23 '16 at 15:03
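
To illustrate the point raised in these comments, here is a minimal sketch comparing a direct POSIXct comparison with the epoch-based one. The names stamp, lower, direct and via_epoch are hypothetical, and tz = "UTC" is an assumption so the result does not depend on the local time zone.

```r
# One timestamp from the question and the lower cut-off (tz = "UTC" is assumed).
stamp <- as.POSIXct("2011-12-29 14:34:00", tz = "UTC")
lower <- as.POSIXct("2011-12-29 14:48:00", tz = "UTC")

# Comparison operators work on POSIXct values directly...
direct <- stamp < lower

# ...and comparing the underlying epoch seconds gives the same answer.
via_epoch <- as.integer(stamp) < as.integer(lower)

print(direct)
print(identical(direct, via_epoch))
```

Both forms should agree, so the epoch conversion can be kept as a teaching device while the data frame column stays in a readable datetime format.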