1

This is my first time using elseif. I wish to create a new column mobile$tenuredate (in months) and am trying to find out the issue with my code that produced NA values.

Results

mobile$status == 'active'

rows give NA values for mobile$tenuredate (they should not be NA).

mobile$status == 'stopped'

rows give valid values for mobile$tenuredate.

Below is the code

mobile$tenuredate = if (mobile$status=="stopped") {
  round(difftime(mobile$EFFECTIVEDATE, mobile$STARTDATE, units="weeks") / 4.348125)
} else if ((mobile$status == "active") && (mobile$difftemp >= 0)) {
  round(difftime(mobile$CONTRACTENDDATE, mobile$STARTDATE, units="weeks") / 4.348125)
} else {
  round(difftime(mobile$CUTOFFDATE, mobile$STARTDATE, units="weeks") / 4.348125)
}

Data file in CSV available here

Here's a sample dataframe.

structure(list(STARTDATE = structure(c(11413, 11639, 11953, 12212, 
11335, 12050, 12142, 11225, 12176, 11386), class = "Date"), STOPDATE = structure(c(11436, 
12079, NA, 12225, 11345, 12124, 12226, 11999, 12176, 11758), class = "Date"), 
    EFFECTIVEDATE = structure(c(11436, 12079, NA, 12225, 11345, 
    12124, 12226, 11999, 12176, 11758), class = "Date"), CONTRACTENDDATE = structure(c(11778, 
    12004, 12318, 12578, 11700, 12415, 12508, 11977, 12542, 11751
    ), class = "Date"), CUTOFFDATE = structure(c(12273, 12273, 
    12273, 12273, 12273, 12273, 12273, 12273, 12273, 12273), class = "Date"), 
    status = c("stopped", "stopped", "active", "stopped", "stopped", 
    "stopped", "stopped", "stopped", "stopped", "stopped"), tenuredate = structure(c(1, 
    14, NA, 0, 0, 2, 3, 25, 0, 12), class = "difftime", units = "weeks")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Thanks in advance.

Grace
  • 201
  • 2
  • 13
  • Without looking at your data (I'm not following links at the moment), while overly verbose, I see nothing in your code that would introduce `NA`s. It's more likely a problem with how you read in your data, perhaps on your assumptions of the data itself. It would help if you provided a sample of the data here in the question, using `dput(head(mobile))` or `data.frame(...)`. (The reason for `dput` is that it provides unambiguous data as R sees it, not how you assume R is using it.) – r2evans Apr 06 '20 at 18:26
  • Thank you for the suggestion! Have provided the sample data above, using `dput()`. – Grace Apr 06 '20 at 18:40
  • Okay. (1) `if` expects a singleton comparison, you are providing a vector. A vectorized approach is appropriate. (2) `ifelse` is the logical step, of course, but it will fail you because it drops `class`, and you'll lose your `Date` class (though the math will still happen). Standby ... – r2evans Apr 06 '20 at 18:42

1 Answers1

1

if requires its conditional to be of length 1, whereas you are providing a vector. The logical replacement would be to use ifelse, but it is a well-known (among R veterans) problem with ifelse that it will drop the class, so your Date or difftime columns become numeric, and you have to recast them. (This isn't the end of the world, but let's just keep the class as it is for now.)

mobile$tenuredate <- NULL # just to clean up your previous attempt, otherwise not needed
mobile$usedate <- Sys.Date()[NA] # all NAs are not created equal ...
ind <- mobile$status == "stopped"
mobile$usedate[ind] <- mobile$EFFECTIVEDATE[ind]
ind <- (mobile$status == "active") && (mobile$difftemp >= 0)
mobile$usedate[ind] <- mobile$CONTRACTENDDATE[ind]
ind <- is.na(mobile$usedate)
mobile$usedate[ind] <- mobile$CUTOFFDATE[ind]
mobile
# # A tibble: 10 x 7
#    STARTDATE  STOPDATE   EFFECTIVEDATE CONTRACTENDDATE CUTOFFDATE status  usedate   
#    <date>     <date>     <date>        <date>          <date>     <chr>   <date>    
#  1 2001-04-01 2001-04-24 2001-04-24    2002-04-01      2003-08-09 stopped 2001-04-24
#  2 2001-11-13 2003-01-27 2003-01-27    2002-11-13      2003-08-09 stopped 2003-01-27
#  3 2002-09-23 NA         NA            2003-09-23      2003-08-09 active  2003-08-09
#  4 2003-06-09 2003-06-22 2003-06-22    2004-06-09      2003-08-09 stopped 2003-06-22
#  5 2001-01-13 2001-01-23 2001-01-23    2002-01-13      2003-08-09 stopped 2001-01-23
#  6 2002-12-29 2003-03-13 2003-03-13    2003-12-29      2003-08-09 stopped 2003-03-13
#  7 2003-03-31 2003-06-23 2003-06-23    2004-03-31      2003-08-09 stopped 2003-06-23
#  8 2000-09-25 2002-11-08 2002-11-08    2002-10-17      2003-08-09 stopped 2002-11-08
#  9 2003-05-04 2003-05-04 2003-05-04    2004-05-04      2003-08-09 stopped 2003-05-04
# 10 2001-03-05 2002-03-12 2002-03-12    2002-03-05      2003-08-09 stopped 2002-03-12

It might be useful to pause here and verify that all of the usedate values are from the appropriate columns.

I use usedate as an intermediate value for two reasons: (1) for verification; and (2) because you're doing the same math for the rest of it ... so why keep the same math in three locations, just do it once. There are other ways to do this, of course.

mobile$tenuredate <- round(difftime(mobile$usedate, mobile$STARTDATE, units = "weeks") / 4.348125)
mobile
# # A tibble: 10 x 8
#    STARTDATE  STOPDATE   EFFECTIVEDATE CONTRACTENDDATE CUTOFFDATE status  usedate    tenuredate
#    <date>     <date>     <date>        <date>          <date>     <chr>   <date>     <drtn>    
#  1 2001-04-01 2001-04-24 2001-04-24    2002-04-01      2003-08-09 stopped 2001-04-24  1 weeks  
#  2 2001-11-13 2003-01-27 2003-01-27    2002-11-13      2003-08-09 stopped 2003-01-27 14 weeks  
#  3 2002-09-23 NA         NA            2003-09-23      2003-08-09 active  2003-08-09 11 weeks  
#  4 2003-06-09 2003-06-22 2003-06-22    2004-06-09      2003-08-09 stopped 2003-06-22  0 weeks  
#  5 2001-01-13 2001-01-23 2001-01-23    2002-01-13      2003-08-09 stopped 2001-01-23  0 weeks  
#  6 2002-12-29 2003-03-13 2003-03-13    2003-12-29      2003-08-09 stopped 2003-03-13  2 weeks  
#  7 2003-03-31 2003-06-23 2003-06-23    2004-03-31      2003-08-09 stopped 2003-06-23  3 weeks  
#  8 2000-09-25 2002-11-08 2002-11-08    2002-10-17      2003-08-09 stopped 2002-11-08 25 weeks  
#  9 2003-05-04 2003-05-04 2003-05-04    2004-05-04      2003-08-09 stopped 2003-05-04  0 weeks  
# 10 2001-03-05 2002-03-12 2002-03-12    2002-03-05      2003-08-09 stopped 2002-03-12 12 weeks  

(Once you know you don't need it, mobile$usedate <- NULL.)


If you're using any of the tidyverse packages, this can be done using case_when much more succinctly:

library(dplyr)
as_tibble(mobile) %>%
  mutate(
    usedate = case_when(
      status == "stopped"                     ~ EFFECTIVEDATE,
      (status == "active") && (difftemp >= 0) ~ CONTRACTENDDATE,
      TRUE                                    ~ CUTOFFDATE
    ),
    tenuredate = round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125)
  )

Or a data.table solution:

library(data.table)
as.data.table(mobile)[
  , usedate := Sys.Date()[NA] ][
    status == "stopped", usedate := EFFECTIVEDATE ][
      (status == "active") && (difftemp >= 0), usedate := CONTRACTENDDATE ][
        is.na(usedate), usedate := CUTOFFDATE ][
          , tenuredate := round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125) ]

If you combine data.table with the pipe of magrittr, then you might find this more readable:

library(data.table)
library(magrittr)
as.data.table(mobile) %>%
  .[ , usedate := Sys.Date()[NA] ] %>%
  .[ status == "stopped", usedate := EFFECTIVEDATE ] %>%
  .[ (status == "active") && (difftemp >= 0), usedate := CONTRACTENDDATE ] %>%
  .[ is.na(usedate), usedate := CUTOFFDATE ] %>%
  .[ , tenuredate := round(difftime(usedate, STARTDATE, units = "weeks") / 4.348125) ]

Refs for my assertion that ifelse drops class:

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    Thanks so much! This really helps explain, it solved the problem =) And thank you for providing the various methods including improving my code. – Grace Apr 07 '20 at 16:18