
I have a data frame named "enrolments":

[Image: the "enrolments" data frame]

The columns enrolled_at, unenrolled_at and fully_participated_at are factors. I want to add a new column to my data frame that shows the difference in hours between two non-empty timestamp columns. The type of this new column is not important, but it must show the time in HH MM SS format.

I want to implement the following pseudocode:

If (unenrolled_at == empty && fully_participated_at != empty) 
    newAttributeValue = fully_participated_at - enrolled_at
else if (unenrolled_at != empty && fully_participated_at == empty)
    newAttributeValue = unenrolled_at - enrolled_at
else
    do nothing
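A minimal base-R sketch of that pseudocode, assuming "empty" means an empty string in the factor and that the timestamps look like `2002-06-09 12:45:40 UTC` (the format used in the answer's sample data; the real data was only shown as an image). The toy `enrolments` rows and the `parse_ts` helper are hypothetical. Each branch becomes one case of a nested `ifelse()`, and "do nothing" leaves `NA`.

```r
# Hypothetical toy rows standing in for the real "enrolments" data frame;
# empty cells are empty strings and all three columns are factors.
enrolments <- data.frame(
  enrolled_at           = c("2017-01-01 10:00:00 UTC", "2017-01-02 08:00:00 UTC",
                            "2017-01-03 09:00:00 UTC"),
  unenrolled_at         = c("", "2017-01-02 20:30:15 UTC", ""),
  fully_participated_at = c("2017-01-01 12:30:00 UTC", "", ""),
  stringsAsFactors = TRUE
)

# Hypothetical helper: factor of timestamps -> POSIXct, with "" -> NA.
# strptime-style parsing ignores the trailing " UTC" after the format.
parse_ts <- function(f) {
  s <- as.character(f)
  s[s == ""] <- NA
  as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

enrolled   <- parse_ts(enrolments$enrolled_at)
unenrolled <- parse_ts(enrolments$unenrolled_at)
full       <- parse_ts(enrolments$fully_participated_at)

# Both candidate differences, in hours (NA wherever an end time is missing)
hrs_full  <- as.numeric(difftime(full, enrolled, units = "hours"))
hrs_unenr <- as.numeric(difftime(unenrolled, enrolled, units = "hours"))

# One ifelse() case per pseudocode branch; the "do nothing" case gives NA
enrolments$hours <- ifelse(is.na(unenrolled) & !is.na(full), hrs_full,
                    ifelse(!is.na(unenrolled) & is.na(full), hrs_unenr, NA))
print(enrolments$hours)
```

Here `enrolments$hours` holds plain numbers of hours (2.5 and about 12.504 for the two filled rows, `NA` for the third); turning that into HH MM SS is only a display step.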

Edit: I tried all the methods on this site, but none of them worked. The times are stored as factors in my data frame, but the solutions on the site handle factor - factor or (string) time - (string) time. I also tried the `as.character` and `as.Date` functions, in that order. So my question is not a duplicate. Rolando Tamayo offers a different method to solve my problem, but it gives me this error: "Error in ymd_hms(comments$unenrolled_at) : could not find function "ymd_hms"" (I installed the lubridate package).

Yunus YILDIRIM
  • Please include your data as editable text instead of an image. – Imran Ali Oct 04 '17 at 02:01
  • Convert to character first with `as.character`, then convert to date format with `as.Date`. – acylam Oct 04 '17 at 02:17
  • I tried that before asking the question, but it gave this error: Error in charToDate(x) : character string is not in a standard unambiguous format. Command tried: `difftime(as.Date(as.character(enrolments$unenrolled_at)) - as.Date(as.character(enrolments$enrolled_at)))` – Yunus YILDIRIM Oct 05 '17 at 23:24
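The `as.Date()` attempt in the last comment has two separate problems. `as.Date()` keeps only the calendar date, so it cannot give a difference in hours. And when no `format` is given, `as.Date()` guesses the format from the first non-NA element; an empty string there (a plausible cause here, since empty cells are part of the question, but not confirmed) produces exactly this error. A small sketch with a hypothetical two-element factor, assuming the same `YYYY-MM-DD HH:MM:SS UTC` layout as the answer's sample data:

```r
# Hypothetical factor: an empty cell followed by a real timestamp
x <- factor(c("", "2017-01-02 20:30:15 UTC"))

# Without an explicit format, as.Date() guesses from the first non-NA
# element; "" matches none of its candidate formats, so this call errors.
res <- tryCatch(as.Date(as.character(x)),
                error = function(e) conditionMessage(e))
print(res)

# Turning "" into NA and giving the format explicitly avoids the guess,
# and as.POSIXct() keeps the time of day that as.Date() would drop.
s <- as.character(x)
s[s == ""] <- NA
parsed <- as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(parsed)
```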

1 Answer

You can use the lubridate package:

library(lubridate)


# Create a df with dates

df <- tibble::tibble(
  enrolled_at = as.factor(c("2002-06-09 12:45:40 UTC", "2003-01-29 09:30:40 UTC",
                            "2002-09-04 16:45:40 UTC")),
  unenrolled_at = as.factor(c("2002-11-13 20:00:40 UTC",
                              "2002-07-07 17:30:40", "2002-07-07 17:30:40 UTC")))
df

# A tibble: 3 x 2
              enrolled_at           unenrolled_at
                   <fctr>                  <fctr>
1 2002-06-09 12:45:40 UTC 2002-11-13 20:00:40 UTC
2 2003-01-29 09:30:40 UTC     2002-07-07 17:30:40
3 2002-09-04 16:45:40 UTC 2002-07-07 17:30:40 UTC

#Check Class
class(df$enrolled_at)

[1] "factor"

#Check class after function ymd_hms
class(ymd_hms(df$enrolled_at))

[1] "POSIXct" "POSIXt"

# Calculate the difference (a difftime, in days here)
dif <- ymd_hms(df$unenrolled_at) - ymd_hms(df$enrolled_at)

# Express the difference as a period (days, hours, minutes, seconds)
as.period(dif)

 [1] "157d 7H 15M 0S"    "-205d -16H 0M 0S"  "-58d -23H -15M 0S"

#Add as a column in df
df$newAttributeValue <- as.period(ymd_hms(df$unenrolled_at) - ymd_hms(df$enrolled_at))

df

# A tibble: 3 x 3
              enrolled_at           unenrolled_at newAttributeValue
                   <fctr>                  <fctr>      <S4: Period>
1 2002-06-09 12:45:40 UTC 2002-11-13 20:00:40 UTC    157d 7H 15M 0S
2 2003-01-29 09:30:40 UTC     2002-07-07 17:30:40  -205d -16H 0M 0S
3 2002-09-04 16:45:40 UTC 2002-07-07 17:30:40 UTC -58d -23H -15M 0S
Rolando Tamayo
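The period output above still carries a day component and does not yet apply the question's either/or rule. For the HH MM SS display the asker wanted, here is a base-R sketch, assuming the difference has already been taken in seconds with `difftime(..., units = "secs")`; `format_hms` is a hypothetical helper, not part of lubridate. Hours are not wrapped at 24, negative differences (such as rows 2 and 3 of the answer's sample) keep a leading minus sign, and `NA` stays `NA`.

```r
# Hypothetical helper: format a difference in seconds as "HH:MM:SS".
# Hours are not wrapped at 24; negatives get a leading "-"; NA stays NA.
format_hms <- function(secs) {
  secs <- round(as.numeric(secs))
  out <- rep(NA_character_, length(secs))
  ok <- !is.na(secs)
  a <- as.integer(abs(secs[ok]))
  out[ok] <- sprintf("%s%02d:%02d:%02d",
                     ifelse(secs[ok] < 0, "-", ""),
                     a %/% 3600L,           # whole hours
                     (a %% 3600L) %/% 60L,  # remaining minutes
                     a %% 60L)              # remaining seconds
  out
}

fmt <- "%Y-%m-%d %H:%M:%S"
# Rows 1 and 3 of the answer's sample data, plus a hypothetical missing start time
enrolled   <- as.POSIXct(c("2002-06-09 12:45:40", "2002-09-04 16:45:40", NA),
                         format = fmt, tz = "UTC")
unenrolled <- as.POSIXct(c("2002-11-13 20:00:40", "2002-07-07 17:30:40",
                           "2002-07-07 17:30:40"),
                         format = fmt, tz = "UTC")

hms <- format_hms(difftime(unenrolled, enrolled, units = "secs"))
print(hms)
# hms is c("3775:15:00", "-1415:15:00", NA)
```

As for the `could not find function "ymd_hms"` error in the question: `install.packages("lubridate")` only installs the package; each new R session still needs `library(lubridate)`, as in the answer's first line, or the call written as `lubridate::ymd_hms()`.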