I am trying to replicate something like this with a custom function but I am getting errors. I have the following data frame

> dd
   datetimeofdeath injurydatetime
1                   2/10/05 17:30
2                   2/13/05 19:15
3                    2/15/05 1:10
4    2/24/05 21:00  2/16/05 20:36
5                    3/11/05 0:45
6                   3/19/05 23:05
7                   3/19/05 23:13
8                   3/23/05 20:51
9                   3/31/05 11:30
10                    4/9/05 3:07

typeof() reports both columns as "integer", yet they have levels as if they were factors. This could be the root of my problem, but I am not sure.

> typeof(dd$datetimeofdeath)
[1] "integer"
> typeof(dd$injurydatetime)
[1] "integer"
> dd$injurydatetime
 [1] 2/10/05 17:30 2/13/05 19:15 2/15/05 1:10  2/16/05 20:36 3/11/05 0:45  3/19/05 23:05 3/19/05 23:13 3/23/05 20:51 3/31/05 11:30
[10] 4/9/05 3:07  
549 Levels:  1/1/07 18:52 1/1/07 20:51 1/1/08 17:55 1/1/11 15:25 1/1/12 0:22 1/1/12 22:58 1/11/06 23:50 1/11/07 6:26 ... 9/9/10 8:15
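
For context on the output above: a factor in R is stored as integer codes plus a levels attribute, so typeof() returning "integer" is expected for a factor column. A minimal sketch, using a made-up vector `f` rather than the real data:

```r
# Hypothetical factor built from a few of the strings shown above.
f <- factor(c("2/10/05 17:30", "", "2/13/05 19:15"))

typeof(f)        # storage type of a factor: "integer"
class(f)         # "factor"
levels(f)        # the distinct strings the integer codes point to

# as.character() recovers the original strings; as.integer() would
# return the underlying codes instead.
as.character(f)
```

Columns like these typically appear when strings are read with stringsAsFactors = TRUE, which was the default in read.csv() and data.frame() before R 4.0.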

Now I would like to apply the following function rowwise()

library(lubridate)
library(dplyr)
get_time_alive = function(datetimeofdeath, injurydatetime)
{
  if(as.character(datetimeofdeath) == "" | as.character(injurydatetime) == "") return(NA)

  time_of_death = parse_date_time(as.character(datetimeofdeath), "%m/%d/%y %H:%M")
  time_of_injury = parse_date_time(as.character(injurydatetime), "%m/%d/%y %H:%M")

  time_alive = as.duration(new_interval(time_of_injury,time_of_death))
  time_alive_hours = as.numeric(time_alive) / (60*60)

  return(time_alive_hours)
}

This works on individual rows, but not when I do the operation rowwise.

> get_time_alive(dd$datetimeofdeath[1], dd$injurydatetime[1])
[1] NA
> get_time_alive(dd$datetimeofdeath[4], dd$injurydatetime[4])
[1] 192.4
> dd = dd %>% rowwise() %>% dplyr::mutate(time_alive_hours=get_time_alive(datetimeofdeath, injurydatetime))
There were 20 warnings (use warnings() to see them)
> dd
Source: local data frame [10 x 3]
Groups: 

   datetimeofdeath injurydatetime time_alive_hours
1                   2/10/05 17:30               NA
2                   2/13/05 19:15               NA
3                    2/15/05 1:10               NA
4    2/24/05 21:00  2/16/05 20:36               NA
5                    3/11/05 0:45               NA
6                   3/19/05 23:05               NA
7                   3/19/05 23:13               NA
8                   3/23/05 20:51               NA
9                   3/31/05 11:30               NA
10                    4/9/05 3:07               NA

As you can see the fourth element is NA even though when I applied my custom function to it by itself I got 192.4. Why is my custom function failing here?

user52291

1 Answer

I think you can simplify your code a lot and just use something like this:

dd %>% 
  mutate_each(funs(as.POSIXct(as.character(.), format = "%m/%d/%y %H:%M"))) %>% 
  mutate(time_alive = datetimeofdeath - injurydatetime)
#      datetimeofdeath      injurydatetime    time_alive
#1                <NA> 2005-02-15 01:10:00       NA days
#2 2005-02-24 21:00:00 2005-02-16 20:36:00 8.016667 days
#3                <NA> 2005-03-11 00:45:00       NA days
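
In current dplyr releases, mutate_each() and funs() are deprecated. The following is a sketch of the same pipeline written with across(), assuming dplyr 1.0.0 or later; the three-row `dd` is a hypothetical reconstruction of the shortened data, and tz = "UTC" is an assumption added so the result does not depend on the local time zone:

```r
library(dplyr)

# Hypothetical reconstruction of the three sample rows, stored as factors.
dd <- data.frame(
  datetimeofdeath = factor(c("", "2/24/05 21:00", "")),
  injurydatetime  = factor(c("2/15/05 1:10", "2/16/05 20:36", "3/11/05 0:45"))
)

res <- dd %>%
  # Convert every column: factor -> character -> POSIXct ("" becomes NA).
  mutate(across(everything(),
                ~ as.POSIXct(as.character(.x),
                             format = "%m/%d/%y %H:%M", tz = "UTC"))) %>%
  # Subtracting two POSIXct columns yields a difftime column.
  mutate(time_alive = datetimeofdeath - injurydatetime)

res
```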

Side notes:

  • I shortened your input data, because it's not easy to copy (I only took those three rows that you also see in my answer)
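  • If you want "time_alive" in hours, use mutate(time_alive = difftime(datetimeofdeath, injurydatetime, units = "hours")) in the last mutate. Multiplying the difference by 24 gives the right number only when the result happens to be in days, and the column would still be labelled "days".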
  • With this code there's no need for rowwise(); the operations are vectorised over whole columns, which should also make it faster.
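
The side notes can also be sketched in base R, without dplyr. This assumes the same hypothetical three-row `dd` as the shortened data; `to_time` is an illustrative helper name and tz = "UTC" is an assumption:

```r
# Hypothetical reconstruction of the three sample rows, stored as factors.
dd <- data.frame(
  datetimeofdeath = factor(c("", "2/24/05 21:00", "")),
  injurydatetime  = factor(c("2/15/05 1:10", "2/16/05 20:36", "3/11/05 0:45"))
)

# Illustrative helper: factor -> character -> POSIXct; "" parses to NA.
to_time <- function(x) {
  as.POSIXct(as.character(x), format = "%m/%d/%y %H:%M", tz = "UTC")
}

death  <- to_time(dd$datetimeofdeath)
injury <- to_time(dd$injurydatetime)

# Vectorised over the whole column, with the unit fixed explicitly.
dd$time_alive_hours <- as.numeric(difftime(death, injury, units = "hours"))

dd$time_alive_hours[2]  # 192.4, matching the single-row call in the question
```

Fixing units = "hours" avoids relying on the unit that subtraction picks automatically, and the NA rows come out as NA without a separate check for empty strings.
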
talat