I have taken a function from this post to create a random time efficiently generate a random sample of times and dates between two dates.
Here is my data set:
latemail <- function(N, st="2012/01/01", et="2012/12/31") {
st <- as.POSIXct(as.Date(st))
et <- as.POSIXct(as.Date(et))
dt <- as.numeric(difftime(et,st,unit="sec"))
ev <- sort(runif(N, 0, dt))
rt <- st + ev
}
set.seed(42);
a<- print(latemail(9))
a<- sort(a)
data <- data.frame(time= a, place=c("Start", "B", "O", "A", "A", "Start", "A", "O", "A"), ID=c(rep(1, 5), rep(2,4)))
The data looks like this:
time place ID
1 2012-02-19 04:40:45 Start 1
2 2012-04-14 12:34:56 B 1
3 2012-07-08 13:16:49 O 1
4 2012-08-22 07:41:26 A 1
5 2012-08-27 21:15:08 A 1
6 2012-09-14 10:22:03 Start 2
7 2012-09-25 22:30:49 B 2
8 2012-10-30 03:43:16 B 2
9 2012-11-29 22:42:03 O 2
I would like to take the time difference when place is "O" and "start", within each group (ID).
Questions:
1) Is the structure of the above data in accordance with tidy data? Because I think it makes more sense to spread the data so one can take difftime column wise. If each ID has only one row will it be tidy data (to separate e.g. between the A's one could call them A_1, A_2 if they have to be columns). But which format is tidy data.
2) Is there a better way to do accomplish this than bellow?
data2 <- data %>%
filter(place %in% c("Start", "O")) %>%
group_by(ID) %>%
mutate(diff=difftime(lead(time), time, units="days")) %>%
filter(!is.na(diff))
Output:
# A tibble: 2 x 4
# Groups: ID [2]
time place ID diff
<dttm> <fct> <dbl> <time>
1 2012-02-19 04:40:45 Start 1 140.31671 days
2 2012-09-25 22:30:49 Start 2 " 65.04947 days"