-2

I am trying to calculate the time spent (in minutes) by users for each hour . An example is given below. I want to replicate this in sparklyr as well.

df <- data.frame(user_id = c("x", "y", "z", "a"),
                 start_time = c("2019-04-28 18:30:00", "2019-04-28 21:40:00", "2019-04-28 18:10:00", "2019-04-28 14:10:00"),
                 end_time = c("2019-04-28 20:00:00", "2019-04-28 23:30:00", "2019-04-28 19:30:00", "2019-04-28 18:30:00"))

df$start_time <- as.POSIXct(df$start_time, format = "%Y-%m-%d %H:%M:%S")
df$end_time <- as.POSIXct(df$end_time, format = "%Y-%m-%d %H:%M:%S")

df
#   user_id          start_time            end_time
# 1       x 2019-04-28 18:30:00 2019-04-28 20:00:00
# 2       y 2019-04-28 21:40:00 2019-04-28 23:30:00
# 3       z 2019-04-28 18:10:00 2019-04-28 19:30:00
# 4       a 2019-04-28 14:10:00 2019-04-28 18:30:00

The expected output would be in R & Sparklyr

expected_df<-data.frame(user_id=c("x","y","z","a"),hr_14=c(0,0,0,50),hr_15=c(0,0,0,60),hr_16=c(0,0,0,60),hr_17=c(0,0,0,60),hr_18=c(30,0,50,30),hr_19=c(60,0,30,0),hr_20=c(0,0,0,0),hr_21=c(0,20,0,0),hr_22=c(0,60,0,0),hr_23=c(0,30,0,0))

#   user_id hr_14 hr_15 hr_16 hr_17 hr_18 hr_19 hr_20 hr_21 hr_22 hr_23
# 1       x     0     0     0     0    30    60     0     0     0     0
# 2       y     0     0     0     0     0     0     0    20    60    30
# 3       z     0     0     0     0    50    30     0     0     0     0
# 4       a    50    60    60    60    30     0     0     0     0     0
Yogesh Kumar
  • 609
  • 6
  • 22
  • Could you copy and paste your current and expected output(s)? – NelsonGon May 31 '19 at 10:40
  • @NelsonGon : It is already posted ....df is my data frame , and expected _df is what I want as final output – Yogesh Kumar May 31 '19 at 10:54
  • What I mean is simply copy and paste it. Quicker to figure out(in my opinion) if one sees the data before entering it into their environment. – NelsonGon May 31 '19 at 10:55
  • Is this helpful? https://stackoverflow.com/questions/13649019/split-time-series-data-into-time-intervals-say-an-hour-and-then-plot-the-count – DarrenRhodes May 31 '19 at 11:20

1 Answers1

0

I'm quite sure there is a more elegant way to write this code, but here is a working solution and maybe someone else can build on this:

library(lubridate)
library(dplyr)
library(tidyr)

df %>% 
  mutate(hr1 = paste0("hr_", hour(start_time)), 
         min1 = 60 - minute(start_time), 
         hr2 = paste0("hr_", hour(end_time)),
         min2 = minute(end_time)) %>% 
  gather(hr, hr_val, hr1, hr2) %>% 
  mutate(value = ifelse(hr == "hr1", min1, min2)) %>% 
  select(user_id, hr_val, value) %>% 
  complete(user_id, hr_val = paste0("hr_", 14:23)) %>% 
  group_by(user_id) %>% 
  mutate(value = case_when(!is.na(value) ~ value, 
                           cumsum(!is.na(value)) == 1 ~ 60, 
                           TRUE ~ 0)) %>% 
  spread(hr_val, value)

# A tibble: 4 x 11
# Groups:   user_id [4]
#   user_id hr_14 hr_15 hr_16 hr_17 hr_18 hr_19 hr_20 hr_21 hr_22 hr_23
#   <fct>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a          50    60    60    60    30     0     0     0     0     0
# 2 x           0     0     0     0    30    60     0     0     0     0
# 3 y           0     0     0     0     0     0     0    20    60    30
# 4 z           0     0     0     0    50    30     0     0     0     0
kath
  • 7,624
  • 17
  • 32