-1

I have a dataset that includes timestamped sales information for a specific day, for multiple retailers and product type. Also, it is a multiproduct vending machine, and the transaction can be initialized a number of ways.

Here is a small sample of how it is organzied:

retailer    session    eventtype    timestamp
123         1          1            5:15:09
123         1          3            5:15:13
123         2          1            5:16:21
123         2          3            5:16:33
123         3          2            5:16:59
123         3          1            5:17:04
123         3          6            5:17:22
555         1          2            6:45:32
555         1          3            6:45:46
555         2          1            6:56:02
555         2          3            7:02:21
929         4          1            3:21:23
929         4          6            3:22:09

So, retailer 123 had 3 sessions, with a few different events within each session (money being inserted, sales being made, vouchers being redeemed, etc).

What I'd like to know is what is the mean time between sessions overall for all retailers. So, when a session changes to a new number, under the same retailer, how many seconds/minutes passed in between?

Also, I'd like to be able to play around with it- what was the stdev? Is there a difference in times between sessions depending if the session started with a 1 or 3 type event? What if I just want to completely ignore sessions that start with event type 2?

Sorry for the detailed request, but this is beyond my R capabilities and I can't seem to find good documentation on it (mainly because I'm probably not looking in the right places).

Thanks!

datahappy
  • 826
  • 2
  • 11
  • 29
  • 1
    See the introduction to R. http://cran.r-project.org/manuals.html See also this thread (http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to learn how to ask good questions with a small, reproducible example. – Roman Luštrik Feb 03 '14 at 18:52
  • Thank you for the tip. I'm still learning the proper way to ask questions on here. I'll work on refining the example given. – datahappy Feb 03 '14 at 19:16

3 Answers3

1

Here is a base R alternative.

# convert timestamp to as.POSIXct object
df$time <- as.POSIXct(paste(Sys.Date(), df$timestamp))

# create an index of session change for each retailer
df$change <- with(df, ave(session, retailer, FUN = function(x){
  c(0, diff(x))
}))

# for each retailer, calculate time difference between first time of new session
# and time of the previous row
diffs <- by(data = df, df$retailer, function(x){
  difftime(x$time[x$id == 1], x$time[which(x$id == 1) - 1], units = "secs")
})

diffs
# df$retailer: 123
# Time differences in secs
# [1] 68 26
# attr(,"tzone")
# [1] ""
# --------------------------- 
#   df$retailer: 555
# Time difference of 616 secs
# ---------------------------
#   df$retailer: 929
# Time difference of  secs


# overall mean of time differences between sessions 
mean(unlist(diffs))
# [1] 236.6667
Henrik
  • 65,555
  • 14
  • 143
  • 159
0

I find the question are interesting here (business side) but unfortunately , You are asking a lot without providing what you have tried or at least the expected output.

You can start like this for example:

library(data.table)
DT <- as.data.table(dat)
## coerce to ITime
DT[,ntime := lapply(paste0(0,timestamp),as.ITime)]
## I compute the duration of each session by retailer
## then I compute the mean by retailer
DT[,tail(ntime,1)[[1]]-head(ntime,1)[[1]],"retailer,session"][
  ,list(meansessionTime=mean(V1)),retailer]

 retailer meansessionTime
1:      123            13.0
2:      555           196.5
3:      929            46.0

EDIT The OP is looking for time between sessions not mean time of sessions:

DT[, ntime := as.POSIXct(paste(Sys.Date(), timestamp))]
DT[,btw:= {xx=c(0,diff(ntime))
          ifelse(c(0,diff(session))==1,xx,0)},retailer]

mean(DT[btw!=0,btw])
[1] 236.6667
agstudy
  • 119,832
  • 17
  • 199
  • 261
0

Here is an answer that calculates the times between sessions. This should cover the possibility that session is not continuous:

df$timestamp <- as.POSIXct(df$timestamp, format="%H:%M:%s")
dt.max <- data.table(df)[, max(timestamp), by=list(retailer, session)]
dt.min <- data.table(df)[, min(timestamp), by=list(retailer, session)]
dt.min[, session:=session - 1]
dt.max <- dt.max[dt.max[, session != max(session), by=list(retailer)]$V1]
setkey(dt.max, retailer, session)
dt.deltas <- dt.max[dt.min, roll=T]
dt.deltas[complete.cases(dt.deltas), list(avg=mean(V1.1 - V1), stdv=sd(V1.1 - V1)), by=retailer]
BrodieG
  • 51,669
  • 9
  • 93
  • 146