0

I have this dataset:

> dput(SampleEvents)
structure(list(Event = structure(c(10L, 5L, 6L, 11L, 10L, 7L, 
11L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 11L), .Label = c("e10", "e11", 
"e12", "e13", "e2", "e3", "e6", "e8", "e9", "Login", "Logout"
), class = "factor"), Transaction.ID = structure(c(NA, 1L, NA, 
2L, NA, NA, NA, NA, 3L, NA, NA, NA, NA, NA, NA), .Label = c("t1", 
"t4", "t5"), class = "factor"), User.ID = structure(c(1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("kenn1", 
"kenn2"), class = "factor"), Event.Date = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "20/5/2017", class = "factor"), 
    Event.Time = structure(c(12L, 13L, 14L, 15L, 1L, 2L, 3L, 
    4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("10:01", "10:02", 
    "10:03", "10:04", "10:05", "10:06", "10:07", "10:08", "10:09", 
    "10:10", "10:11", "9:00", "9:30", "9:45", "9:50"), class = "factor")), .Names = c("Event", 
"Transaction.ID", "User.ID", "Event.Date", "Event.Time"), class = "data.frame", row.names = c(NA, 
-15L))

enter image description here

I would like to remove all rows within two fixed values under the column "Event", i.e. rows from "Login" to "Logout", which have all missing Transaction ID values between "Login" and "Logout":

enter image description here

I would also like to retain the current order of the dataset.

How do I do this in R?

Kenneth Singh
  • 335
  • 1
  • 3
  • 15
  • Please do not provide your data as an image. For anyone to use it they would need to type it all in again. Instead, please use a text format. Ideally, you would have the data in R and use `dput` to provide the data structure. Your image looks like it is from Excel. _At least_ you could save as a csv and paste that into your question. – G5W May 25 '17 at 19:36
  • I am a bit new to this site. Let me try attaching the dataset. – Kenneth Singh May 25 '17 at 19:37
  • Yes, I am trying to have a sample here and understand the logic so that I can apply it in my parent dataset. – Kenneth Singh May 25 '17 at 19:43

1 Answers1

0

You could do something like the following, which works with the data you supplied...

library(dplyr)

#add variables to mark login-logout blocks and number them
df <- df %>% mutate(session=cumsum(Event=="Login")-cumsum(Event=="Logout"),
                    block=c(0,cumsum(diff(session)!=0)),
                    block=ifelse(Event=="Logout",block-1,block))

#identify blocks to remove
df2 <- df %>% group_by(block) %>% 
              summarise(Login=first(session)>0,
                        noTrans=all(is.na(Transaction.ID))) %>% 
              filter(Login & noTrans)

#remove unwanted blocks and delete temporary variables
df <- df %>% filter(!(block %in% df2$block)) %>% 
             select(-c(session,block))

df
   Event Transaction.ID User.ID Event.Date Event.Time
1  Login           <NA>   kenn1  20/5/2017       9:00
2     e2             t1   kenn1  20/5/2017       9:30
3     e3           <NA>   kenn1  20/5/2017       9:45
4 Logout             t4   kenn1  20/5/2017       9:50
5     e8           <NA>   kenn2  20/5/2017      10:04
6     e9             t5   kenn2  20/5/2017      10:05
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32