
I have this data frame (dput output given below):

lf3 = structure(list(session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 
6L, 7L), userId = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5), datetime = 
structure(c(1457029336, 
1457029337, 1457029340, 1457029596, 1457313569, 1457030783, 1457030784, 
1457030918, 1457030920, 1457370365), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), referer = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 
22), request = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)), .Names = c("session_id", 
"userId", "datetime", "referer", "request"), row.names = c(NA, 
10L), class = "data.frame")

Now I want to drop the sessions that do not meet a minimum specified criterion (here, more than 2 rows per session). I tried this code:

lf3 %>% group_by(session_id) %>% tally(sort = TRUE) %>% filter(n>2)

But this returns only the per-session counts. I want to get back the same data frame, containing only the rows of the sessions that pass this condition, like below:

  session_id userId            datetime referer request
1          1      1 2016-03-03 18:22:16      22       1
2          1      1 2016-03-03 18:22:17       2       2
3          1      1 2016-03-03 18:22:20       7       3

How can I do that?
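The `tally()` attempt above reduces each session to a single row of counts, which is why the other columns are lost. A minimal sketch of one way to build on it (not from the original thread; it assumes dplyr is installed, and `keep` and `result` are illustrative names): keep the qualifying `session_id` values, then use `semi_join()` to filter `lf3` down to those sessions. The data is rebuilt from the dput so the block runs on its own.

```r
library(dplyr)

# Rebuild of lf3 with the same values as the question's dput() output
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  datetime   = as.POSIXct(c(1457029336, 1457029337, 1457029340, 1457029596,
                            1457313569, 1457030783, 1457030784, 1457030918,
                            1457030920, 1457370365),
                          origin = "1970-01-01", tz = "UTC"),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# Step 1: one row per session with its row count n; keep sessions with n > 2.
keep <- lf3 %>%
  count(session_id) %>%
  filter(n > 2)

# Step 2: keep the rows of lf3 whose session_id appears in `keep`.
# semi_join() only filters lf3; it adds no columns from `keep`.
result <- lf3 %>% semi_join(keep, by = "session_id")
print(result)
```

This keeps the two steps separate, which can be handy when the list of qualifying sessions is useful on its own.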

Psidom
SumitArya
  • Update your question with the expected output. – Ronak Shah Oct 10 '17 at 02:29
  • So it should return only the session_id = 1 rows, because that is the only session whose frequency is greater than 2. The desired output would be this frame: `structure(list(session_id = c(1L, 1L, 1L), userId = c(1, 1, 1), datetime = structure(c(1457029336, 1457029337, 1457029340), class = c("POSIXct", "POSIXt"), tzone = "UTC"), referer = c(22, 2, 7), request = c(1, 2, 3)), .Names = c("session_id", "userId", "datetime", "referer", "request"), row.names = c(NA, 3L), class = "data.frame")` – SumitArya Oct 10 '17 at 02:34
  • I would prefer base R, `ave`, `lf3[ave(lf3$userId, lf3$session_id, FUN = length) > 2, ]` – Ronak Shah Oct 10 '17 at 02:39
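Ronak Shah's `ave()` suggestion can be unpacked as follows. This is a sketch using only base R; `session_size` and `result` are illustrative names. `ave()` returns a vector as long as its first argument, with each element replaced by `FUN` applied to that element's group, so passing `length` gives every row the size of its own session.

```r
# Rebuild of lf3 with the same values as the question's dput() output
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  datetime   = as.POSIXct(c(1457029336, 1457029337, 1457029340, 1457029596,
                            1457313569, 1457030783, 1457030784, 1457030918,
                            1457030920, 1457370365),
                          origin = "1970-01-01", tz = "UTC"),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# For each row, the number of rows sharing its session_id
session_size <- ave(lf3$userId, lf3$session_id, FUN = length)

# Logical row index: TRUE for rows whose session has more than 2 rows
result <- lf3[session_size > 2, ]
print(result)
```

Which column goes in the first argument does not matter, since `length` ignores the values. Because this is ordinary `[` subsetting, the result stays a plain data.frame and keeps the original row names.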

2 Answers


You can use group_by %>% filter; inside filter, n() gives the number of rows in each group:

library(dplyr)
lf3 %>% group_by(session_id) %>% filter(n() > 2)

# A tibble: 3 x 5
# Groups:   session_id [1]
#  session_id userId            datetime referer request
#       <int>  <dbl>              <dttm>   <dbl>   <dbl>
#1          1      1 2016-03-03 18:22:16      22       1
#2          1      1 2016-03-03 18:22:17       2       2
#3          1      1 2016-03-03 18:22:20       7       3
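`n()` returns the number of rows in the current group, so in a grouped `filter()` every row of a session is compared against that session's size. A closely related sketch, not part of the original answer and assuming a dplyr version that provides `add_count()`, attaches the session size as a column, filters on it, and drops it again. `add_count()` keeps the input's grouping, so starting from an ungrouped data frame the result is not grouped either.

```r
library(dplyr)

# Rebuild of lf3 with the same values as the question's dput() output
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  datetime   = as.POSIXct(c(1457029336, 1457029337, 1457029340, 1457029596,
                            1457313569, 1457030783, 1457030784, 1457030918,
                            1457030920, 1457370365),
                          origin = "1970-01-01", tz = "UTC"),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

result <- lf3 %>%
  add_count(session_id) %>%  # new column n = number of rows in each session
  filter(n > 2) %>%          # keep rows whose session has more than 2 rows
  select(-n)                 # drop the helper count column
print(result)
```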
Psidom
  • OK, that's working. It returns a tibble, so I converted it to a data frame and saved it under another variable name. Thanks. Performance should be checked when using this approach. – SumitArya Oct 10 '17 at 02:45
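Following up on this comment, the grouped tibble can be turned back into a plain data frame with `ungroup()` and `as.data.frame()`. The sketch below wraps that in a hypothetical helper, `filter_sessions()`, whose `min_rows` argument makes the threshold configurable; both names are illustrative and not part of dplyr.

```r
library(dplyr)

# Rebuild of lf3 with the same values as the question's dput() output
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  datetime   = as.POSIXct(c(1457029336, 1457029337, 1457029340, 1457029596,
                            1457313569, 1457030783, 1457030784, 1457030918,
                            1457030920, 1457370365),
                          origin = "1970-01-01", tz = "UTC"),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# Hypothetical helper: keep sessions with more than `min_rows` rows
filter_sessions <- function(df, min_rows) {
  df %>%
    group_by(session_id) %>%
    filter(n() > min_rows) %>%  # n() is evaluated per session
    ungroup() %>%               # remove the grouping added by group_by()
    as.data.frame()             # return a plain data.frame, not a tibble
}

result <- filter_sessions(lf3, 2)
print(result)
```

For example, `filter_sessions(lf3, 1)` would keep sessions 1, 5 and 6 (7 rows).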

We can use data.table

library(data.table)
# .N = rows in each session_id group; .SD = that group's data (returned only if .N > 2)
setDT(lf3)[, if (.N > 2) .SD, by = session_id]
#      session_id userId            datetime referer request
#1:          1      1 2016-03-03 18:22:16      22       1
#2:          1      1 2016-03-03 18:22:17       2       2
#3:          1      1 2016-03-03 18:22:20       7       3
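A variant of the data.table approach, sketched here as an alternative rather than taken from the answer, collects the matching row numbers with `.I` and subsets once. It assumes the data.table package is installed; `dt`, `idx` and `result` are illustrative names. Using `as.data.table()` instead of `setDT()` leaves `lf3` as a data.frame, because `setDT()` converts its argument in place.

```r
library(data.table)

# Rebuild of lf3 with the same values as the question's dput() output
lf3 <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 5L, 5L, 6L, 6L, 7L),
  userId     = c(1, 1, 1, 2, 2, 4, 4, 5, 5, 5),
  datetime   = as.POSIXct(c(1457029336, 1457029337, 1457029340, 1457029596,
                            1457313569, 1457030783, 1457030784, 1457030918,
                            1457030920, 1457370365),
                          origin = "1970-01-01", tz = "UTC"),
  referer    = c(22, 2, 7, 5, 23, 20, 7, 24, 18, 22),
  request    = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 5)
)

# as.data.table() copies, so lf3 itself stays a data.frame
dt <- as.data.table(lf3)

# Per session_id, .I holds that group's row numbers in dt and .N its size;
# .I[.N > 2] keeps all of them or none, and the results land in column V1.
idx <- dt[, .I[.N > 2], by = session_id]$V1

# Subset once by row number: original row and column order are preserved
result <- dt[idx]
print(result)
```

Unlike `if (.N > 2) .SD`, which places the grouping column first in the result, this keeps the original column order. It may also avoid building `.SD` for every group, but whether that is faster should be measured on real data.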
akrun