how to do a groupby and count number of occurrences in dplyr

Question

I have following dataframe in R

 truck_no     start_time         end_time           ctr_no    time     type
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT09      1.67      D
  ABC123      20-05-2016 06:53   20-05-2016 08:53   TRT12      1.67      R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT34      1.67      R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT33      1.67      D
  ERT123      21-05-2016 06:53   21-05-2016 08:53   QRT34      2.67      R
  ERT123      21-05-2016 06:53   21-05-2016 08:53   PRT33      2.67      D

Now My desired data frame is

 truck_no     start_time         end_time           ctr_no   time    type
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT09    1.67    2D2R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   TRT12    2.67    1R1D

I want to count D's and R's and paste it in above manner time is taken as a average. How can I do it in dplyr ?

https://stackoverflow.com/questions/22767893/find-number-of-rows-using-dplyr-group-by — gd047, Sep 08 '17 at 12:14
How do you get `ctr_no`? From my understanding it makes sense to be `ERT09` and `QRT34`. Also `truck_no` shouldn't be `ABC123` and `ERT123`? — Sotos, Sep 08 '17 at 12:16
@docendodiscimus I think your answer is correct. OP messed up the expected output IMO — Sotos, Sep 08 '17 at 12:58
@Sotos, I undeleted my answer for now. Let's see if OP will respond to clarify at some point. — talat, Sep 08 '17 at 13:07

talat · Accepted Answer · 2017-09-08T13:06:56.767

Here's a dplyr approach:

foo <- function(x) {y <- table(x); paste(rbind(y, names(y)), collapse = "")}

df %>% 
  group_by(truck_no) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)

## A tibble: 2 x 5
#  truck_no       start_time         end_time ctr_no  type
#    <fctr>           <fctr>           <fctr> <fctr> <chr>
#1   ABC123 20-05-2016 06:53 20-05-2016 08:53  ERT09  2D2R
#2   ERT123 21-05-2016 06:53 21-05-2016 08:53  QRT34  1D1R

In case you want the start and end time to be the mean per truck_no you could use the following extension:

df %>% 
  group_by(truck_no) %>% 
  mutate_at(vars(ends_with("_time")), 
            ~mean(as.POSIXct(as.character(.), format="%d-%m-%Y %H:%M"))) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)
## A tibble: 2 x 5
#  truck_no          start_time            end_time ctr_no  type
#    <fctr>              <dttm>              <dttm> <fctr> <chr>
#1   ABC123 2016-05-20 06:53:00 2016-05-20 08:23:00  ERT09  2D2R
#2   ERT123 2016-05-21 06:53:00 2016-05-21 08:53:00  QRT34  1D1R

@Lyngbakr, you are right. The expected output seems a little unclear at the moment. — talat, Sep 08 '17 at 12:31

how to do a groupby and count number of occurrences in dplyr

1 Answers1