0

I have following dataframe in R

 truck_no     start_time         end_time           ctr_no    time     type
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT09      1.67      D
  ABC123      20-05-2016 06:53   20-05-2016 08:53   TRT12      1.67      R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT34      1.67      R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT33      1.67      D
  ERT123      21-05-2016 06:53   21-05-2016 08:53   QRT34      2.67      R
  ERT123      21-05-2016 06:53   21-05-2016 08:53   PRT33      2.67      D

Now My desired data frame is

 truck_no     start_time         end_time           ctr_no   time    type
  ABC123      20-05-2016 06:53   20-05-2016 08:53   ERT09    1.67    2D2R
  ABC123      20-05-2016 06:53   20-05-2016 08:53   TRT12    2.67    1R1D

I want to count D's and R's and paste it in above manner time is taken as a average. How can I do it in dplyr ?

talat
  • 68,970
  • 21
  • 126
  • 157
Neil
  • 7,937
  • 22
  • 87
  • 145
  • https://stackoverflow.com/questions/22767893/find-number-of-rows-using-dplyr-group-by – gd047 Sep 08 '17 at 12:14
  • 2
    How do you get `ctr_no`? From my understanding it makes sense to be `ERT09` and `QRT34`. Also `truck_no` shouldn't be `ABC123` and `ERT123`? – Sotos Sep 08 '17 at 12:16
  • What does "time is taken as a average" mean? – talat Sep 08 '17 at 12:20
  • 1
    @docendodiscimus I think your answer is correct. OP messed up the expected output IMO – Sotos Sep 08 '17 at 12:58
  • @Sotos, I undeleted my answer for now. Let's see if OP will respond to clarify at some point. – talat Sep 08 '17 at 13:07

1 Answers1

4

Here's a dplyr approach:

foo <- function(x) {y <- table(x); paste(rbind(y, names(y)), collapse = "")}

df %>% 
  group_by(truck_no) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)

## A tibble: 2 x 5
#  truck_no       start_time         end_time ctr_no  type
#    <fctr>           <fctr>           <fctr> <fctr> <chr>
#1   ABC123 20-05-2016 06:53 20-05-2016 08:53  ERT09  2D2R
#2   ERT123 21-05-2016 06:53 21-05-2016 08:53  QRT34  1D1R

In case you want the start and end time to be the mean per truck_no you could use the following extension:

df %>% 
  group_by(truck_no) %>% 
  mutate_at(vars(ends_with("_time")), 
            ~mean(as.POSIXct(as.character(.), format="%d-%m-%Y %H:%M"))) %>% 
  mutate(type = foo(type)) %>% 
  summarise_all(first)
## A tibble: 2 x 5
#  truck_no          start_time            end_time ctr_no  type
#    <fctr>              <dttm>              <dttm> <fctr> <chr>
#1   ABC123 2016-05-20 06:53:00 2016-05-20 08:23:00  ERT09  2D2R
#2   ERT123 2016-05-21 06:53:00 2016-05-21 08:53:00  QRT34  1D1R
talat
  • 68,970
  • 21
  • 126
  • 157