0

I Need to convert this table, creating Cab.ID subsets according to Date, Direction and Route.

Date          Direction Cab.ID  Route
Sep 24, 2018    Logout  x-1      R1
Sep 24, 2018    Logout  x-2      R1
Sep 24, 2018    Logout  x-1      R2
Sep 24, 2018    Login   x-3      R1
Sep 25, 2018    Login   y-1      R3
Sep 25, 2018    Logout  z-1      R4
Sep 25, 2018    Logout  z-1      R4
Sep 25, 2018    Logout  x-4      R5
Sep 25, 2018    Login   x-4      R5
Sep 26, 2018    Login   x-3      R6
Sep 26, 2018    Login   x-5      R6

Required Table

Date         Route    Login-Cabid   Logout-Cabid
Sep 24, 2018    R1      x-3           x-1,x-2
Sep 24, 2018    R2                    x-1
Sep 25, 2018    R3      y-1 
Sep 25, 2018    R4                    z-1
Sep 25, 2018    R5      x-4           x-4
Sep 26, 2018    R6      x-3,x-5 

Thanks

markus
  • 25,843
  • 5
  • 39
  • 58
Xenus
  • 43
  • 4

2 Answers2

2

In base R we can use aggregate and reshape

df2 <- aggregate(Cab.ID ~ Date + Direction + Route, unique(df1), toString)

reshape(df2, idvar = c("Date", "Route"), timevar = "Direction", direction = "wide")
#          Date Route Cab.ID.Login Cab.ID.Logout
#1 Sep 24, 2018    R1          x-3      x-1, x-2
#3 Sep 24, 2018    R2         <NA>           x-1
#4 Sep 25, 2018    R3          y-1          <NA>
#5 Sep 25, 2018    R4         <NA>           z-1
#6 Sep 25, 2018    R5          x-4           x-4
#8 Sep 26, 2018    R6     x-3, x-5          <NA>

In case you want to use tidyverse or data.table, here is how

library(dplyr)
library(tidyr)
df1 %>% 
  unique() %>% 
  group_by(Date, Route, Direction) %>% 
  summarise(Cab.ID = toString(Cab.ID)) %>% 
  spread(Direction, Cab.ID)

Or

library(data.table)
setDT(unique(df1))[, .(Cab.ID = toString(Cab.ID)), by = .(Date, Route, Direction)
           ][, dcast(.SD, Date + Route ~ Direction, value.var = 'Cab.ID')]

data

df1 <- structure(list(Date = c("Sep 24, 2018", "Sep 24, 2018", "Sep 24, 2018", 
"Sep 24, 2018", "Sep 25, 2018", "Sep 25, 2018", "Sep 25, 2018", 
"Sep 25, 2018", "Sep 25, 2018", "Sep 26, 2018", "Sep 26, 2018"
), Direction = c("Logout", "Logout", "Logout", "Login", "Login", 
"Logout", "Logout", "Logout", "Login", "Login", "Login"), Cab.ID = c("x-1", 
"x-2", "x-1", "x-3", "y-1", "z-1", "z-1", "x-4", "x-4", "x-3", 
"x-5"), Route = c("R1", "R1", "R2", "R1", "R3", "R4", "R4", "R5", 
"R5", "R6", "R6")), .Names = c("Date", "Direction", "Cab.ID", 
"Route"), class = "data.frame", row.names = c(NA, -11L))
markus
  • 25,843
  • 5
  • 39
  • 58
-1

agreed with markus, and you can use spread {tidyr} after df2 <- aggregate(Cab.ID ~ Date + Direction + Route, df1, toString)

spread(df2, key = Direction, value = Cab.ID)

Xi wa
  • 154
  • 4