I have a data structure problem and am having trouble figuring out how to begin. I'm also unsure whether the keywords in my title make sense.

I've tried the following: "Creating a Origin-Destination Table in R" has been the most help so far, but it doesn't get to the level of indexing I think I need.

"Creating origin-destination matrices with R" only covers a simple one-step origin-destination case.

--------------------

My question: How can I create an Origin-Destination data set with the 'Origin' and 'Destination' ordered by time?

Here is my data set:

Student     Classes       time
John        HomeRoom      8:00
John        Math          9:00
John        English       10:00
John        Physics       11:00
John        Art           1:00
John        Lunch         12:00
Sarah       HomeRoom      8:00
Sarah       English       9:00
Sarah       Art           10:00
Sarah       Physics       12:00
Sarah       Lunch         11:00

This is what I want my data set to look like:

Student  OriginClass time   DestinationClass  timeDest  ClassFlow
John     HomeRoom    8:00   Math              9:00      1
John     Math        9:00   English           10:00     2
John     English     10:00  Physics           11:00     3
John     Physics     11:00  Lunch             12:00     4
John     Lunch       12:00  Art               1:00      5
John     Art         1:00   Home              2:00      6
Sarah    HomeRoom    8:00   English           9:00      1
Sarah    English     9:00   Art               10:00     2
Sarah    Art         10:00  Lunch             11:00     3
Sarah    Lunch       11:00  Physics           12:00     4
Sarah    Physics     12:00  Home              1:00      5

There are two tricks:

  1. Each row's 'destination' wraps around to become the next row's 'origin'.
  2. The final 'destination' for each student is 'Home'.

I figure my next steps are as follows:

  • Index classes by Student, based on time, as 'ClassIndex'
  • Set my 'Origin' as 0 + [previous row's ClassIndex] (not sure how to do this)
  • Create a loop where max(ClassIndex) + 1 = 'Home'
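The steps above can be sketched in base R roughly as follows. This is only a sketch under assumptions: `to_minutes` is a hypothetical helper, and it assumes the school day starts at 8:00, so any hour before 8 is treated as p.m. (which is how 1:00 ends up after 12:00). The destination time for the final 'Home' row is left as NA, since the data doesn't say when each student goes home.

```r
# Sample data from the question (character columns, not factors).
df1 <- data.frame(
  Student = c(rep("John", 6), rep("Sarah", 5)),
  Classes = c("HomeRoom", "Math", "English", "Physics", "Art", "Lunch",
              "HomeRoom", "English", "Art", "Physics", "Lunch"),
  time    = c("8:00", "9:00", "10:00", "11:00", "1:00", "12:00",
              "8:00", "9:00", "10:00", "12:00", "11:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: minutes since midnight.
# ASSUMPTION: hours before 8 are p.m. (the day starts at 8:00).
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  h <- as.integer(vapply(parts, `[`, character(1), 1))
  m <- as.integer(vapply(parts, `[`, character(1), 2))
  ifelse(h < 8L, h + 12L, h) * 60L + m
}

# Step 1: sort by student and time, then index classes within each student.
df1 <- df1[order(df1$Student, to_minutes(df1$time)), ]
df1$ClassIndex <- ave(seq_along(df1$Student), df1$Student, FUN = seq_along)

# Steps 2 and 3: the destination is the next row's class within the same
# student; the row after the last ClassIndex becomes "Home". No loop needed.
df1$DestinationClass <- ave(df1$Classes, df1$Student,
                            FUN = function(v) c(v[-1], "Home"))
df1$timeDest <- ave(df1$time, df1$Student,
                    FUN = function(v) c(v[-1], NA))

print(df1)
```

Shifting each student's vector by one position replaces the explicit "previous row" lookup, so `ClassIndex` ends up doubling as the `ClassFlow` column.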

The point behind all of this is to be able to show a flow chart:

John: HomeRoom -> Math -> English -> Physics -> Lunch -> Art -> Home

Sarah: HomeRoom -> English -> Art -> Lunch -> Physics -> Home
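The flow strings above can also be built straight from the raw table. The following is a minimal base R sketch, again assuming that hours before 8 are p.m.; `to_minutes` and `paths` are names made up for illustration.

```r
# Sample data from the question (character columns, not factors).
df1 <- data.frame(
  Student = c(rep("John", 6), rep("Sarah", 5)),
  Classes = c("HomeRoom", "Math", "English", "Physics", "Art", "Lunch",
              "HomeRoom", "English", "Art", "Physics", "Lunch"),
  time    = c("8:00", "9:00", "10:00", "11:00", "1:00", "12:00",
              "8:00", "9:00", "10:00", "12:00", "11:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: minutes since midnight.
# ASSUMPTION: hours before 8 are p.m. (the day starts at 8:00).
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  h <- as.integer(vapply(parts, `[`, character(1), 1))
  m <- as.integer(vapply(parts, `[`, character(1), 2))
  ifelse(h < 8L, h + 12L, h) * 60L + m
}

# Put each student's classes in chronological order.
df1 <- df1[order(df1$Student, to_minutes(df1$time)), ]

# One string per student: the ordered classes, then "Home" at the end.
paths <- tapply(df1$Classes, df1$Student,
                function(v) paste(c(v, "Home"), collapse = " -> "))

cat(paste0(names(paths), ": ", paths), sep = "\n")
```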

General Grievance
jpf5046
  • Definitely closer, because `timeDest` is working perfectly. `DestinationClass = lead(Classes, default = "Home")` is giving me an error: `Error: not compatible with requested type`. I'm thinking it's because `Classes` is a string; what do you think? I could do a lookup table with unique classIDs. – jpf5046 May 03 '17 at 16:50
  • 1
    It is working fine for me though assuming that your 'Classes' is `character` class. BTW, I am using the devel version of `dplyr`. – akrun May 03 '17 at 16:55
  • Had `Classes` as a factor -- works great. Thanks @akrun – jpf5046 May 03 '17 at 18:06

1 Answer

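We can use dplyr from the tidyverse. Note that `lead()` works on row order, so this assumes `df1` is already sorted by time within each Student, and that `Classes` is a character column rather than a factor:

library(dplyr)
df1 %>%
   group_by(Student) %>%                                        # one schedule per student
   mutate(DestinationClass = lead(Classes, default = "Home"),   # next class; "Home" after the last one
          timeDest = lead(time),                                # next start time; NA on the last row
          ClassFlow = row_number())                             # step number within each student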
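
If `Classes` comes in as a factor (the situation described in the comments) or the rows are not yet in time order (as in the sample data, where Art at 1:00 is listed before Lunch at 12:00), something like the following variant may help. It is a sketch, not a tested drop-in: the rule "hours before 8 are p.m." is my assumption about the data, and `hour` and `mins` are throwaway helper columns.

```r
library(dplyr)

# Sample data from the question, deliberately built with factor columns.
df1 <- data.frame(
  Student = c(rep("John", 6), rep("Sarah", 5)),
  Classes = c("HomeRoom", "Math", "English", "Physics", "Art", "Lunch",
              "HomeRoom", "English", "Art", "Physics", "Lunch"),
  time    = c("8:00", "9:00", "10:00", "11:00", "1:00", "12:00",
              "8:00", "9:00", "10:00", "12:00", "11:00"),
  stringsAsFactors = TRUE
)

result <- df1 %>%
  mutate(Classes = as.character(Classes),   # so that default = "Home" is a valid value
         time    = as.character(time),
         hour    = as.integer(sub(":.*$", "", time)),
         # ASSUMPTION: hours before 8 are p.m., so 1:00 sorts after 12:00
         mins    = if_else(hour < 8L, hour + 12L, hour) * 60L +
                   as.integer(sub("^.*:", "", time))) %>%
  arrange(Student, mins) %>%
  group_by(Student) %>%
  mutate(DestinationClass = lead(Classes, default = "Home"),
         timeDest  = lead(time),            # NA on each student's last row
         ClassFlow = row_number()) %>%
  ungroup() %>%
  select(Student, OriginClass = Classes, time,
         DestinationClass, timeDest, ClassFlow)

print(result)
```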
akrun