2

I am posting this question as an extension to my previous question--How to subset data based on predefined pattern in R?.@ycw helped to solve the previous question. However, I have some more concerns to resolve. I need to aggregate the following data( this data the data of the part of solution in my previous question with an additional column).

MoveID  Travel_Time Person  Day.ID  Place   Location_Code   Conc_Code   Mode
NA          NA      5       1       0       H                   1       NA
0           0.5     5       1       1       C                   0       D
1           0.5     5       1       2       D                   0       D
2           0.5     5       1       3       W                   1       D
3           1       5       1       4       E                   0       T
4           1       5       1       5       W                   1       T
5           0.25    5       1       6       H                   1       D
6           0.5     5       1       7       F                   0       P
7           1       5       1       8       H                   1       P

I want to summarize the data based on Conc_Code in following way: Subset the data, start from first 1 to second 1-add Travel time for that subset and concatenate Mode (leaving the first cell in the subset). Then subset the data from second 1 to third 1-- add all cells in travel time in the subset, and concatenate cell for modes in the same subset leaving the first cell in the subset, so on for the third subset. The final data I am looking for looks like the following:

Person      Day     Journey Type        Mode/s  Travel Time(hr)
5           1       H-C-D-W             DDD     1.5
5           1       W-E-W               TT      2
5           1       W-H                 D       0.25
5           1       H-F-H               PP      1.5

I would appreciate if some could help me proper coding that I can use to get my result. Also, it is important for me to be able to replicate the result when the pattern of 1 and 0 in Conc_Code is different than the one in the sample data. Thanks!

009
  • 35
  • 3
  • I can't make any sense of your desired result. The travel times don't seem to add up and the concatenated location codes seem to not match either. E.g. `W-H-D` is never a continuous `Location_Code` sequence in the original data. – thelatemail Jul 02 '17 at 23:01
  • @thelatemail, I have edited the data so that I format is not confusing. – 009 Jul 02 '17 at 23:14
  • All good - just wasn't sure if I was misunderstanding the question. Makes sense to me now. – thelatemail Jul 02 '17 at 23:17

1 Answers1

0

Here is a solution using tidyverse and data.table. dt5 is the final output.

# Load packages
library(tidyverse)
library(data.table)

# Create example data frame
dt <- read.table(text = "MoveID  Travel_Time Person  Day.ID  Place   Location_Code   Conc_Code   Mode
NA          NA      5       1       0       H                   1       NA
0           0.5     5       1       1       C                   0       D
1           0.5     5       1       2       D                   0       D
2           0.5     5       1       3       W                   1       D
3           1       5       1       4       E                   0       T
4           1       5       1       5       W                   1       T
5           0.25    5       1       6       H                   1       D
6           0.5     5       1       7       F                   0       P
7           1       5       1       8       H                   1       P",
                 header = TRUE, stringsAsFactors = FALSE)

Step 1: Duplicate rows when Conc_Code == 1

After this step, 2 in Con_Code3 means start, while 1 means end.

dt2 <- dt %>%
  mutate(Conc_Code2 = ifelse(Conc_Code == 1, 2, 0)) %>%
  mutate(Conc_Code3 = map2(Conc_Code, Conc_Code2, `:`)) %>%
  unnest() %>%
  select(-Conc_Code, -Conc_Code2)

Step 2: Create Run Lenght ID to identify events

Run2 indicates the event ID

dt3 <- dt2 %>%
  mutate(Conc_Code4 = ifelse(Conc_Code3 == 0, 2, Conc_Code3)) %>%
  mutate(Run1 = rleid(Conc_Code4)) %>%
  mutate(Run2 = ifelse(Run1 %% 2 == 1, Run1 - 1, Run1))

Step 3: Summarise data based on event ID

dt4 <- dt3 %>%
  group_by(Person, Day.ID, Run2) %>%
  summarise(`Journey Type` = paste(Location_Code, collapse = "-"),
            `Mode/s` = paste(Mode[2:n()], collapse = ""),
            `Travel Time(hr)` = sum(Travel_Time[2:n()])) %>%
  ungroup()

Step 4: Remove first and last rows, Final processing

dt5 <- dt4 %>%
  group_by(Person, Day.ID) %>%
  slice(2:(n() - 1)) %>%
  select(-Run2) %>%
  rename(Day = Day.ID)
www
  • 38,575
  • 12
  • 48
  • 84