0

Here is a dataframe I have [![enter image description here](https://i.stack.imgur.com/8NL9F.png)](https://i.stack.imgur.com/8NL9F.png)

I made a simple example, but the logic is here : for a given product_code I can have various final destinations and various steps (can be Plant to final_destination or Plant to step1 to step_x... to final_destination)

site <- c("DC_Frankfurt","F6_DC_Bordeaux","B3_Paris","BEAG_Toronto","DC_Frankfurt","Final_dest1","Final2","Final3")
product_code <- c("000001","000001","000001","000001","000002","000001","000001","000001")
transfersite <- c("Plant1","DC_Frankfurt","DC_Frankfurt","DC_Frankfurt","Plant2","B3_Paris","BEAG_Toronto","F6_DC_Bordeaux")

df <- data.frame(transfersite, product_code,site)

This is what I expect :

enter image description here


product_code <- c("000001","000001","000001","000002")
step1 <- c("Plant1","Plant1","Plant1","Plant2")
step2 <- c("DC_Frankfurt","DC_Frankfurt","DC_Frankfurt","DC_Frankfurt")
step3 <- c("F6_DC_Bordeaux","B3_Paris","BEAG_Toronto",NA)
step4 <- c("Final3","Final_dest1","Final2",NA)

result_expected <- data.frame(product_code,step1,step2,step3,step4)

I tried this so far, works well but if there are more than 4 steps I am dead, and if there are not, the code starts to loop on the final steps... Plus, with that, I don't know how to merge on the same row, it does not match what I expect yet.

my_test <- df %>% 
  filter(str_detect(transfersite,"Plant" )) %>%
  mutate(step1 = transfersite,
         step2 = site) %>%
  full_join(df)

my_test <- my_test %>%
  semi_join(my_test, by = c("product_code" = "product_code", "transfersite" = "step2")) %>%
  mutate(step3 = site) %>%
  full_join(my_test)

my_test <- my_test %>%
  semi_join(my_test, by = c("product_code" = "product_code", "transfersite" = "step3")) %>%
  mutate(step4 = site) %>%
  full_join(my_test)

Thank you everyone.

Blayke12
  • 13
  • 3

2 Answers2

0

Something like this, perhaps?

i <- 2; var <- paste0("step", i)
dfnew <- rename(df, step1 = transfersite, step2 = site)
while (i < 22 && any(!is.na(dfnew[[ var ]]))) {
  prevvar <- var
  i <- i + 1; var <- paste0("step", i)
  dfnew <- left_join(dfnew, rename(df, !!prevvar := transfersite, !!var := site),
                     by = c("product_code", prevvar))
}
dfnew %>%
  mutate(NAs = rowSums(is.na(cur_data()))) %>%
  group_by(product_code) %>%
  filter(NAs == min(NAs)) %>%
  ungroup() %>%
  select(product_code, everything(), -!!var, -NAs)
# # A tibble: 4 × 5
#   product_code step1  step2        step3          step4      
#   <chr>        <chr>  <chr>        <chr>          <chr>      
# 1 000001       Plant1 DC_Frankfurt F6_DC_Bordeaux Final3     
# 2 000001       Plant1 DC_Frankfurt B3_Paris       Final_dest1
# 3 000001       Plant1 DC_Frankfurt BEAG_Toronto   Final2     
# 4 000002       Plant2 DC_Frankfurt NA             NA         

I added i < 22 just as a guard against infinite loops. These can occur when there are circular paths in the graph. "22" is arbitrary, if you expect all of your real paths will be fewer than (say) 10, that's also a good number.

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Somehow in my use case the while loop is infinite, I don't really understand how your code is working though it is pretty advanced ahah. Thank you very much btw, i'm impressed by the speed you came up with a solution – Blayke12 Feb 13 '23 at 16:36
  • It assumes no circular paths. – r2evans Feb 13 '23 at 16:39
  • Hmmmm this means I need to specify the right number of steps, right ? Because I don't know it for each product_code. One product_code can have 2 steps, the other 4 or 6... i know it's a mess – Blayke12 Feb 13 '23 at 16:43
  • It shouldn't be difficult to include a limit, perhaps `while (i < 15 && any(!...))`? – r2evans Feb 13 '23 at 16:44
  • ... but "infinite loop" is certainly the risk of unbounded `while` loops; I didn't put effort into guarding against it, I should have ... edited. – r2evans Feb 13 '23 at 16:50
  • thank you again. Somehow in my use-case the steps stop at 2.. i don't know why... :( i'll check it again tomorrow... – Blayke12 Feb 13 '23 at 17:02
0

Here's a recursive function that will add steps until no more steps are possible:

library(dplyr)   # >= v1.1.0
library(stringr)

route_steps <- function(data, step = 1, max_steps = Inf) {
  step_name <- paste0("step", step)
  if (step == 1) {
    out <- data %>% 
      filter(str_detect(transfersite, "Plant")) %>%
      rename(!!step_name := transfersite)
  } else {
    keys <- c("product_code", "transfersite")
    names(keys) <- c("product_code", step_name)
    out <- data %>% 
      rename(!!step_name := site) %>%
      left_join(df, by = keys, multiple = "all")
  }
  if (all(is.na(out$site)) | step == max_steps) mutate(out, site = NULL)
  else route_steps(out, step = step + 1, max_steps = max_steps)
}

route_steps(df)

result
#    step1 product_code        step2          step3       step4
# 1 Plant1       000001 DC_Frankfurt F6_DC_Bordeaux      Final3
# 2 Plant1       000001 DC_Frankfurt       B3_Paris Final_dest1
# 3 Plant1       000001 DC_Frankfurt   BEAG_Toronto      Final2
# 4 Plant2       000002 DC_Frankfurt           <NA>        <NA>

Things could get wonky, e.g., if circular routes are possible. As a fallback, you can try setting the max_steps argument, which may or may not help -- I haven't tested with circular routes.

zephryl
  • 14,633
  • 3
  • 11
  • 30
  • I have a question though. I don't understand this line : rename(!!step_name := site). Why rename(step_name = site) does not work ? Why rename("step{step}" := site) works ? I couldn't find any answer about those operators := and !! – Blayke12 Feb 14 '23 at 09:40
  • @Blayke12 see [this post](https://stackoverflow.com/a/26003971/17303805), including the section "dplyr version >= 0.7." – zephryl Feb 14 '23 at 12:49