Pass dataframe as reference to reduce() in R

Question

I have example machine data:

set.seed(0)
n <- 10
machine_data <- data.frame(c(1:n), sample.int(2, n, replace=TRUE), runif(n, min=50, max=100), runif(n, min=20000, max=100000))
colnames(machine_data) <- c("ID", "Location", "Condition", "ReplaceCost")

I also have a list of potential actions to perform maintenance on the equipment of a particular location:

actions <- c("do_nothing", "repair", "replace")
periods <- 3
perms <- gtools::permutations(n=length(actions), r=periods, v=actions, repeats.allowed = T)
n <- length(unique((machine_data$Location)))
decisions <- do.call(rbind, replicate(n, perms, simplify=FALSE))
df <- data.frame(rep(c(1:n), each=nrow(perms)), decisions)
action_labels <- paste("Period", c(1:periods))
colnames(df) <- c("Location", action_labels)
df$Improvement <- 0
df$Cost <- 0

I also have a model that predicts machine condition:

# for simplicity assume it degrades 10% each period
degrade_machine <- function(currentCondition)
{
  newCondition <- currentCondition * 0.90
  return(newCondition)
}

The idea is to calculate the $Improvement and $Cost of each action set. An action set is defined as the permutation of potential actions across all periods. E.g., "do_nothing," "repair," and "do_nothing" is one action set across the three periods.

I have a function to perform a single period action and give the resulting condition:

library(dplyr)  # provides %>% and mutate()

perform_action <- function(action, machine_data, location_id, repair_threshold, replace_threshold)
{
  print(action)
  location_data <- machine_data[machine_data$Location == location_id, ]
  # repair decisions and resulting cis
  location_data <- location_data %>%
    # repair results in condition of 95; replace results in 100
    mutate(Condition = ifelse(action == "repair" & Condition <= repair_threshold, 95, 
                             ifelse(action == "replace" & Condition <= replace_threshold, 100, Condition))
    )
  
  machine_data[machine_data$Location == location_id,]$Condition <- location_data$Condition

  # degrade machine for next iteration
  # print the condition after maintenance decision
  print(machine_data$Condition)
  machine_data$Condition <- degrade_machine(machine_data$Condition)
  # print the condition after components degrade
  print(machine_data$Condition)
  return(machine_data)
}

A previous question helped me set this up using the purrr library (Avoiding Loops in R for Accumulating Function Values):

for(i in 1:nrow(df))
{
  action_set <- df[i, 2:4]
  machine_data <- purrr::reduce(action_set %>% as_vector(),
                function(x,y) {
                  x <- perform_action(y, machine_data, 1,
                                          repair_threshold, replace_threshold)
                  return(x)
                }, .init = 0)
}

However, this creates a problem: each call made by reduce() in the above loop starts with the original machine_data values.

For example, suppose I have the action set c('repair', 'do_nothing', 'repair'). Machine 2 starts with a $Condition of ~ 60.299. After the first decision (repair), the second component gets repaired to 95 and then degrades to 85.5. The intent is that the second action (do_nothing) would be passed the updated condition of 85.5, but instead it starts with the original condition value of 60.299 and then degrades to 54.27. I would expect it to instead have a resulting $Condition = 0.9 * 85.5 = 76.95.

Starting data:

Data after a repair and do_nothing decision:

I would like this data to be passed as a reference within the reduce() function. Is that possible?

It will be a lot easier for people to help you debug (or for your own debugging) if you pare this down to just what's essential to the specific problem. If the problem is with working with `reduce`, eliminate complications that can come from the for loop, the custom functions, etc. — camille, Jul 24 '23 at 15:51
@camille while I understand that perspective, I'm trying to follow the guidelines to give a fully reproducible problem. Part of the issue with the linked post was that it tried to simplify as you suggested but as a result the answer missed the referencing issue — coolhand, Jul 24 '23 at 15:56

I_O · Answer 1 · 2023-07-24T17:23:51.067

The reducer function in your code:

## purrr::reduce(...
                function(x,y) {
                  x <- perform_action(y, machine_data, 1,
                                          repair_threshold, replace_threshold)
                  return(x)
                }, .init = 0)
## ...
}

... duly operates on y (the "newcomer" of x and y) but never uses x, the result accrued so far. Every call passes the original machine_data to perform_action(), so each action starts from the untouched data and the outcome of the previous action is discarded. To illustrate: x is the pile, starting with a single piece (the first element, or .init if supplied), while each successive item y is heaped on top, after some processing (if desired). You'll need to process each newcomer y and combine it with the x accrued so far.

Compare:

reduce(letters[1:5],
       .f = \(x, y) return(x)
       )
## [1] "a"

with:

reduce(letters[1:5],
       .f = \(x, y) return(paste(x, y))
       )
## [1] "a b c d e"
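
Applied to the question, the same idea can be sketched as follows. This is a minimal illustration, not the asker's exact code: degrade_machine() and perform_action() are simplified base-R stand-ins, the two-row machine_data and the threshold values 90 and 50 are made up for the example, and the reducer's argument names acc and action are arbitrary. The point is that the reducer hands the accumulated data frame, not the global machine_data, to perform_action(), and that .init is the starting data frame rather than 0.

```r
library(purrr)

# Simplified stand-ins for the question's functions (assumptions, base R only)
degrade_machine <- function(condition) condition * 0.90

perform_action <- function(action, machine_data, location_id,
                           repair_threshold, replace_threshold) {
  at_location <- machine_data$Location == location_id
  cond <- machine_data$Condition
  # repair results in a condition of 95; replace results in 100
  repair  <- at_location & action == "repair"  & cond <= repair_threshold
  replace <- at_location & action == "replace" & cond <= replace_threshold
  cond[repair]  <- 95
  cond[replace] <- 100
  # degrade every machine for the next period
  machine_data$Condition <- degrade_machine(cond)
  machine_data
}

# Hypothetical toy data and thresholds, not the values from the question
machine_data <- data.frame(ID = 1:2, Location = c(1, 1), Condition = c(60, 99))
action_set <- c("repair", "do_nothing")

result <- reduce(
  action_set,
  # acc is the data frame returned by the previous step
  function(acc, action) perform_action(action, acc, 1, 90, 50),
  .init = machine_data
)

result$Condition        # approximately 76.95 and 80.19
machine_data$Condition  # still 60 and 99: the original is not modified
```

Inside the for loop it is probably safer to store the result under a new name instead of assigning back to machine_data, so that every row of df starts from the same baseline.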

Does pasting pass the value as a reference, or just stack it on the heap? E.g., in the example, would `$Condition` provide `85.5, 76.95, ...` or `85.5, 60.29, ...`? In the first case the value is passed as a ref, while in the second it is just stacked without being passed as a ref — coolhand, Jul 25 '23 at 13:59
AFAIU it just gets stuck on the heap; see here for a broader discussion: https://stackoverflow.com/questions/2603184/can-you-pass-by-reference-in-r — I_O, Jul 25 '23 at 14:29
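
A small sketch may help with the question raised in these comments. It uses purrr::accumulate(), which works like reduce() but keeps every intermediate result, on a single condition value; the starting value 60.299 is the approximate figure quoted in the question, and the reducer is a simplified stand-in rather than the asker's perform_action(). No reference is involved: each step simply receives the value returned by the previous step.

```r
library(purrr)

conditions <- accumulate(
  c("repair", "do_nothing"),
  function(condition, action) {
    # a repair sets the condition to 95 before degradation
    if (action == "repair") condition <- 95
    # every period ends with 10% degradation
    condition * 0.90
  },
  .init = 60.299
)

conditions  # 60.299, then approximately 85.5 and 76.95
```

The second step starts from 85.5 rather than from 60.299 because the reducer works on its first argument instead of on a variable captured from the enclosing environment.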
