
I have an initial value as well as three datasets in a list:

  value <- 10
  df1 <- data.frame(id = c("a", "b", "c"), quantity = c(3, 7, -2))
  df2 <- data.frame(id = c("d", "e", "f"), quantity = c(0, -3, 9))
  df3 <- data.frame(id = c("g", "h", "i"), quantity = c(-2, 0, 4))
  df_list <- list(df1, df2, df3)

I would like to apply multiple functions to each data frame in the list, dynamically update the value at the end of the operations, and then run the same procedure on the next list item. The challenge is that the functions themselves take value as an input.

Here is how I would accomplish this without using the apply function:


  # Example function 1: Generate `outcome` by multiplying quantity times a random number
  df_list[[1]]$outcome <- df_list[[1]]$quantity * sample(1:10, 1)

  # Example function 2: Multiply `value` by `quantity` to update `outcome` 
  df_list[[1]]$outcome <- value * df_list[[1]]$quantity
  
  # Updates `value` by adding the old `value` to the sum of the outcome column:
  value <- value + as.numeric(colSums(df_list[[1]]["outcome"]))

  # Repeats operations for df_list[[2]] and df_list[[3]]
  df_list[[2]]$outcome <- df_list[[2]]$quantity * sample(1:10, 1)
  df_list[[2]]$outcome <- value * df_list[[2]]$quantity
  value <- value + as.numeric(colSums(df_list[[2]]["outcome"]))
  
  df_list[[3]]$outcome <- df_list[[3]]$quantity * sample(1:10, 1)
  df_list[[3]]$outcome <- value * df_list[[3]]$quantity
  value <- value + as.numeric(colSums(df_list[[3]]["outcome"]))

I can use lapply to run the functions on each list item, but how do I access (and dynamically update) the non-list object value before proceeding to the next list item?

hoho
  • Aren't `df_list[[1]]$outcome <- value * df_list[[1]]$quantity` and `df_list[[1]]$outcome <- df_list[[1]]$quantity * sample(1:10, 1)` each completely replacing `outcome`? I.e., if the second step is done without considering the existing value of `outcome`, why was the first step assigned at all? – akrun Sep 25 '21 at 00:02
  • Yes—good point. Both functions are examples, not the actual (more complex) functions I'm interested in, and the random number product was the first thing I thought of, although of course you're right that it makes the first function pointless. I've edited to reverse the order of the functions (which I admit only marginally makes them more meaningful) and noted that they are examples only. – hoho Sep 25 '21 at 00:11
  • OK, if that is the case, the `for` loop in the solution is the easiest and most flexible option compared to a `Reduce`-based approach – akrun Sep 25 '21 at 00:12

1 Answer


If `value` needs to be updated between list items, use a `for` loop, i.e. loop over the indices of the list and update `value` at the end of each iteration:

for (i in seq_along(df_list)) {
  # Multiply `value` by `quantity` to obtain `outcome` for each row in df_list[[i]]
  df_list[[i]]$outcome <- value * df_list[[i]]$quantity

  # Overwrite `outcome` with `quantity` times a random number
  df_list[[i]]$outcome <- df_list[[i]]$quantity * sample(1:10, 1)

  # Add the column sum of `outcome` to `value` before the next iteration
  value <- value + as.numeric(colSums(df_list[[i]]["outcome"]))
}

Output (the exact number depends on the random draws from `sample`):

> value
[1] 84
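
The loop can also be wrapped in a function so that the caller's `value` and `df_list` are not modified in place. This is only a minimal sketch: `update_all` is a hypothetical helper name, and it assumes the step order from the edited question (random multiple first, then `value * quantity`), so the final `outcome` no longer depends on `sample` and the result is reproducible.

```r
# Hypothetical helper (not part of the original answer): runs the loop
# on local copies and returns both the final value and the updated list.
update_all <- function(df_list, value) {
  for (i in seq_along(df_list)) {
    # Example step 1: random multiple of `quantity`
    df_list[[i]]$outcome <- df_list[[i]]$quantity * sample(1:10, 1)
    # Example step 2: overwrite `outcome` using the current `value`
    df_list[[i]]$outcome <- value * df_list[[i]]$quantity
    # Carry the updated `value` into the next iteration
    value <- value + sum(df_list[[i]]$outcome)
  }
  list(value = value, df_list = df_list)
}

df_list <- list(
  data.frame(id = c("a", "b", "c"), quantity = c(3, 7, -2)),
  data.frame(id = c("d", "e", "f"), quantity = c(0, -3, 9)),
  data.frame(id = c("g", "h", "i"), quantity = c(-2, 0, 4))
)

res <- update_all(df_list, value = 10)
res$value
# [1] 1890
```

Here `value` goes 10 -> 90 -> 630 -> 1890, because each data frame's `outcome` is computed with the `value` left by the previous one.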
akrun
  • Thank you, this works great. Out of curiosity, is there a way to do this with `lapply`? I'm wondering if accessing and updating non-list objects is possible inside `lapply`. – hoho Sep 25 '21 at 00:15
  • @hoho You could use `<<-` to update the value in the enclosing (here, global) environment, but that is still not a good option. Another way is `Reduce`, but when multiple objects go into the function it is less flexible and harder to understand – akrun Sep 25 '21 at 00:17
  • @hoho You may also check [here](https://stackoverflow.com/questions/45933130/apply-vs-for-loop-in-r) about the difference between iterative and independent evaluation – akrun Sep 25 '21 at 00:23
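
For reference, the two alternatives mentioned in the comments might look roughly like the sketch below. It is an illustration under assumptions, not code from the answer: `step` and `run_with_lapply` are hypothetical names, and only the deterministic example step (`outcome <- value * quantity`) is used so that both variants can be compared.

```r
df_list <- list(
  data.frame(id = c("a", "b", "c"), quantity = c(3, 7, -2)),
  data.frame(id = c("d", "e", "f"), quantity = c(0, -3, 9)),
  data.frame(id = c("g", "h", "i"), quantity = c(-2, 0, 4))
)

# Variant 1: Reduce. The accumulator carries both the running value and
# the data frames processed so far.
step <- function(state, df) {
  df$outcome <- state$value * df$quantity
  list(value = state$value + sum(df$outcome),
       dfs   = c(state$dfs, list(df)))
}
res_reduce <- Reduce(step, df_list, list(value = 10, dfs = list()))

# Variant 2: lapply with `<<-`. Wrapping it in a function means `<<-`
# updates the function's own `value`, not a global variable.
run_with_lapply <- function(df_list, value) {
  force(value)
  dfs <- lapply(df_list, function(df) {
    df$outcome <- value * df$quantity
    value <<- value + sum(df$outcome)  # side effect on the enclosing frame
    df
  })
  list(value = value, dfs = dfs)
}
res_lapply <- run_with_lapply(df_list, value = 10)

res_reduce$value
# [1] 1890
res_lapply$value
# [1] 1890
```

Both give the same result as the `for` loop with the same step, but each needs extra plumbing (a state list, or a side effect hidden inside `lapply`), which is why the plain loop is usually easier to read here.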