Chaining Along Data Frames in a list

Question

I have a list of data.frames which hold the data for each of the stages of a chemical process. Each of the data.frames has the same number of columns in the same order but the number of rows can vary for each of the data.frames.

See below the example data with the difference that fruits are standing in for chemical substances and reagents.

I've written a function to scale up the raw data and add the data to columns in the original data frames.

I have two problems. First, when I apply a scale factor, it only applies to the last element of the last data.frame; that scale factor is then applied to the whole of the last data.frame. I can generate the scale factor for the next-to-last data frame by taking the weight of the fruit (chemical) common to the two data frames (always in the last row of one and the first row of the next) and dividing the weights in a similar manner to how we got the first scale factor, then multiplying throughout this data.frame, and repeating until I reach the first data.frame. Second, when I use lapply to apply the scale_up function over the list, how can I feed it these scale factors so that each one is applied only to its particular data frame?

example.data <- list(
  stage1 <- data.frame(code=c("aaa", "ooo", "bbb"),
                       stuff=c("Apples","Oranges","Bananas"),
                       Mw=c(1,2,3),
                       Density=c(3,2,1),
                       Assay=c(8,9,1),
                       Wt=c(1,2,3), stringsAsFactors = FALSE),
  stage2 <- data.frame(code=c("bbb","mmm","ccc","qqq","ggg"),
                       stuff=c("Bananas","Mango","Cherry","Quince","Gooseberry"),
                       Mw=c(8,9,10,1,2),
                       Density=c(23,32,55,5,4),
                       Assay=c(0.1,0.3,0.4,0.4,0.9),
                       Wt=c(45,23,56,99,2), stringsAsFactors = FALSE),
  stage3 <- data.frame(code=c("ggg","bbb","ggg","bbb"),
                       stuff=c("Gooseberry","Bread","Grapes","Butter"),
                       Mw=c(9,8,9,10),
                       Density=c(34,45,67,88),
                       Assay=c(10,10,46,52),
                       Wt=c(24,56,31,84), stringsAsFactors = FALSE)
)

scale_up <- function(inventory,scale_factor,vessel_volume_L, NoBatches = 1) {
  ## This function accepts a data.frame with Molecule, Mw, Density,
  ## Assay and Wt columns
  ## It takes a scale factor and vessel volume and returns input
  ## charges and fill volumes

  ## rownames(inventory) <- inventory$smiles
  inventory <- inventory[,-1] ## the rownames are given the smiles designation
  ## and the smiles column is removed

  ## volumes and moles are calculated for the given data

  inventory$Vol <- round((inventory$Wt / inventory$Density) , 3)
  inventory$Moles <- round((inventory$Wt / inventory$Mw) , 3)
  inventory$Equivs <- round((inventory$Moles / inventory$Moles[1]) , 3)

  inventory[,paste0(scale_factor,"xWt_kg")] <-  round((((inventory$Wt * scale_factor) / 1000 ) / NoBatches) , 3)
  inventory[,paste(scale_factor,"xVol_L",sep="")] <-  round((((inventory$Vol * scale_factor) / 1000 ) / NoBatches) , 3)

  inventory$PerCentFill <- round((100 * cumsum(inventory[,paste(scale_factor,"xVol_L",sep="")]) / vessel_volume_L) , 2)

  inventory
  ## at which point everything is in place to scale up

}

new.example.data  <- lapply(example.data, scale_up,20e3,454)

> new.example.data[[1]]
    stuff Mw Density Assay Wt   Vol Moles Equivs 20000xWt_kg 20000xVol_L PerCentFill
1  Apples  1       3     8  1 0.333     1      1          20        6.66        1.47
2 Oranges  2       2     9  2 1.000     1      1          40       20.00        5.87
3 Bananas  3       1     1  3 3.000     1      1          60       60.00       19.09

So, I've scaled my original data (laboratory scale, grams) to see if it will fit in a 100 gallon plant vessel (454 L), but the only stage that is scaled properly is the last one. The other two need those 'fiddle factors', and I need to apply the 'fiddle factors' to each of the stages as I loop (presumably a for loop rather than lapply) through the list.

(Ps ... I tried to ask this earlier but I tried to disguise my example too much and just confused the stack overflowers).

How about utilizing data.table and rbindlist (better performance)? – amonk, May 19 '17 at 10:35
@agerom I would prefer a base R solution (simply because I haven't mastered data.table) but if anyone has a data.table solution it would be welcome. — DarrenRhodes, May 19 '17 at 10:46
@user1945827 if I understood it correctly, you want to pass different set of scaling factors for 3 different data.frames...right? — tushaR, May 19 '17 at 10:50
@Tushar yes. But I also want to determine what the other two are ... the first one is given by the user. In this case 20e3 and the other two are related. I made a stab at doing it in the first attempt at asking this question ... see here https://stackoverflow.com/questions/44034738/chaining-dataframes-in-a-list but I got stuck. (Bear in mind I plan to delete the contents of the link over the next day or so ...). — DarrenRhodes, May 19 '17 at 11:32
@user1945827 if the weights could be pre-calculated then you can use `?mapply` to pass the distinct scaling factor for each stage. For your scenario, since you want to pass it down the latest calculated weight you might want to declare a global variable and update with latest scaling factor inside the scale_up function. This global variable should be in the lapply call then. — tushaR, May 19 '17 at 12:13
Side note: `list(a <- 1)` is not the same as `list(a = 1)` which is probably what you meant — Aurèle, May 19 '17 at 13:19
@Aurèle could you elaborate, please? Where did I use list(a <- 1) ??? — DarrenRhodes, May 19 '17 at 19:30
@Tushar that looks really useful (I think it answers one of my problems) ... could you provide an example, please? If you look at my previous attempt at asking this question http://stackoverflow.com/questions/10689055/create-an-empty-data-frame I think I was wondering toward that sort of solution. In the data.frame details I was wondering how to apply details[,4] to each of the data.frames ... Could I use mapply? — DarrenRhodes, May 19 '17 at 20:10
The proper way of specifying arguments in a function call is with `=`, not `<-`. Using the assignment arrow actually causes the assignment to happen in the calling environment, and the arguments to be passed unnamed — Aurèle, May 22 '17 at 16:41
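A small sketch of the point made in the comments about `<-` versus `=` inside `list()`. The name `stage_demo` and the toy data frame are made up for illustration; the behaviour shown is standard R.

```r
# Using <- inside list(): the data frame is assigned to a variable in the
# calling environment, and the list element itself gets no name.
x <- list(stage_demo <- data.frame(Wt = 1:2))
exists("stage_demo")   # TRUE: a variable was created as a side effect
is.null(names(x))      # TRUE: the element is unnamed

# Using = inside list(): the element is named and nothing else is assigned.
y <- list(stage_demo = data.frame(Wt = 1:2))
names(y)               # "stage_demo"
```

In the question's data this means `example.data[[1]]` works, but `example.data$stage1` returns NULL because the elements have no names.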

tushaR · Accepted Answer · 2017-05-20T18:56:41.253

Based on the details mentioned in this post and in the other linked question, "Chaining dataframes in a list", here's the solution that I have come up with:

Extract the weights of the first and last fruit of each stage into a matrix like this:

wts <- sapply(example.data, function(t) c(t$Wt[1], t$Wt[nrow(t)]), simplify = TRUE)

Declare a global variable final.wt holding the value you started with:

final.wt <- 20000

Create a scales function to calculate the scaling factor for each corresponding stage:

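scales <- function(x, final.wt) {
  ## x: 2-row matrix; row 1 = first Wt, row 2 = last Wt of each stage
  n <- ncol(x)
  nscales <- numeric(n)
  for (i in n:1) {
    if (i == n) {
      ## last stage: target weight divided by the weight of its last row
      .GlobalEnv$final.wt <- final.wt / x[2, i]
    } else {
      ## earlier stage: chain through the material shared with stage i + 1
      .GlobalEnv$final.wt <- .GlobalEnv$final.wt * x[1, i + 1] / x[2, i]
    }
    nscales[i] <- .GlobalEnv$final.wt
  }
  return(nscales)
}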

This gives you a vector of the desired scaling factors for each stage:

scale.fact<-scales(wts,final.wt)
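Note that scales overwrites the global final.wt, so calling scales(wts, final.wt) a second time without resetting final.wt to 20000 gives different factors. Below is a minimal side-effect-free sketch of the same recurrence: the factor for a stage is the factor of the following stage, multiplied by the first weight of the following stage and divided by the last weight of the current stage. The name chain_scales is hypothetical, and the wts matrix is typed in by hand from the first and last Wt values of the example data.

```r
# wts: 2 x n matrix; row 1 = Wt of the first row of each stage,
# row 2 = Wt of the last row of each stage (values from the example data).
wts <- matrix(c(1, 3,
                45, 2,
                24, 84), nrow = 2)

# chain_scales is a hypothetical name, not part of the original answer.
chain_scales <- function(wts, final_wt) {
  n <- ncol(wts)
  out <- numeric(n)
  out[n] <- final_wt / wts[2, n]   # last stage: target / weight of last row
  for (i in rev(seq_len(n - 1))) {
    # the last row of stage i is the same material as the first row of stage i + 1
    out[i] <- out[i + 1] * wts[1, i + 1] / wts[2, i]
  }
  out
}

scale_fact <- chain_scales(wts, 20000)
round(scale_fact, 1)   # 42857.1 2857.1 238.1
```

Because nothing is written to the global environment, repeated calls with the same arguments return the same vector.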

Now you can call scale_up using mapply like this:

mapply(scale_up,example.data,scale.fact,454)

The values in scale.fact are (rounded to one decimal place):

42857.1 2857.1 238.1

mapply passes each value to scale_factor for the corresponding stage.
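One way to check that per-stage factors really chain the stages together is to compare the scaled weight of the shared material at each boundary. The sketch below is an illustration under assumptions: scale_wt is a hypothetical, cut-down stand-in for scale_up that only adds a scaled weight column, the data frames keep only the stuff and Wt columns from the question, and the factors are written out as the chained ratios of those weights.

```r
# Reduced stand-ins for the question's data: only the columns needed here.
stages <- list(
  stage1 = data.frame(stuff = c("Apples", "Oranges", "Bananas"),
                      Wt = c(1, 2, 3)),
  stage2 = data.frame(stuff = c("Bananas", "Mango", "Cherry", "Quince", "Gooseberry"),
                      Wt = c(45, 23, 56, 99, 2)),
  stage3 = data.frame(stuff = c("Gooseberry", "Bread", "Grapes", "Butter"),
                      Wt = c(24, 56, 31, 84))
)

# scale_wt is hypothetical: it adds the scaled weight in kg and nothing else.
scale_wt <- function(inventory, scale_factor) {
  inventory$Scaled_kg <- inventory$Wt * scale_factor / 1000
  inventory
}

# Chained factors for stage1, stage2, stage3, starting from 20000.
scale_fact <- c(20000 * 24 * 45 / (84 * 2 * 3), 20000 * 24 / (84 * 2), 20000 / 84)

# Map() pairs each data frame with its own factor and returns a plain list.
scaled <- Map(scale_wt, stages, scale_fact)

# The shared material has the same scaled weight on both sides of a boundary.
all.equal(scaled$stage1$Scaled_kg[3], scaled$stage2$Scaled_kg[1])  # TRUE
all.equal(scaled$stage2$Scaled_kg[5], scaled$stage3$Scaled_kg[1])  # TRUE
```

With these factors the last row of the last stage scales to 20 kg, which is the 20000 target expressed in kilograms.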

edited May 20 '17 at 18:56

answered May 20 '17 at 04:49

tushaR


The mapply function didn't appear to work as you describe but everything else did. – DarrenRhodes May 21 '17 at 11:07
what issue did you face? did you get an error or the output is not what you expected? – tushaR May 21 '17 at 14:11
It looks as though I need SIMPLIFY = FALSE as part of the mapply call. I tried to reproduce my result, which didn't make sense, so I dput() it to post here, but it was larger than it appeared. Then I looked at the result with SIMPLIFY = FALSE and got the expected result. You'll see what I mean if you compare the two results from the different forms of mapply. (Thanks again!!) – DarrenRhodes May 22 '17 at 18:44

More of me not understanding the format without SIMPLIFY = FALSE ... but it's all singing and dancing now. – DarrenRhodes May 23 '17 at 07:42
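A toy sketch of why the default mapply output looks odd here. The data frames, the factors and the helper add_scaled are all made up for illustration. When every result has the same number of columns, mapply by default folds the results into a matrix whose cells are the individual columns; SIMPLIFY = FALSE keeps one data frame per stage.

```r
# Two toy data frames with different numbers of rows (made-up values).
frames <- list(data.frame(Wt = c(1, 2, 3)), data.frame(Wt = c(4, 5)))
factors <- c(10, 100)

# add_scaled is a hypothetical stand-in for scale_up.
add_scaled <- function(df, k) {
  df$Scaled <- df$Wt * k
  df
}

simplified <- mapply(add_scaled, frames, factors)
kept <- mapply(add_scaled, frames, factors, SIMPLIFY = FALSE)

is.matrix(simplified)      # TRUE: a list-matrix, one column per data frame
dim(simplified)            # 2 2
is.data.frame(kept[[2]])   # TRUE: each element is still a data frame
kept[[2]]$Scaled           # 400 500
```

The simplified form still holds the data (for example `simplified[["Scaled", 2]]`), but it is awkward to work with, which is why the list form is usually what is wanted.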
