2

I have 9 data sets, each having 115 rows and 742 columns and each data set contains results from a spectrometer taken under specific conditions.

I would like to analyze all combinations of these 9 data sets to determine the best conditions.

Edit:
The data are spectral measurements (rows = samples, columns = wavelengths) taken at 10 different temperatures.

I would like to get all combinations of the 9 data sets and apply a function cpr2 to each combination. cpr2 takes a data set, fits a plsr model, predicts 9 test sets (the individual sets), and returns the prediction bias.

My intention is to find which combination gives the smallest prediction biases, i.e., how many temperature conditions are needed to give an acceptable bias.
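To get a feel for the size of that search, here is a minimal sketch that only enumerates the candidate combinations by name. It assumes the nine names listed further down in the question; `all_combos` is an illustrative name, not something from the original code.

```r
# Names of the nine data sets (taken from the question).
g <- c("g11", "g12", "g13", "g21", "g22", "g23", "g31", "g32", "g33")

# For each subset size k = 1..9, list every combination of k names.
# simplify = FALSE makes combn() return a list of character vectors.
all_combos <- unlist(
  lapply(seq_along(g), function(k) combn(g, k, simplify = FALSE)),
  recursive = FALSE
)

length(all_combos)          # 511 non-empty subsets of nine items (2^9 - 1)
table(lengths(all_combos))  # how many combinations there are of each size
```

With 511 candidate subsets and a leave-one-out PLS fit for each, restricting the search to a few subset sizes first may be more practical than running everything.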

Based on suggestion:

I'm looking to do something like this

g<-c("g11","g12","g13","g21","g22","g23","g31","g32","g33")
cbn<-combn(g,3) # making combinations of 3 

comb<-lapply(cbn,cpr2(cbn))

for reference cpr2 is

    library(pls)  # provides plsr(), R2() and RMSEP()

    cpr2 <- function(data) {
        # fit an 8-component PLSR model with leave-one-out validation
        data.pls <- plsr(protein ~ ., 8, data = data, validation = "LOO")
        # predict each test set
        gag11p.pred <- predict(data.pls, 8, newdata = gag11p)
        gag12p.pred <- predict(data.pls, 8, newdata = gag12p)
        gag13p.pred <- predict(data.pls, 8, newdata = gag13p)
        gag21p.pred <- predict(data.pls, 8, newdata = gag21p)
        gag22p.pred <- predict(data.pls, 8, newdata = gag22p)
        gag23p.pred <- predict(data.pls, 8, newdata = gag23p)
        gag31p.pred <- predict(data.pls, 8, newdata = gag31p)
        gag32p.pred <- predict(data.pls, 8, newdata = gag32p)
        gag33p.pred <- predict(data.pls, 8, newdata = gag33p)
        # calculate prediction bias (column 742 holds the reference values)
        pred.bias1 <- mean(gag11p.pred - gag11p[742])
        pred.bias2 <- mean(gag12p.pred - gag12p[742])
        pred.bias3 <- mean(gag13p.pred - gag13p[742])
        pred.bias4 <- mean(gag21p.pred - gag21p[742])
        pred.bias5 <- mean(gag22p.pred - gag22p[742])
        pred.bias6 <- mean(gag23p.pred - gag23p[742])
        pred.bias7 <- mean(gag31p.pred - gag31p[742])
        pred.bias8 <- mean(gag32p.pred - gag32p[742])
        pred.bias9 <- mean(gag33p.pred - gag33p[742])
        r <- signif(c(pred.bias1, pred.bias2, pred.bias3, pred.bias4, pred.bias5,
                      pred.bias6, pred.bias7, pred.bias8, pred.bias9), 2)
        out <- c(R2(data.pls, "train", ncomp = 8), RMSEP(data.pls, "train", ncomp = 8), r)
        return(out)
    }
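The nine near-identical predict/bias lines could be collapsed into a loop over a named list of test sets. The sketch below is only an illustration under loud assumptions: `lm()` stands in for `plsr()` so that it runs with base R alone, the toy data and the helpers `make_set()` and `bias_by_set()` are hypothetical, and the response column is assumed to be called `protein` as in the model formula above.

```r
set.seed(1)

# Hypothetical toy generator: one predictor plus a response shifted by `shift`.
make_set <- function(shift) {
  x <- rnorm(20)
  data.frame(x = x, protein = 2 * x + shift + rnorm(20, sd = 0.1))
}

# Named list of test sets; the names mirror three of those in the question.
tests <- list(gag11p = make_set(0), gag12p = make_set(0.5), gag13p = make_set(1))

# Prediction bias (mean of predicted minus observed) for every test set.
bias_by_set <- function(model, tests, response = "protein") {
  vapply(tests,
         function(d) mean(predict(model, newdata = d) - d[[response]]),
         numeric(1))
}

fit <- lm(protein ~ x, data = tests$gag11p)  # stand-in for the plsr() fit
bias <- signif(bias_by_set(fit, tests), 2)
bias
```

With a real `plsr()` fit the predictions come back as an array rather than a plain vector, so the subtraction may need a `drop()` or similar; that part is untested here.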

Any insights into solving this will be appreciated.

DinoSingh
  • To what do the rows and columns of the individual data set refer? Are they experimental results (cols are wavelengths/masses and rows the samples?) and the 10 data sets are ten combinations of settings? If not, what represents the combinations? If the answer to this is that there are 10 * 115 * 750 conditions and you want to assess *all* combinations of them, I hope you are prepared for a long wait! – Gavin Simpson Sep 12 '11 at 08:56
  • What do you mean by "all combinations of these datasets"? If each of your data frames has the same column names, you could use `rbind()` to combine them into one data frame: `g <- rbind(g11,g12,g13,g21,g22,g23,g31,g32,g33,g2)` – adamleerich Sep 12 '11 at 08:56
  • You will have to give us more information about your data. I understand you have 10 matrices, but what I don't understand is how you want to combine these. Say, for example, you combine `g11` and `g12`: what does this combined matrix look like? A single matrix with 230 rows? – Andrie Sep 12 '11 at 08:57
  • @Gavin The data sets are spectral measurements where columns are wavelengths (750) and rows are samples (115). The ten data sets refer to ten different temperatures at which the measurements were taken. I would like to assess all combinations of the ten conditions. The column names (wavelengths) are the same for all data sets, so combining g11 and g12 would be as Andrie suggested. – DinoSingh Sep 12 '11 at 10:24
  • @adamleerich `rbind()` would give me a single data set; however, I want to assess how individual conditions interact, e.g. g11, g31 and g33, or g11, g21, g22 and g33. I have been doing the selections manually but I am hoping there is an easier way. – DinoSingh Sep 12 '11 at 10:33
  • Thanks. So next Q. Assess how? I doubt you'll need to combine anything, you probably just need to iterate over the pair-wise combinations. – Gavin Simpson Sep 12 '11 at 10:46
  • @Gavin I want to use each combination as a data set for a plsr calibration model, which is then used to predict the individual conditions and calculate the prediction bias. I have a function to do this, which is used in one of my previous questions [http://stackoverflow.com/questions/7077209/r-make-pls-calibration-models-from-n-number-of-subset-and-use-them-to-predict-di] – DinoSingh Sep 12 '11 at 10:57
  • This may be relevant for you: http://stackoverflow.com/questions/7382039/chi-square-analysis-using-for-loop-in-r – Chase Sep 12 '11 at 11:52
  • @Chase thank you for the link, I'm trying a similar approach now. – DinoSingh Sep 12 '11 at 13:52
  • So do you want `data` to be a matrix of 3*115 rows and 750 columns produced by combining each of the three matrices involved in a single combination? As I mentioned, the `data` arg to `cpr2` is expecting a **single** data frame-like object. How do you want to combine those three data sets into one? By rows, `?rbind`? – Gavin Simpson Sep 12 '11 at 14:03
  • This is fundamentally flawed: `comb<-lapply(cbn,cpr2(cbn))`! Firstly, `lapply()` doesn't work as you expect on a matrix (`cbn` is a matrix!) - it works on the 360 elements separately, not the 120 three-element vectors (the columns). Secondly, the `FUN` argument should be called using just its name, i.e. `lapply(cbn, cpr2)`. But combined with the first point, you probably want `apply(cbn, 2, cpr2)` – Gavin Simpson Sep 12 '11 at 14:07
  • @Gavin It seems that `rbind` is what I need, as I'm looking to combine the data sets by rows. How can I do this for all combinations like in `cbn`, hence making several data sets to be used as `data` for `cpr2`? – DinoSingh Sep 13 '11 at 04:13
  • @DinoSingh I have updated my Answer to show how to `rbind` together the two matrices in my example (you would extend this to three if you want three-at-a-time combinations). – Gavin Simpson Sep 13 '11 at 08:21
  • @Gavin Thank you very much for your assistance. It works and I was able to extend it. Thank you. – DinoSingh Sep 13 '11 at 11:14
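
The `lapply()` versus `apply()` point raised in the comments above can be checked with a small sketch. It uses the nine names from the question (the counts quoted in the comment correspond to ten names), and `paste()` is only a stand-in for the real assessment function.

```r
g <- c("g11", "g12", "g13", "g21", "g22", "g23", "g31", "g32", "g33")
cbn <- combn(g, 3)   # character matrix: 3 rows, one column per combination

dim(cbn)             # 3 rows, 84 columns for nine names

# lapply() treats the matrix as a plain vector and visits every cell.
n_cells <- length(lapply(cbn, identity))   # 3 * 84 = 252 calls

# apply() with MARGIN = 2 passes one whole column (one combination) per call.
labels <- apply(cbn, 2, paste, collapse = "+")
length(labels)       # 84, one result per combination
labels[1]            # "g11+g12+g13"
```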

2 Answers

8

You don't say how you want to assess the pairs of matrices, but if you have your matrices as per the code you showed with those names, then

g <- c("g11", "g12", "g13", "g21", "g22", "g23", "g31", "g32", "g33", "g2")
cmb <- combn(g, 2)

which gives:

> cmb
     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11] [,12]
[1,] "g11" "g11" "g11" "g11" "g11" "g11" "g11" "g11" "g11" "g12" "g12" "g12"
[2,] "g12" "g13" "g21" "g22" "g23" "g31" "g32" "g33" "g2"  "g13" "g21" "g22"
     [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
[1,] "g12" "g12" "g12" "g12" "g12" "g13" "g13" "g13" "g13" "g13" "g13" "g13"
[2,] "g23" "g31" "g32" "g33" "g2"  "g21" "g22" "g23" "g31" "g32" "g33" "g2" 
     [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
[1,] "g21" "g21" "g21" "g21" "g21" "g21" "g22" "g22" "g22" "g22" "g22" "g23"
[2,] "g22" "g23" "g31" "g32" "g33" "g2"  "g23" "g31" "g32" "g33" "g2"  "g31"
     [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45]
[1,] "g23" "g23" "g23" "g31" "g31" "g31" "g32" "g32" "g33"
[2,] "g32" "g33" "g2"  "g32" "g33" "g2"  "g33" "g2"  "g2"

is the set of combinations of your matrices taken 2 at a time.

Then iterate over the columns of cmb doing your assessment, e.g.:

FUN <- function(g, ...) {
    ## get the objects for the current pair
    g1 <- get(g[1])
    g2 <- get(g[2])
    ## bind together
    dat <- rbind(g1, g2)
    ## something here to assess this combination
    cpr2(dat)
}

assess <- apply(cmb, 2, FUN = FUN)
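
One way to avoid writing a separate `get()` line per data set is to look up all the names at once and row-bind whatever comes back. This is a sketch of that variation rather than part of the answer's code: `bind_named()` is a hypothetical helper, the three tiny data frames are toy stand-ins for the real sets, and `nrow()` stands in for `cpr2()`.

```r
# Toy stand-ins for the real data sets.
g11 <- data.frame(x = 1:2, protein = c(10, 11))
g12 <- data.frame(x = 3:4, protein = c(12, 13))
g13 <- data.frame(x = 5:6, protein = c(14, 15))

# Hypothetical helper: fetch the objects named in `nms` and stack them by rows.
# inherits = TRUE lets mget() search the enclosing environments of the caller.
bind_named <- function(nms) {
  do.call(rbind, mget(nms, envir = parent.frame(), inherits = TRUE))
}

cmb <- combn(c("g11", "g12", "g13"), 2)

# Each call receives one column of cmb, i.e. one combination of names.
assess <- apply(cmb, 2, function(nms) nrow(bind_named(nms)))
assess   # every pair stacks 2 + 2 rows
```

Because the helper does not care how many names it receives, the same call should work unchanged for `combn(g, 3)` or larger combinations.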
Gavin Simpson
  • @ Gavin Thank you. I was able to reproduce cmb; however, for FUN I want to use a function with the call cpr2(data). If I understand correctly, g1 and g2 make up one combination, to which I apply cpr2. I'm not sure how to get g1 and g2 as data for the function. Any advice will be appreciated. – DinoSingh Sep 12 '11 at 12:02
  • No, not really, `g1` and `g2` are for the first column of `cmb` the matrices `g11` and `g12`. That is what `get()` does; it retrieves the object with the given name. `apply()` will apply the function `FUN` to each column of `cmb`. Inside `FUN`, you want to do whatever it is you want to do to assess a *single* combination. `apply()` will ensure that each of the pair-wise (in this case) combinations is considered in turn. My `FUN` just sets up an environment where the two data sets for the current combination are available - where I have `.....` you need your call. – Gavin Simpson Sep 12 '11 at 12:27
  • Further @DinoSingh I don't see how `cpr2()` relates to the all possible combinations bit of this question. `cpr2()` takes a single data object yet here you have at least 2 data sets per combination (more if you want to make combinations of your data sets 3 at a time). Your original Question is full of ambiguity hence my answer will be very general. If you can be more specific (without asking me to read other Qs) I'll try to be more specific in my Answer. – Gavin Simpson Sep 12 '11 at 12:29
  • I apologize for the ambiguity. I have edited my question; I hope it is clearer now. Thank you for your assistance. – DinoSingh Sep 12 '11 at 13:51
  • @ Gavin Very sorry to bother you again but I encountered an `Error in if (d2 == 0L) { : missing value where TRUE/FALSE needed` when I tried `assess <- apply(cmb, 3, FUN = FUN, ....)` after changing 2 to 3 in `cmb` and adding `get g([3])` and `rbind(g1,g2,g3)` in `FUN`. Not sure what I did wrong. – DinoSingh Sep 14 '11 at 10:16
  • @DinoSingh `get g([3])` looks wrong, it should be `get(g[3])`. Comments are not a good place to conduct this sort of follow-up or Q&A. – Gavin Simpson Sep 14 '11 at 12:06
  • @ Gavin Adjusted `get(g[3])` but the error still occurs; `traceback` shows it is occurring at `assess <- apply(cmb, 3, FUN = FUN, ....)`. In future, do I post as a new question? – DinoSingh Sep 14 '11 at 12:35
  • @DinoSingh It should show more than that. Debug `FUN` via `debug(FUN)` and see where it is failing. A question on this would be too localised and most likely be closed as it is unlikely to be useful to people in future. SO isn't a general help web site. My Answer solves the general problem - you need to figure out what is going wrong in your own code. – Gavin Simpson Sep 14 '11 at 12:48
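
One possible cause of the error quoted in the comments above, offered as a guess rather than a diagnosis: `cmb` is a two-dimensional matrix, so `apply()` only has margins 1 (rows) and 2 (columns), and `MARGIN = 3` points at a dimension that does not exist. For three-at-a-time combinations the 3 belongs in `combn()`, while the margin stays at 2. A minimal sketch with toy data (`cmb3`, `FUN3` and the one-row data frames are illustrative names, and `nrow()` stands in for `cpr2()`):

```r
# Toy stand-ins for four of the data sets.
g11 <- data.frame(x = 1, protein = 10)
g12 <- data.frame(x = 2, protein = 11)
g13 <- data.frame(x = 3, protein = 12)
g21 <- data.frame(x = 4, protein = 13)

cmb3 <- combn(c("g11", "g12", "g13", "g21"), 3)  # 3 rows, 4 columns

FUN3 <- function(g) {
  dat <- rbind(get(g[1]), get(g[2]), get(g[3]))  # stack the three sets
  nrow(dat)                                      # stand-in for cpr2(dat)
}

# MARGIN stays 2 (iterate over columns); only the combination size changed.
assess3 <- apply(cmb3, 2, FUN3)
assess3
```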
4

Did you try `combn`? For example, if you want combinations of 3 drawn from a group of 10 elements you can use `combn(10, 3)`.

  • Thank you, I find it works for a simple list, but I'm having trouble combining my data sets, which I have as different data frames. I tried putting the data frames in a list like `g<-list(data1,data2,data3...)` and then using `combn(length(g), 3)`. I am guessing the list is the problem, but at least I have somewhere to start from now. Thanks. – DinoSingh Sep 12 '11 at 06:26
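
The list idea in the comment above can be made to work by letting `combn()` generate index sets and using each set to subset the list. The following is a minimal sketch with four toy data frames standing in for the real ones; `idx` and `combined` are illustrative names.

```r
# Toy stand-ins, kept in a named list rather than as separate objects.
g <- list(
  g11 = data.frame(x = 1:2, protein = c(10, 11)),
  g12 = data.frame(x = 3:4, protein = c(12, 13)),
  g13 = data.frame(x = 5:6, protein = c(14, 15)),
  g21 = data.frame(x = 7:8, protein = c(16, 17))
)

# Every way of choosing 3 list positions, returned as a list of index vectors.
idx <- combn(length(g), 3, simplify = FALSE)

# Row-bind the data frames selected by each index vector.
combined <- lapply(idx, function(i) do.call(rbind, g[i]))

# Label each stacked data set with the names that went into it.
names(combined) <- vapply(idx,
                          function(i) paste(names(g)[i], collapse = "+"),
                          character(1))

names(combined)[1]      # "g11+g12+g13"
sapply(combined, nrow)  # 6 rows in each stacked set
```

Each element of `combined` is then a single data frame that could be passed to a function such as `cpr2()` with `lapply(combined, cpr2)`.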