Manipulating Data.Frames

Question

I have different data.frame objects with two columns. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n}

> Experiment1 Name Statistic 1 a -1.050 2 b 0.058 3 c 0.489 4 d 1.153 5 e 0.736 6 f -1.155 7 g 0.186

> Experiment2 Name Statistic 1 a 0.266 2 b 0.067 3 c -0.385 4 d 0.068 5 e 1.563 6 f 0.745 7 g 1.671

> Experiment3 Name Statistic 1 a 0.004 2 b -2.074 3 c 0.746 4 d 0.207 5 e 0.700 6 f 0.158 7 g 0.067

> Experiment4 Name Statistic 1 a 0.255 2 b -0.542 3 c 0.477 4 d 1.552 5 e 0.025 6 f 1.027 7 g 0.326

> Experiment5 Name Statistic 1 a 1.817 2 b 0.147 3 c 0.052 4 d 0.194 5 e -0.137 6 f 2.321 7 g -0.939

> Experiment6 Name Statistic 1 a 1.817 2 b 0.147 3 c 0.052 4 d 0.194 5 e -0.137 6 f 2.321 7 g -0.939

> ExperimentalDesign$metabolite [1] "butyrate" "h2s" "hippurate" "acetate" "propionate" "butyrate_2" [7] "h2s_2" "hippurate_2" "acetate_2" "propionate_2"

I have different data.frame objects with three columns. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n} (where n is NumberTubes divided by NumberParameters).

Now I want to merge from each data.frame object the .$Statistic column in a table (3 statistic columns per output..) tab_1 <- cbind(Experiment1, Experiment2$Statistic, Experiment3$Statistic). Also, take the metabolite from ExperimentalDesign$metabolite in order. e.g. Table_3 would get hippurate.

NumberRepeats <- 3 (Table_1 = merge Experiment_1, Experiment_2$Statistic, Experiment_3$Statistic , Table_2 = merge Experiment_4, Experiment_5$Statistic, Experiment_6$Statistic, etc.)
Experiment_n <- 17 (e.g. Experiment_1, Experiment_2, etc..)
skipTube <- c(11) (skip Experiment_11)

Desired outputs:

Table_1: Experiment1 Experiment2 Experiment3 metabolite a -1.050 0.266 0.004 butyrate b 0.058 0.067 -2.074 butyrate c 0.489 -0.385 0.746 butyrate d 1.153 0.068 0.207 butyrate e 0.736 1.563 0.700 butyrate f -1.155 0.745 0.158 butyrate g 0.186 1.671 0.067 butyrate

Table_2

Experiment4 Experiment5 Experiment6 metabolite a 0.255 1.817 -0.827 h2s b -0.542 0.147 0.219 h2s c 0.477 0.052 1.561 h2s d 1.552 0.194 1.493 h2s e 0.025 -0.137 0.063 h2s f 1.027 2.321 0.844 h2s g 0.326 -0.939 -0.373 h2s

TRIED SO FAR:

With this you merge on column of different dataframe objects to one table. You can control the number of column by the NumberRepeats variable. All table which are stored in a list have same number of data columns like the
NumberRepeats variable except the last table...

# created test data
for(i in 1:17){
  Name <- letters[1:7]
  Statistic <- round(rnorm(7), 3)
  assign(paste0("Experiment",i), data.frame(Name, Statistic))
}    

# set some parameters
NumberRepeats <- 3
Experiment_n <- 17
skipTube <- c(11)

# lets go

out <- list()
list_index <- 1
counter <- 1
while(counter < Experiment_n) {

  tab <- NULL
  nam <- NULL
  while((is.null(tab) || ncol(tab) < NumberRepeats) & Experiment_n >= counter){
    if(!any(counter == skipTube)){
      tab <- cbind(tab, get(paste0("Experiment", counter))$Statistic)
      # tab <- as.data.frame(tab)
      nam <- c(nam,paste0("Experiment", counter))
    }
    counter <- counter + 1  
  }
  colnames(tab) <- nam
  rownames(tab) <- as.matrix(Experiment1$Name)

  out[[list_index]] <- tab
  assign(paste0('table_', list_index), tab)

  list_index <- list_index + 1  
}
out

Output from above code:

Experiment1 Experiment2 Experiment3 a 0.136 0.260 -1.089 b 0.946 -1.165 -0.599 c -0.462 -1.445 0.044 d -1.936 -0.391 0.622 e 0.537 -0.502 1.192 f 0.259 0.096 -1.873 g 1.352 0.049 -0.644

Desired output from the above code:

Experiment1 Experiment2 Experiment3 metabolite a -1.050 0.266 0.004 butyrate b 0.058 0.067 -2.074 butyrate c 0.489 -0.385 0.746 butyrate d 1.153 0.068 0.207 butyrate e 0.736 1.563 0.700 butyrate f -1.155 0.745 0.158 butyrate g 0.186 1.671 0.067 butyrate

It is helpful if you can provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), with both a sample of your data and code for what what you've tried so far. — austensen, Sep 13 '17 at 15:21
You said three columns but only show two, what else is there? You referenced `NumberParameters` but never demonstrate its use; is this supposed to be `NumberRepeats`? I strongly recommend for using a [list of dataframes](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) instead of accessing individual frames with `assign`, it is usually much easier, more efficient, and much more robust. — r2evans, Sep 13 '17 at 15:30
What if the number of experiments are not perfectly divisible by 3? — tushaR, Sep 13 '17 at 15:52

tushaR · Answer 1 · 2017-09-13T17:08:01.907

Something like this should work but this also quite manual:

table1 = Reduce(function(x,y){cbind(x,y)},
list(Experiment1$Statistic,Experiment2$Statistic,
Experiment3$Statistic,ExperimentalDesign$metabolite[1]))

table2 = Reduce(function(x,y){cbind(x,y)},
list(Experiment4$Statistic,Experiment5$Statistic,
Experiment6$Statistic,ExperimentalDesign$metabolite[2]))

EDIT: A more robust solution:

First create a list of all the experiment data.frames named ldf:

ldf = list(Experiment1,Experiment2,Experiment3,...,Experimentn)

And then:

lapply(1:ceiling(length(ldf)/3),
   function(t,l,df){ 
     if(t==ceiling(length(l)/3)){
       ind = ((3*t)-2):(3*t-(length(l)%%3))
     }else{
       ind = ((3*t)-2):(3*t)
     };
    cbind(Reduce(function(x,y){cbind(x,y)},lapply(l[ind],'[[','Statistic')),
    df$metabolite[t])
    },
ldf,ExperimentalDesign)

score 0 · Answer 2 · answered Sep 13 '17 at 15:47

This solution should do what you want in case you want to aggregate every 3 tables.

library(reshape)

for(i in 1:17){
  Name <- letters[1:7]
  Statistic <- round(rnorm(7), 3)
  ExperimentName <- rep(paste0("Experiment",i), 7)
  assign(paste0("Experiment",i), data.frame(ExperimentName, Name, Statistic, stringsAsFactors = FALSE) )
}    

# set some parameters
NumberRepeats <- 5
Experiment_n <- 17
skipTube <- c(3,7,11)

# Create dummy list for the metabolites
metabolites <- c("met1", "met2", "met3", "met4", "met5")

for (iteration in c(1:Experiment_n)){
  if (iteration %% 3 == 0){
    temp_df <- rbind(get(paste0("Experiment", iteration - 2)), get(paste0("Experiment", iteration - 1)), get(paste0("Experiment", iteration)))
    print(temp_df)
    temp_df <- melt(data = temp_df)
    aggregates <- dcast(data = temp_df, formula = Name ~ ExperimentName, value.var = "value")
    aggregates$metabolite <- metabolites[iteration/3]
    print(aggregates)
  }
}

Manipulating Data.Frames

2 Answers2