I have two data frames:

> head(cont)
                    old_pert     cmap_name       conc   perturb_geo        t1        t2        t3        t4        t5
1 5202764005789148112904.A02     estradiol 0.00000001 GSM119257 GSM119218 GSM119219 GSM119221 GSM119222 GSM119223
2 5202764005789148112904.A01 valproic acid 0.00050000 GSM119256 GSM119218 GSM119219 GSM119221 GSM119222 GSM119223

> head(expression)[1:3,1:8]
          GSM118911 GSM118912 GSM118913 GSM118723 GSM118724 GSM118725 GSM118726 GSM118727
1007_s_at     387.6     393.2     290.5     378.6     507.8     383.7     288.8     451.9
1053_at        56.4      53.5      32.8      39.0      71.5      47.3      46.0      50.1
117_at          6.3      33.6      19.2      17.6      20.3      15.0       7.1      43.1

I want to loop over the rows of cont and do the following:

for(i in 1:nrow(cont)){

# first take some values from cont, which will be used below

vehicle <- cont[i, 5:9]
perturb <- cont[i, 4]
col_name <- paste(cont[i, 2], cont[i, 3], sep = '_') #estradiol_.00001
tmp <- sum(expression[,which(colnames(expression) == vehicle)])/5
tmp2 <- expression[,which(colnames(expression) == perturb)]
tmp3 <- tmp/tmp2
div <- cbind(div, tmp3)
colnames(div)[i + 1] <- col_name
}

Take those columns from expression where col.names == vehicle & perturb and apply division.

div <- expression$vehicle / expression$perturb #I'm not getting how I can pass here the value in `vehicle` and `perturb`

Assign this new variable a column name which should be a combination of drug_name and concentration

col.names(div) <- drug_name_concentration

assign it the row.names of expression:

row.names(div) <- row.names(expression)

So this process will iterate 271 times (nrow(cont) = 271), and each time a newly divided column will be cbind-ed to my previous div. Hence the final outcome will be:

                arachidonic acid_0.000010     oligomycin_0.000001 .........
1007_s_at            0.45                      0.30
1053_at              1.34                      0.65
117_at               0.11                      0.67
.....
.....

The logic is clear in my head but I am not getting how I can do it. Thanks for your help.

user3253470

1 Answer


You are not assigning the variables correctly in the loop. Below is a sample loop that goes over each row and assigns the variables correctly; for example, on the first iteration i == 1. Note that I have changed how the column name is generated.

for(i in 1:nrow(cont)){
  vehicle <- cont[i, 3]
  perturb <- cont[i, 4]
  col_name <- paste(cont[i, 5], cont[i, 6], sep = '_')
}

To then select the columns whose names are stored in these variables, you can use the

df[,which(colnames(df) == x)]

approach, where df is your data frame and x is the variable.
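As a minimal sketch of that lookup, assuming a toy data frame `df` and a variable `x` (both made up here, not the asker's data), a column whose name is held in a variable can be selected like this:

```r
# Toy data frame; the column names only mimic the GSM identifiers in the question
df <- data.frame(GSM1 = c(1, 2, 3), GSM2 = c(10, 20, 30))
x <- "GSM2"   # column name held in a variable

# which() returns the position of the matching column name
by_which <- df[, which(colnames(df) == x)]

# Indexing with the character value directly gives the same vector
by_name <- df[[x]]
```

By contrast, `df$x` looks for a column literally named `x`, which is why `expression$vehicle` in the question does not pick up the value stored in `vehicle`.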

Therefore,

div <- data.frame(row.names(expression))
for(i in 1:nrow(cont)){
  vehicle <- cont[i, 3]
  perturb <- cont[i, 4]
  col_name <- paste(cont[i, 5], cont[i, 6], sep = '_')

  tmp <- expression[,which(colnames(expression) == vehicle)]/
           expression[,which(colnames(expression) == perturb)]

  div <- cbind(div, tmp)

  colnames(div)[i + 1] <- col_name
}

div <- div[,-1]
row.names(div) <- row.names(expression)

The loop goes through each row of cont, assigns the values to the variables, finds the matching columns in expression, and divides one resulting vector by the other.

It then binds the result by column to the div data frame, which was created before the loop from the row names of expression.

Finally, it renames the new column. After the loop completes, the first column (now redundant) is dropped and the row names are set from expression.

EDIT - question changed

change #1

vehicle <- cont[i, 5:9]

to

vehicle <- cont[i, c(5:9)] ## note c()
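A small check on a hypothetical one-row `cont_toy` (the values are made up) suggests why the final edit further down adds an `unlist()` step: `5:9` and `c(5:9)` are the same index vector, and with either one the selection is still a one-row data frame rather than a plain vector of sample names.

```r
# Hypothetical one-row stand-in for `cont`; columns 5 to 9 hold sample names
cont_toy <- data.frame(
  old_pert = "p1", cmap_name = "estradiol", conc = 1e-8, perturb_geo = "GSM9",
  t1 = "GSM1", t2 = "GSM2", t3 = "GSM3", t4 = "GSM4", t5 = "GSM5",
  stringsAsFactors = FALSE
)

same_index <- identical(5:9, c(5:9))       # TRUE: c() adds nothing here

vehicle_df  <- cont_toy[1, c(5:9)]         # still a data frame with one row
vehicle_vec <- unname(unlist(vehicle_df))  # plain character vector of names
```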

change #2

tmp <- sum(expression[,which(colnames(expression) == vehicle)])/5

to

tmp <- sum(expression[,which(colnames(expression) %in% vehicle)])/5
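To illustrate change #2 on toy data (the `expr_toy` table and the `vehicle` vector below are made up): `==` compares element by element with recycling, so it is not a membership test, whereas `%in%` flags every column name that occurs in `vehicle`. The sketch also shows that `sum()` collapses the selected columns to a single number, while `rowSums()` keeps one value per probe.

```r
# Toy expression table with two probes and four samples
expr_toy <- data.frame(
  GSM1 = c(2, 4), GSM2 = c(4, 8), GSM3 = c(1, 1), GSM4 = c(5, 5),
  row.names = c("probe_a", "probe_b")
)
vehicle <- c("GSM2", "GSM1")

# `==` recycles `vehicle` along the column names and compares position by position
eq_hits <- colnames(expr_toy) == vehicle

# `%in%` asks, for each column name, whether it occurs anywhere in `vehicle`
in_hits <- colnames(expr_toy) %in% vehicle

sel <- expr_toy[, in_hits]

total          <- sum(sel)                        # one number for the whole block
per_probe_mean <- rowSums(sel) / length(vehicle)  # one mean per probe (row)
```

Here `eq_hits` is all `FALSE` even though both vehicle samples are present, because the names happen to sit in different positions.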

FINAL EDIT

Full working loop:

div <- data.frame(row.names(expression))  ## placeholder first column, dropped after the loop
for(i in 1:nrow(cont)){

  perturb <- cont[i, 4]
  col_name <- paste(cont[i, 2], cont[i, 3], sep = '_')
  vehicle <- cont[i, c(5:9)]
  vehicle <- unname(unlist(vehicle[1,]))
  tmp <- expression[,which(colnames(expression) %in% vehicle)]
  row_tots <- as.data.frame(rowSums(tmp))
  row_tots <- row_tots/5

  tmp <- row_tots/expression[,which(colnames(expression) == perturb)]
  div <- cbind(div, tmp)
  colnames(div)[i + 1] <- col_name
}
div <- div[,-1]
row.names(div) <- row.names(expression)
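As a sketch only, the same computation can be written without growing `div` inside the loop. The toy tables `cont_toy` and `expr_toy` and the helper name `mean_of_columns` are hypothetical. `rowMeans()` averages however many columns are selected, so the helper would also cover a `perturb` that spans several columns, and `make.unique()` appends a `.1` style suffix when a drug/concentration label repeats.

```r
# Toy stand-ins for the asker's tables (values are made up)
expr_toy <- data.frame(
  GSM1 = c(2, 4), GSM2 = c(4, 8), GSM3 = c(1, 2), GSM4 = c(3, 2),
  row.names = c("probe_a", "probe_b")
)
cont_toy <- data.frame(
  cmap_name = c("estradiol", "estradiol"),
  conc      = c("1e-08", "1e-08"),
  perturb   = c("GSM3", "GSM4"),
  t1 = c("GSM1", "GSM1"), t2 = c("GSM2", "GSM2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row-wise mean over any number of named columns
mean_of_columns <- function(df, cols) {
  rowMeans(df[, colnames(df) %in% cols, drop = FALSE])
}

# One column of ratios per row of cont_toy: mean(vehicle) / mean(perturb)
ratios <- sapply(seq_len(nrow(cont_toy)), function(i) {
  vehicle <- unname(unlist(cont_toy[i, c("t1", "t2")]))
  mean_of_columns(expr_toy, vehicle) /
    mean_of_columns(expr_toy, cont_toy$perturb[i])
})

div_toy <- as.data.frame(ratios)
colnames(div_toy) <- make.unique(paste(cont_toy$cmap_name, cont_toy$conc, sep = "_"))
rownames(div_toy) <- rownames(expr_toy)
```

Building all columns first and naming them once at the end avoids the repeated `cbind()` and per-iteration renaming, at the cost of holding the whole ratio matrix in memory.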
amwill04
  • Thanks a bundle. It worked. I was wondering how this works: in some cases `col_name <- paste(cont[i, 5], cont[i, 6], sep = '_')` produced the same name for 2 instances, and this code handled it by giving the names "metformin_0.00001" and "metformin_0.00001.1". Can you explain why and how that happened? – user3253470 Oct 12 '15 at 14:22
  • You could try creating an empty vector with `col_names <- c()` and then, within the loop, `col_names <- c(col_names, paste(cont[i, 5], cont[i, 6], sep = '_'))`; obviously, remove the other instance of `col_names` in the loop. Then, after the loop and after `div <- div[,-1]`, assign the column names via `colnames(div) <- col_names`. – amwill04 Oct 12 '15 at 14:43
  • Ok, thanks. Can you tell me what will be the possible solution for the situation where `perturb` contains more than 1 columns and I want to take `perturb = sum of columns / no.of columns` and then divide `control / perturb` – user3253470 Oct 12 '15 at 15:09
  • It depends on whether or not you know how many columns there are going to be. If that varies, then you are probably going to want to write a function to deal with that. The function above deals with a known number of columns. That detail aside, in `df[,which(colnames(df) == x)]` you can use the OR operator `|` so that it becomes `df[,which(colnames(df) == x | colnames(df) == y)]`. You could even wrap that in the `sum()/nrow()` functions to get the value out. However, that will give you a single value, which I'm guessing is the point, as you want the mean. – amwill04 Oct 12 '15 at 15:27
  • Now in every case I have to take 5 columns for vehicle (that I'm doing by: `vehicle <- cont[i, 5:9]`), sum their values and divide them by 5: It will be the `vehicle` (that I'm doing by: `tmp <- sum(expression[,which(colnames(expression) == vehicle)])/5`) but it is not working. @amwill04 – user3253470 Oct 13 '15 at 11:43
  • I've modified my data.frame `cont`, so that you can see the change, plus I have modified the code according to my results but it's not working (I've also posted that in question, for your better understanding). Thanks – user3253470 Oct 13 '15 at 11:53
  • You need to create a vector of columns, so `vehicle <- cont[i, 5:9])` should be `vehicle <- cont[i, c(5:9)])`; note the c() in the column selection. Following that, the simplest way is to change your `which()` statement from `==` to `%in%`. This is because `vehicle <- cont[i, c(5:9)])` will give a vector of characters, e.g. `[1] "GSM118912" "GSM118911"`, and the statement will select all columns whose name appears "IN" that vector. Therefore it will appear as: `vehicle <- cont[i, c(5:9)])` `tmp <- sum(expression[,which(colnames(expression) %in% vehicle)])/5)` – amwill04 Oct 13 '15 at 11:57
  • `tmp <- sum(expression[,which(colnames(expression) %in% vehicle)])/5` I am trying this line (please note last `)` is removed) but it is giving the following error: Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables – user3253470 Oct 13 '15 at 12:06
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/92138/discussion-between-amwill04-and-user3253470). – amwill04 Oct 13 '15 at 12:09
  • Can you help me with this question: [http://stackoverflow.com/questions/35484595/data-frame-merge-and-selection-of-values-which-are-common-in-2-data-frames] – user3253470 Feb 18 '16 at 16:12