
I started learning R very recently, so forgive me if this is a novice question. I want to extract the rows in which the value in the "Bladder" column is more than 5 times higher than the values in the other columns.

gene        Adrenal      Amygdala     Bladder       BoneMarrow
1007_s_at   10.46973369  11.26483864  100.43303178  9.907426976
1053_at     6.446570421  6.462840464  6.570665594   7.068326351
117_at      8.018137441  7.738652705  7.604989675   8.38937883
121_at      10.78168853  10.3223056   10.38043102   10.73936285
1255_g_at   5.625038847  6.132930765  5.526885199   5.448521716
1294_at     8.37142904   8.1019947    8.549260758   8.697436419
1316_at     6.237386633  6.429011484  6.083330287   6.295933456
1320_at     6.206410651  6.139873183  6.328348899   6.251521738
1405_i_at   6.588370219  5.949622255  7.420451672   8.823058974

Expected result

gene        Adrenal      Amygdala     Bladder       BoneMarrow
1007_s_at   10.46973369  11.26483864  100.43303178  9.907426976

I found this answer useful, but I don't know how to apply it to multiple columns: "select only rows if its value in a particular column is less than its value in the other column".

Thanks.

pali
  • Given the data frame above, can you post the expected result? – thepule Sep 02 '16 at 20:24
  • "more than 5 times higher in compare to other column". That statement is quite ambiguous. 5x higher than _any_ other column or than _all_ other columns? – Axeman Sep 02 '16 at 20:27
  • `dplyr::filter(your_data, Bladder > 5*(column_you_are_talking_about))`. If you have multiple conditions, just add them as additional arguments to filter, e.g., `Bladder > 5*other_column_you_are_talking_about`. – shayaa Sep 02 '16 at 20:30
  • As is, your posted data has no rows in which Bladder is greater than 5 times any or all of the other columns. Please fix that and then post expected results. Also, you may want to name that first column. – aichao Sep 02 '16 at 20:31
  • Thanks everyone. I just updated the question – pali Sep 02 '16 at 20:38
  • @Axeman I mean 5x higher than all other columns. Thanks – pali Sep 02 '16 at 20:40
  • @pali: sorry, but do you mean to change that 100.46973369 value to Bladder instead of Adrenal? Otherwise, your condition is to return rows in which any column other than Bladder is greater than 5 * Bladder. – aichao Sep 02 '16 at 20:52
  • @aichao Thanks for catching it. Fixed – pali Sep 03 '16 at 12:56

2 Answers


You want to subset your data based on your condition. Here, I assume your data is in a data frame named df:

df[df$Bladder > apply(5 * subset(df, select=-c(gene, Bladder)), 1, max),]

This will select the rows of df for which the Bladder column is more than 5 times the max of the other columns. We select all columns other than Bladder and gene using the subset command, and we compute the row-wise max using apply with the MARGIN set to 1 (i.e., the first margin or rows).

Using the updated data in your post, we get:

##       gene  Adrenal Amygdala  Bladder BoneMarrow
##1 1007_s_at 10.43303 11.26484 100.4697   9.907427
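
To see what MARGIN does in isolation, here is a minimal sketch on a toy matrix. The matrix `m` and the names `row_max` and `col_max` are made up for illustration and are not part of the question's data:

```r
# Toy 2 x 3 matrix, filled row by row (illustrative values only)
m <- matrix(c(1, 5, 3,
              4, 2, 6), nrow = 2, byrow = TRUE)

# MARGIN = 1: apply max() to each row, giving one value per row
row_max <- apply(m, 1, max)
print(row_max)  # 5 6

# MARGIN = 2: apply max() to each column, giving one value per column
col_max <- apply(m, 2, max)
print(col_max)  # 4 5 6
```

With MARGIN set to 1 the result has one entry per row, which is why it can be compared element-wise against df$Bladder.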

The data is:

df <- structure(list(gene = structure(1:9, .Label = c("1007_s_at", 
"1053_at", "117_at", "121_at", "1255_g_at", "1294_at", "1316_at", 
"1320_at", "1405_i_at"), class = "factor"), Adrenal = c(10.43303178, 
6.446570421, 8.018137441, 10.78168853, 5.625038847, 8.37142904, 
6.237386633, 6.206410651, 6.588370219), Amygdala = c(11.26483864, 
6.462840464, 7.738652705, 10.3223056, 6.132930765, 8.1019947, 
6.429011484, 6.139873183, 5.949622255), Bladder = c(100.46973369, 
6.570665594, 7.604989675, 10.38043102, 5.526885199, 8.549260758, 
6.083330287, 6.328348899, 7.420451672), BoneMarrow = c(9.907426976, 
7.068326351, 8.38937883, 10.73936285, 5.448521716, 8.697436419, 
6.295933456, 6.251521738, 8.823058974)), .Names = c("gene", "Adrenal", 
"Amygdala", "Bladder", "BoneMarrow"), class = "data.frame", row.names = c(NA, 
-9L))
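
As a possible alternative to the row-wise apply, the same row maximum can probably be computed with the vectorised pmax, which tends to be faster on wide expression tables. This is only a sketch: `expr` is a hypothetical three-row copy of the question's data (with Bladder holding the large value), and `others`, `row_max` and `hits` are names chosen for illustration.

```r
# Hypothetical, abbreviated copy of the question's data (three rows only)
expr <- data.frame(
  gene       = c("1007_s_at", "1053_at", "117_at"),
  Adrenal    = c(10.46973369, 6.446570421, 8.018137441),
  Amygdala   = c(11.26483864, 6.462840464, 7.738652705),
  Bladder    = c(100.43303178, 6.570665594, 7.604989675),
  BoneMarrow = c(9.907426976, 7.068326351, 8.38937883),
  stringsAsFactors = FALSE
)

# Every column except the identifier and the column being tested
others <- expr[, setdiff(names(expr), c("gene", "Bladder")), drop = FALSE]

# pmax() takes the element-wise (i.e. per-row) maximum across its arguments;
# do.call() passes each column of `others` as a separate argument
row_max <- do.call(pmax, others)

# Keep rows where Bladder exceeds 5 times the largest of the other columns
hits <- expr[expr$Bladder > 5 * row_max, ]
print(hits)
```

Comparing against the row maximum is equivalent to requiring Bladder to be more than 5 times every other column, since a value that beats the largest one beats them all.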
aichao
  • Thanks worked perfectly. Can you explain this part of your answer please "we compute the row-wise max using apply with the MARGIN set to 1 (i.e., the first margin or rows)" – pali Sep 03 '16 at 12:58
  • @pali: see `?apply`. Basically, `apply` applies a function along a dimension (referred to in the docs as a margin) of a multi-dimensional array or matrix. Here, the function is `max` and we want to apply it across rows (the first margin or dimension) of the data frame (excluding the `gene` and `Bladder` columns). – aichao Sep 03 '16 at 13:23

This question was not asked terribly well, so my answer may not be quite what you are expecting, but I think what you're trying to get at is important and simple enough. My example makes use of the dplyr library, which simplifies filtering and selecting values from a data frame. Please note that I changed the value of BoneMarrow in the first row so that Bladder would be more than five times larger.

Most of the code is just to set up the example so that it is reproducible; the first and last lines are the actual answer to the question.

library(dplyr)

txt=
"gene,Adrenal,Amygdala,Bladder,BoneMarrow
1007_s_at,100.46973369,11.26483864,10.43303178,1.907426976
1053_at,6.446570421,6.462840464,6.570665594,7.068326351
117_at,8.018137441,7.738652705,7.604989675,8.38937883
121_at,10.78168853,10.3223056,10.38043102,10.73936285
1255_g_at,5.625038847,6.132930765,5.526885199,5.448521716
1294_at,8.37142904,8.1019947,8.549260758,8.697436419
1316_at,6.237386633,6.429011484,6.083330287,6.295933456
1320_at,6.206410651,6.139873183,6.328348899,6.251521738
1405_i_at,6.588370219,5.949622255,7.420451672,8.823058974"

df = read.table(textConnection(txt), header=TRUE, sep=',')

filter(df, Bladder >= BoneMarrow * 5) %>% select(Bladder)
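
If the intent is for Bladder to exceed 5 times every other column rather than just BoneMarrow, a dplyr version might look like the sketch below. It assumes dplyr 1.0.4 or later for `if_all`; `expr` is a hypothetical three-row copy of the question's data in which Bladder holds the large value, and `hits_explicit` and `hits_all` are illustrative names.

```r
library(dplyr)

# Hypothetical, abbreviated copy of the question's data (three rows only)
expr <- data.frame(
  gene       = c("1007_s_at", "1053_at", "117_at"),
  Adrenal    = c(10.46973369, 6.446570421, 8.018137441),
  Amygdala   = c(11.26483864, 6.462840464, 7.738652705),
  Bladder    = c(100.43303178, 6.570665594, 7.604989675),
  BoneMarrow = c(9.907426976, 7.068326351, 8.38937883),
  stringsAsFactors = FALSE
)

# Option 1: name the other columns explicitly and compare against their
# per-row maximum
hits_explicit <- filter(expr, Bladder > 5 * pmax(Adrenal, Amygdala, BoneMarrow))

# Option 2: test every column except gene and Bladder without listing them;
# if_all() keeps a row only when the condition holds for all selected columns
# (assumes dplyr >= 1.0.4)
hits_all <- filter(expr, if_all(!c(gene, Bladder), ~ Bladder > 5 * .x))

print(hits_all)
```

The second form scales better when there are many tissue columns, because nothing needs to change when columns are added.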
Mikuana
  • Sorry for not being very clear in my question. I got this error when I ran your code: "Error in eval(expr, envir, enclos) : object 'Bladder' not found". It appears that your code assumes the value in Bladder will be more than 5 times the value in BoneMarrow, but my assumption is that the value in Bladder "should be 5 times more than another column". Thanks though. – pali Sep 03 '16 at 12:46
  • That error was because I did something silly and used tab delimiters in my data set, which is just begging for issues when someone copy-pastes it. I changed it to csv, so hopefully it parses for you now. – Mikuana Sep 03 '16 at 14:09
  • When you say the value in Bladder "should be 5 times more than another column", do you mean that Bladder should be 5 times more than *any* other column, or *every* other column? – Mikuana Sep 03 '16 at 14:10
  • I mean it should be 5 times more than every other column. Sorry for not being so clear. – pali Sep 03 '16 at 20:22