R order list of tables by column and selecting top 100 rows

Question

I am working with a large list, results.list, that contains 22 tables (each 23544 obs. of 6 variables).

I want to sort each table by a specific column, FDR (false discovery rate), and select the first 100 rows. I can do this manually using simple R commands.

attach(results.list$adult.OLFvsVTA)
sort(FDR)
detach(results.list$adult.OLFvsVTA)
adult.OLFvsVTA100<-adult.OLFvsVTA[1:100,]

I want to combine the top 100 rows from all 22 tables. I do not want the FDR values in the combined vector; I only want the values from one column, named genes, for those top 100 rows. I would like to automate this process using an apply function, but despite a series of attempts I cannot get it to work. I created another vector called r.names that contains the names of all 22 tables in my list, which I was planning to feed into my apply function. I have read several apply help pages without success. Any help would be appreciated.

What do you mean "22 tables"? Do you have 22 variables or do you have a variable in `results.list` that indicates the "table" of each observation? — josliber, Apr 17 '14 at 18:35
What do you expect your final result to look like? A 2200 row by `ncol(results.list$adult.OLFvsVTA)` column data frame, or a 100 row by 22 * `ncol(results.list$adult.OLFvsVTA)` data frame? Related, how do you combine any two of your tables using the `genes` column? — BrodieG, Apr 17 '14 at 18:37
josilber: instead of tables I meant to say data.frame with the dimensions (23544 obs (number of rows) of 6 variables (number of columns, one of those columns is called FDR one is called genes). — Exist_HUPD, Apr 17 '14 at 19:07
BrodiG: I am hoping to get an output file containing 2200 gene names from the column genes from each file. It could be a data frame 1 column with 2200 rows. — Exist_HUPD, Apr 17 '14 at 19:09
see http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns-in-r — Andrew Cassidy, Apr 17 '14 at 19:52

Andrew Cassidy · Accepted Answer · 2014-04-17T20:35:18.487

0

do.call(rbind, lapply(results.list, function(dd) dd[order(dd$FDR), ][1:100, ]))

Assuming results.list is a list of data frames, we use lapply (the apply variant for lists) to apply a function to each data frame. The anonymous function, function(dd), sorts a data frame by FDR and grabs the first 100 rows (the sorting idiom is borrowed from another Stack Overflow post on sorting by column). The result is a list of data frames. We then call do.call, which takes a function and a list and passes the elements of the list as the arguments to that function. In this case the function is rbind, which takes the 100-row tables (one per data frame) and stacks them into one big table. Let me know if you want further explanation.
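
To make the steps concrete, here is a small sketch on made-up data. The toy results.list (three data frames named A, B and C, 150 rows each), the helper make_table, and the output names top.list, top.all and top.genes are all assumptions for illustration, not the asker's real data. It also uses head() instead of [1:100, ], a deliberate swap so that a table with fewer than 100 rows does not produce NA rows, and it keeps only the genes column, which is what the question asks for.

```r
# Toy stand-in for results.list (hypothetical): 3 data frames, 150 rows each.
set.seed(1)
make_table <- function(n = 150) {
  data.frame(
    genes = paste0("gene", seq_len(n)),
    FDR   = runif(n),
    stringsAsFactors = FALSE
  )
}
results.list <- list(A = make_table(), B = make_table(), C = make_table())

top_n <- 100

# Sort each data frame by FDR (ascending) and keep at most top_n rows.
top.list <- lapply(results.list, function(dd) head(dd[order(dd$FDR), ], top_n))

# Stack the per-table results into one data frame.
top.all <- do.call(rbind, top.list)

# Keep only the gene names, plus a column recording the source table.
top.genes <- data.frame(
  comparison = rep(names(top.list), times = vapply(top.list, nrow, integer(1))),
  genes      = as.character(top.all$genes),
  stringsAsFactors = FALSE
)
```

With the real 22-element list, top.genes would have 2200 rows, assuming every table has at least 100 rows. The comparison column is optional; top.genes$genes alone gives the single column of gene names.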

edited Apr 17 '14 at 20:35

answered Apr 17 '14 at 19:53

Hi Andrew, I tried something similar before and it didn't work. When I try your solution I get the following error message. Error in `[.data.frame`(dd, with(dd, order(FDR)), 1:100) : undefined columns selected – Exist_HUPD Apr 17 '14 at 20:34
Can you post colnames(results.list[[1]]) (or the same for any other table in your list)? That will help me out. You may want "FDR" – Andrew Cassidy Apr 17 '14 at 20:36
ahh nvm. Give me a second I'll update with correct form – Andrew Cassidy Apr 17 '14 at 20:37
I think it has something to do with the list structure. the results list contains 22 elements and each element contains a data frame with 6 columns and many rows. str(results.list) List of 22 $ adult.HYPvsVTA :'data.frame': 23544 obs. of 6 variables: ..$ genes : Factor w/ 23544 levels "0610005C13Rik",..: 2889 15734 7816 17776 13955 8896 2354 7653 6502 5857 ... ..$ logFC : num [1:23544] 9.66 9.21 8.05 -6.23 11.12 ... ..$ logCPM: num [1:23544] .. ..$ LR : num [1:23544] ... ..$ PValue: num [1:23544] . ..$ FDR : num [1:23544] ... – Exist_HUPD Apr 17 '14 at 20:39
colnames(results.list[[1]]) [1] "genes" "logFC" "logCPM" "LR" "PValue" "FDR" – Exist_HUPD Apr 17 '14 at 20:40
names(results.list) [1] "SN.adultvse15.5" "SN.p2vse15.5" "SNadultvsp2" "VTA.adultvse15.5" "VTA.p2vse15.5" [6] "VTA.adultvsp2" "p2.SNvsVTA" "e15.5.SNvsVTA" "e15.5.SNnegvsSN" "adult.SNnegvsSN" [11] "e15.5.SNnegvsVTA" "adult.SNnegvsVTA" "adult.OLFvsSNe15" "adult.HYPvsSNe15" "adult.OLFvsVTAe15" [16] "adult.HYPvsVTAe15" "adult.HYPvsOLF" "adult.HYPvsSN" "adult.HYPvsVTA" "adult.OLFvsSN" [21] "adult.OLFvsVTA" "adult.SNvsVTA" – Exist_HUPD Apr 17 '14 at 20:41
Thanks a lot it works flawless and I actually understand your explanation. I will be able to transfer it to my next question!! Have a great day – Exist_HUPD Apr 17 '14 at 20:56
