
I have a data frame like this (output of `dput()`):


structure(list(Gene_ID = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L),
                                   .Label = c("g1", "g10", "g2", "g3", "g4",
                                              "g5", "g6", "g7", "g8", "g9"),
                                   class = "factor"),
               Module_Color = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 1L),
                                        .Label = c("blue", "green", "red"),
                                        class = "factor")),
          .Names = c("Gene_ID", "Module_Color"),
          class = "data.frame", row.names = c(NA, -10L))

I want to get the row indices of all occurrences of each module color and create a list "modIndices" that contains the row indices for every color, like this:

modIndices$red = c(1, 3, 5, 8)
# red appears in rows 1, 3, 5 and 8

modIndices$blue = c(2, 6, 10)

modIndices$green = c(4, 7, 9)

Although I can get the indices of a single color with the `which` function, I am unable to build the list above.

Please help.

J. Smith
  • It is better to include your data as a table or use the `dput()` function. A picture is not very helpful. Also, please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a reproducible example. – phiver May 05 '18 at 09:00
  • Your indices for 'red' are 1, 3, 5, 8 (based on the example). – akrun May 05 '18 at 09:12
  • Yes, sorry for the mistakes. I am new to this community and was not aware of how to post reproducible examples. I have corrected the mistakes. – J. Smith May 05 '18 at 09:43

1 Answer


We can just `split` the sequence of row numbers by the second column to get a named list of index vectors:

split(seq_len(nrow(df)), df[[2]])
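A minimal self-contained sketch of the `split()` one-liner, rebuilding the question's data with the `Gene_ID`/`Module_Color` column names from its `dput()` output. As an assumption for brevity, the values are typed in as character vectors rather than factors. This should not change the result, because `split()` converts a character grouping vector to a factor whose levels are sorted, so the list comes out named `blue`, `green`, `red`.

```r
# Rebuild the question's data (same values as its dput() output)
df <- data.frame(
  Gene_ID = paste0("g", 1:10),
  Module_Color = c("red", "blue", "red", "green", "red",
                   "blue", "green", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Group the row numbers 1..10 by color; the result is a named list
modIndices <- split(seq_len(nrow(df)), df$Module_Color)

modIndices$red     # 1 3 5 8
modIndices$blue    # 2 6 10
modIndices$green   # 4 7 9
names(modIndices)  # "blue" "green" "red" (sorted factor levels)
```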

Or, with the tidyverse: create a row-number column with `row_number()`, group by `Module Color`, and `summarise` the row numbers of each group into a list column 'ind':

library(dplyr)
df %>% 
  mutate(rn = row_number()) %>% 
  group_by(`Module Color`) %>%
  summarise(ind = list(rn)) 
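Note that the dplyr version returns a tibble with a list column rather than a plain named list. If you would rather stay in base R and build on the `which()` approach from the question, a minimal sketch is to call `which()` once per distinct color and collect the results with `lapply()`. The column names follow the question's `dput()` output (`Module_Color`), and `colors` is just an illustrative variable name. This should give the same list as the `split()` one-liner, only more verbosely.

```r
df <- data.frame(
  Gene_ID = paste0("g", 1:10),
  Module_Color = c("red", "blue", "red", "green", "red",
                   "blue", "green", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Distinct colors, sorted so the list order matches split()
colors <- sort(unique(df$Module_Color))

# setNames(nm = colors) names each element after the color itself,
# so lapply() returns a list with names "blue", "green", "red"
modIndices <- lapply(setNames(nm = colors),
                     function(col) which(df$Module_Color == col))

modIndices$red  # 1 3 5 8
```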

data

df <- data.frame(`Gene ID` = paste0("g", 1:10),
                 `Module Color` = c('red', 'blue', 'red', 'green', 'red', 'blue',
                                    'green', 'red', 'green', 'blue'),
                 stringsAsFactors = FALSE, check.names = FALSE)
akrun
  • Now I want to create 2 separate files, "gene_red.txt" and "gene_blue.txt". Each file should contain only a column named 'Gene ID' with the corresponding 'Gene ID' values from the original data frame 'df'; e.g. "gene_red.txt" will contain just 'g1', 'g3', 'g5', 'g8'. I can create these files separately by calling write.table twice, but how can I create them with just a single write.table call? Thanks in advance. – J. Smith May 05 '18 at 15:58
  • Try `lst <- split(df[1], df[[2]]); lapply(names(lst), function(x) write.table(lst[[x]], file = paste0("gene_", x, ".txt"), row.names = FALSE, quote = FALSE))` – akrun May 05 '18 at 16:01
  • Wow, thanks a lot! Actually, I am not very good at handling complex lists with several levels, or at the different ways of accessing their elements by name and by index. Can you suggest a link for this, please? – J. Smith May 05 '18 at 16:28
  • @J.Smith You can check [here](https://www.r-bloggers.com/apply-lapply-rapply-sapply-functions-in-r/) or [here](https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family) for a better understanding of the apply family of functions. – akrun May 05 '18 at 16:46