Indices of Occurrences of Multiple Groups in R

Question

I have a data frame like this:

structure(list(Gene_ID = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L), .Label = c("g1", "g10", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9"), class = "factor"), Module_Color = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 2L, 1L), .Label = c("blue", "green", "red"), class = "factor")), .Names = c("Gene_ID", "Module_Color"), class = "data.frame", row.names = c(NA, -10L))

I want to get the row indices of every occurrence of each module color and build a list "modIndices" that holds the row indices for each color, like this:

modIndices$red   = c(1, 3, 5, 8)   # red appears in rows 1, 3, 5 and 8
modIndices$blue  = c(2, 6, 10)
modIndices$green = c(4, 7, 9)

I can get the indices for a single color with the `which` function, but I cannot figure out how to build the list above.

Please help.
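A minimal sketch of how the single-color `which` approach extends to every color at once, assuming the data from the `dput()` output above (rebuilt here with plain character columns instead of factors). The helper name `colors` is illustrative; `modIndices` is the list name used in the question.

```r
# Sample data rebuilt from the question's dput() output
# (character columns here instead of factors, for brevity).
df <- data.frame(
  Gene_ID = paste0("g", 1:10),
  Module_Color = c("red", "blue", "red", "green", "red", "blue",
                   "green", "red", "green", "blue"),
  stringsAsFactors = FALSE
)

# Run which() once per distinct color. simplify = FALSE keeps the result
# a list, and USE.NAMES = TRUE names each element after its color.
colors <- unique(df$Module_Color)
modIndices <- sapply(colors, function(col) which(df$Module_Color == col),
                     simplify = FALSE, USE.NAMES = TRUE)

modIndices$red   # [1] 1 3 5 8
```

Here the list elements come out in order of first appearance (red, blue, green) because of `unique()`.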

It is better to include your data as a table or use the `dput()` function. A picture is not very helpful. Also please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a reproducible example. — phiver, May 05 '18 at 09:00
Your indices for 'red' should be 1, 3, 5, 8 (based on the example) — akrun, May 05 '18 at 09:12
Yes, sorry for the mistakes. I am new to this community and was not aware of how to post reproducible examples. I have corrected the mistakes. — J. Smith, May 05 '18 at 09:43

akrun · Accepted Answer · 2018-05-05T09:11:06.743


We can split the sequence of row numbers by the second column to get a list of index vectors:

split(seq_len(nrow(df)), df[[2]])
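A small runnable check of the `split()` call, assuming the question's column names (`Gene_ID`, `Module_Color`) with the color stored as a factor, as in the `dput()` output:

```r
# Question's data; Module_Color is the second column and is a factor.
df <- data.frame(
  Gene_ID = paste0("g", 1:10),
  Module_Color = c("red", "blue", "red", "green", "red", "blue",
                   "green", "red", "green", "blue"),
  stringsAsFactors = TRUE
)

# split() groups the row numbers 1..10 by the factor in column 2,
# returning one integer vector per level (levels sort alphabetically).
modIndices <- split(seq_len(nrow(df)), df[[2]])

names(modIndices)   # "blue" "green" "red"
modIndices$red      # 1 3 5 8
```

Because the result is named by factor level, `modIndices$red` works directly, which is the shape the question asks for.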

Or, with the tidyverse, add a row-number column with `row_number()`, group by `Module Color`, and summarise each group into a list column 'ind':

library(dplyr)
df %>% 
  mutate(rn = row_number()) %>% 
  group_by(`Module Color`) %>%
  summarise(ind = list(rn))

data

df <- data.frame(`Gene ID` = paste0("g", 1:10),
                 `Module Color` = c('red', 'blue', 'red', 'green', 'red', 'blue',
                                    'green', 'red', 'green', 'blue'),
                 stringsAsFactors = FALSE, check.names = FALSE)

edited May 05 '18 at 09:11

answered May 05 '18 at 09:03

akrun

Now I want to create two separate files, "gene_red.txt" and "gene_blue.txt". Each file should contain only a column named 'Gene ID' holding the matching gene IDs from the original data frame 'df'; e.g., "gene_red.txt" should contain just 'g1', 'g3', 'g5', 'g8'. I can create these files by calling write.table once per file, but how can I create them all with a single write.table call? Thanks in advance. – J. Smith May 05 '18 at 15:58

Try `lst <- split(df[1], df[[2]]); lapply(names(lst), function(x) write.table(lst[[x]], file = paste0("gene_", x, ".txt"), row.names = FALSE, quote = FALSE))` – akrun May 05 '18 at 16:01
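A runnable sketch of the one-liner above, using the answer's sample data. One assumption differs from the comment: the files are written to `tempdir()` (via the illustrative variable `out_dir`) rather than the working directory, and `invisible()` is added only to suppress the list of `NULL`s that `lapply()` would otherwise print.

```r
df <- data.frame(`Gene ID` = paste0("g", 1:10),
                 `Module Color` = c("red", "blue", "red", "green", "red", "blue",
                                    "green", "red", "green", "blue"),
                 stringsAsFactors = FALSE, check.names = FALSE)

# Assumption: write into a temporary directory instead of the working directory.
out_dir <- tempdir()

# One data frame (holding only the 'Gene ID' column) per module color.
lst <- split(df[1], df[[2]])

# The single write.table() call is reused for every color by lapply().
invisible(lapply(names(lst), function(x)
  write.table(lst[[x]], file = file.path(out_dir, paste0("gene_", x, ".txt")),
              row.names = FALSE, quote = FALSE)))

# Read one file back to inspect it: header line, then the gene IDs.
readLines(file.path(out_dir, "gene_red.txt"))
# "Gene ID" "g1" "g3" "g5" "g8"
```

With three colors in the data this writes `gene_blue.txt`, `gene_green.txt` and `gene_red.txt`.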

Wow, thanks a lot! I am not very comfortable with nested lists, or with the different ways of accessing their elements by name and by index. Could you suggest a link on this, please? – J. Smith May 05 '18 at 16:28

@J.Smith You can check [here](https://www.r-bloggers.com/apply-lapply-rapply-sapply-functions-in-r/) or [here](https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family) to better understand the apply family of functions – akrun May 05 '18 at 16:46
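For the name-versus-index question, a short illustration of the common ways to reach elements of a named list, using a hand-written copy of the `modIndices` result (the variable `sizes` is just for illustration):

```r
modIndices <- list(blue = c(2L, 6L, 10L), green = c(4L, 7L, 9L),
                   red = c(1L, 3L, 5L, 8L))

modIndices[["red"]]     # element by name: the integer vector 1 3 5 8
modIndices$red          # same element via the `$` shorthand
modIndices[[3]]         # same element by position
modIndices["red"]       # single brackets: a sub-list of length 1, not the vector
modIndices[["red"]][2]  # second entry of the red vector: 3

# Looping over names() keeps each label available inside the function.
sizes <- sapply(names(modIndices), function(nm) length(modIndices[[nm]]))
sizes                   # blue 3, green 3, red 4
```

The key distinction is `[[ ]]` (or `$`), which extracts one element, versus `[ ]`, which always returns a smaller list.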
