How to subset a matrix using a large character vector

Question

I am quite new to R. I am working with a large genomic matrix and preparing heatmaps for certain genes, so I need to subset the matrix to the rows for my genes of interest.

I tried to do it this way:

vector_infertility_genes <- infertility$V1

matrix_for_heatmap_infertility <- subset(my_genomic_matrix, vector_infertility_genes)

But this gives me just the first x rows of my matrix, where x is the number of entries in vector_infertility_genes.

So far I was able to dodge this problem by doing something like this:

matrix_for_heatmap_infertility <- my_genomic_matrix[c('EPHX1', 'HSPB1', 'CLU', 'GAMT',  'PICK1', 'NR3C1',
                                                                 'SIRT1', 'NPAS2', 'SPRY4', 'MAP3K1', 'SOS1', 'SALL4', 
                                                                 'GRIP1', 'PUM2', 'SOX9', 'RIPK4', 'CHD7',  'BCOR', 
                                                                 'CCNB1', 'NFE2L2', 'CHD2', 'CYP1B1', 'MDM2', 'CREBBP',
                                                                 'ICK', 'ZFY', 'SIN3A', 'GATA4'), ]

If I have to manually type the row name of every gene to subset like this again, I will kill myself. Is there an easier way to do this by creating a character vector and using it to subset?

Welcome to Stack Overflow. I would recommend following this [guideline](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) before posting to make the most of the Stack Overflow community. Whichever way you go, you need a way to tell which rows you are interested in. Maybe there is a specific classification of the gene IDs that you haven't mentioned. If so, you could look into the [data.table](https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) package to merge the row names with something you can easily relate to. — DJJ, Apr 10 '20 at 20:44

score 0 · Accepted Answer · answered Apr 10 '20 at 21:05

I am making a guess at the problem. Your gene names are stored as a factor, and when a factor is used to index a matrix, its integer codes are used instead of its labels:

genes = c('EPHX1','HSPB1', 'CLU', 'GAMT','PICK1', 'NR3C1','SIRT1', 'NPAS2',
'SPRY4', 'MAP3K1', 'SOS1', 'SALL4','GRIP1', 'PUM2', 'SOX9', 'RIPK4', 'CHD7', 
'BCOR','CCNB1','NFE2L2', 'CHD2', 'CYP1B1', 'MDM2', 'CREBBP', 'ICK', 'ZFY',
'SIN3A', 'GATA4')

class(genes)
[1] "character"

infertility = data.frame(V1=genes)
vector_infertility_genes <- infertility$V1

class(vector_infertility_genes)
[1] "factor"

By default (in R versions before 4.0.0, where stringsAsFactors = TRUE), data.frame converts character vectors to factors. Below I make a matrix with some random gene names and insert the chosen genes at rows 101-128:

my_genomic_matrix = matrix(runif(1000*3),ncol=3)
rownames(my_genomic_matrix) = paste0("gene",1:1000)
rownames(my_genomic_matrix)[101:128] = genes

Indexing with the factor gives you the wrong rows, because the factor's integer codes are used as row positions:

head(my_genomic_matrix[vector_infertility_genes,])
            [,1]       [,2]       [,3]
gene8  0.6705400 0.92836211 0.39245031
gene12 0.6550523 0.87094037 0.08309788
gene5  0.3737798 0.94779178 0.44279510
gene9  0.4544450 0.77939541 0.13901245
gene19 0.6284895 0.47871950 0.60837784
gene18 0.2369957 0.01336282 0.10390174
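
To see where those rows come from, here is a minimal sketch on a small made-up example (the names `m` and `f` are illustrative and not from the question). A factor stores integer codes that point into its sorted levels, and `[` uses those codes as row positions:

```r
# Toy data: 'm' and 'f' are hypothetical names used only for illustration
genes <- c("EPHX1", "HSPB1", "CLU", "GAMT")
f <- factor(genes)

levels(f)       # levels are sorted: "CLU" "EPHX1" "GAMT" "HSPB1"
as.integer(f)   # integer codes into the levels: 2 4 1 3

m <- matrix(1:18, ncol = 3)
rownames(m) <- c("gene1", "gene2", "EPHX1", "HSPB1", "CLU", "GAMT")

# Indexing with the factor silently uses the codes as row positions...
by_factor <- m[f, ]
by_codes  <- m[as.integer(f), ]
identical(by_factor, by_codes)   # TRUE

# ...whereas a character index is matched against the row names
by_name <- m[as.character(f), ]
rownames(by_name)
```

Here `m[f, ]` returns rows 2, 4, 1 and 3 of the toy matrix, none of which is selected by name.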

Converting to character first should work in most cases, as long as you are sure every entry of vector_infertility_genes is in the row names of my_genomic_matrix:

head(my_genomic_matrix[as.character(vector_infertility_genes),])
           [,1]       [,2]      [,3]
EPHX1 0.1380852 0.91638593 0.5155086
HSPB1 0.4828377 0.44798223 0.6011990
CLU   0.7974677 0.84083760 0.4378384
GAMT  0.9654133 0.04167125 0.6087020
PICK1 0.1958134 0.22254847 0.5157768
NR3C1 0.4228220 0.14512706 0.6136789
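
Another option, assuming you control how `infertility` is created, is to keep the column as character from the start so no conversion is needed later. The sketch below uses `data.frame(..., stringsAsFactors = FALSE)`; `read.table` and `read.csv` accept the same argument, if that is how your data is loaded (an assumption, since the question does not say). The small matrix here is a made-up stand-in for the real one:

```r
genes <- c("EPHX1", "HSPB1", "CLU")

# Keep strings as character when building the data frame
infertility <- data.frame(V1 = genes, stringsAsFactors = FALSE)
vector_infertility_genes <- infertility$V1
class(vector_infertility_genes)   # "character"

# Hypothetical small matrix standing in for my_genomic_matrix
my_genomic_matrix <- matrix(runif(15), ncol = 3)
rownames(my_genomic_matrix) <- c("gene1", "CLU", "HSPB1", "gene4", "EPHX1")

# A character index is matched against the row names, in the order given
subset_matrix <- my_genomic_matrix[vector_infertility_genes, ]
rownames(subset_matrix)
```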

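If some genes might be missing from the row names, you can also do the following (note that the rows come back in the matrix's order, not the order of your vector):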

vector_infertility_genes = as.character(vector_infertility_genes)
my_genomic_matrix[rownames(my_genomic_matrix) %in% vector_infertility_genes,]
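
If you would rather keep the rows in the order of your gene list (which can matter for a heatmap) and also see which genes were not found, a possible variation is to use `intersect()` and `setdiff()`. This is only a sketch on made-up data; `m`, `wanted`, `present` and `not_found` are hypothetical names:

```r
# Toy matrix with named rows (hypothetical stand-in for the genomic matrix)
m <- matrix(seq_len(12), ncol = 3)
rownames(m) <- c("CLU", "EPHX1", "GAMT", "SOX9")

# Gene list that includes one name absent from the matrix
wanted <- c("SOX9", "NOT_A_GENE", "CLU")

# Genes that exist in the matrix, kept in the order of 'wanted'
present <- intersect(wanted, rownames(m))

# Genes that were requested but are not in the matrix
not_found <- setdiff(wanted, rownames(m))

# drop = FALSE keeps the result a matrix even if only one gene matches
sub <- m[present, , drop = FALSE]
rownames(sub)
not_found
```

Indexing `m` directly with a name that is not in the row names stops with a "subscript out of bounds" error, which is why the missing names are filtered out first.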
