
I am trying to convert a list of gene names to Entrez Gene IDs.

So far I have this:

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mapping <- getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
                                'entrezgene', 'hgnc_symbol'), mart = ensembl)

This creates a table with the Entrez Gene IDs and gene names. However, how can I restrict it to the IDs for the genes in my own list?

This is an example of the gene names list: Gene names

It is just an Excel file with a couple of hundred gene names in total.

Hopefully someone can help me!

1 Answer


Data

Create a vector of gene names:

mygenes <- c("TNF", "IL6", "IL1B", "IL10", "CRP", "TGFB1", "CXCL8")

Retrieve information from the BioMart:

library(biomaRt)

hsmart <- useMart(dataset = "hsapiens_gene_ensembl", biomart = "ensembl")

hsmart

# Object of class 'Mart':
#   Using the ENSEMBL_MART_ENSEMBL BioMart database
#   Using the hsapiens_gene_ensembl dataset

Map gene names to Ensembl gene IDs, transcript IDs, and Entrez IDs

To do this, you don't need to download the whole database as a table of corresponding IDs. Passing filters = "hgnc_symbol" to your getBM() call subsets the database by the gene names you supply in the values argument:

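# Note: in newer Ensembl releases the 'entrezgene' attribute is called
# 'entrezgene_id'; use that name if this call reports an invalid attribute
mapping <- getBM(
  attributes = c('ensembl_gene_id', 'ensembl_transcript_id', 'entrezgene', 'hgnc_symbol'),
  filters = 'hgnc_symbol',
  values = mygenes,
  mart = hsmart
)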

This gives you 43 records for your genes:

library(dplyr)  # provides %>% and arrange()

mapping %>%
  arrange(hgnc_symbol, ensembl_gene_id, ensembl_transcript_id, entrezgene)

#   ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol
#1  ENSG00000132693       ENST00000255030       1401         CRP
#2  ENSG00000132693       ENST00000368110       1401         CRP
#3  ENSG00000132693       ENST00000368111       1401         CRP
#4  ENSG00000132693       ENST00000368112       1401         CRP
#5  ENSG00000132693       ENST00000437342       1401         CRP
#
#   ............................................................
#
#39 ENSG00000228321       ENST00000412275       7124         TNF
#40 ENSG00000228849       ENST00000420425       7124         TNF
#41 ENSG00000228978       ENST00000445232       7124         TNF
#42 ENSG00000230108       ENST00000443707       7124         TNF
#43 ENSG00000232810       ENST00000449264       7124         TNF
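
If you only need one Entrez ID per gene symbol, the transcript-level rows can be collapsed. This is a minimal sketch, assuming a data frame shaped like the `mapping` result above; `mapping_example` is a hand-built stand-in copied from a few of the rows shown, and `symbol_to_entrez` is just an illustrative name:

```r
# Hand-built stand-in for the getBM() result, copied from rows shown above
mapping_example <- data.frame(
  ensembl_gene_id = c("ENSG00000132693", "ENSG00000132693",
                      "ENSG00000228321", "ENSG00000228849"),
  ensembl_transcript_id = c("ENST00000255030", "ENST00000368110",
                            "ENST00000412275", "ENST00000420425"),
  entrezgene = c(1401, 1401, 7124, 7124),
  hgnc_symbol = c("CRP", "CRP", "TNF", "TNF"),
  stringsAsFactors = FALSE
)

# Keep only the symbol and Entrez ID columns, then drop duplicate pairs
symbol_to_entrez <- unique(mapping_example[, c("hgnc_symbol", "entrezgene")])
rownames(symbol_to_entrez) <- NULL
symbol_to_entrez
```

With the real result you would run the same `unique()` call on `mapping` itself.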
utubun
  • Thank you for your answer! However, is there a way to convert the list of gene names to a vector instead of manually typing the list? – Laurent Winckers Nov 19 '18 at 08:43
  • I am not sure I understood you correctly, @Laurent. Do you mean how to convert the gene names from your Excel file into a vector in R? – utubun Nov 19 '18 at 10:32
  • Never mind. I put the gene names in an Excel file and saved it as a .txt file. I used this line to read the .txt file: mygenes <- read.table("gene names.txt", header = T) Thank you for your help! – Laurent Winckers Nov 19 '18 at 12:23
  • That's what I needed to know. You can read your Excel file into R using the `readxl` (https://readxl.tidyverse.org/) or `xlsx` (https://cran.r-project.org/web/packages/xlsx/index.html) packages. The latter allows you to read (and write) separate sheets of your Excel workbook. After that, by the usual subsetting (and, I am afraid, unlisting in the case of `readxl`) of the data, you can convert it into a vector, or directly use the contents of the column holding your gene names as the query for `getBM()`. – utubun Nov 19 '18 at 12:30
  • entrezgene has changed to entrezgene_id – asalimih Aug 28 '21 at 16:25
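
Following up on the comments about reading the gene list from a file: `read.table()` returns a data frame, not a vector, so the column still has to be extracted before it is used as `values`. A minimal sketch, assuming a one-column text file with a header; the temporary file and the column name `gene` are made up here so the example is self-contained:

```r
# Write a small example file so the sketch runs on its own;
# in practice this would be your own exported gene list
gene_file <- tempfile(fileext = ".txt")
writeLines(c("gene", "TNF", "IL6", "IL1B", "IL6"), gene_file)

# read.table() returns a data frame, so pull the column out
# as a character vector and drop repeated symbols
gene_table <- read.table(gene_file, header = TRUE, stringsAsFactors = FALSE)
mygenes <- unique(trimws(gene_table$gene))
mygenes
```

The resulting `mygenes` vector can then be passed as `values = mygenes` in the `getBM()` call from the answer.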