
I have this code (taken from here):

library("biomaRt")
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
# Row names of res are the Ensembl gene IDs to look up
genes <- rownames(res)
G_list <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "entrezgene", "description", "hgnc_symbol"),
                values = genes,
                mart = mart)

But when I check G_list, it is empty.

I understand why:

Here are some examples of the ensembl_gene_id values in genes:

"ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16"

If I pass these IDs to getBM(), it returns nothing.

However, if I remove the dot and the number after it, like this:

"ENSG00000260727", "ENSG00000277521", "ENSG00000116514"

I get the expected results.

Is there a way to pass gene IDs that include the dot suffix and still get the expected results?

Adam Bellaïche
  • Why not remove them before querying? See [this post](https://stackoverflow.com/questions/55157124) and [this post](https://stackoverflow.com/questions/10617702) on how to remove them. – zx8754 Mar 15 '19 at 11:20
  • Yes it is what I do for the moment – Adam Bellaïche Mar 15 '19 at 11:25
  • This is a duplicate post of a question raised on Biostars: [Question: Mapping Ensembl Gene IDs with dot suffix](https://www.biostars.org/p/302441/); I've given some details in my "answer"/extended comment below. Please take a look. – Maurits Evers Mar 15 '19 at 11:58

1 Answer


Not an answer but a bit too long for a comment; happy to remove if deemed not appropriate.

In short, yes, you need to remove the "dot digit" part of the Ensembl gene ID. The number after the dot is the version of the stable Ensembl identifier.

From the Ensembl documentation on stable IDs:

When reassigning stable identifiers between reannotation we can optionally choose to increment the version number assigned with a stable identifier. We do so to indicate an underlying change in the entity.

For genes (i.e. Ensembl identifiers of the form ENSG*), the version number increments when the set of transcripts linked to a gene changes.

This post is in fact a duplicate of a post on Biostars: [Question: Mapping Ensembl Gene IDs with dot suffix](https://www.biostars.org/p/302441/); you should take a look at some of the R solutions discussed there.
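
As a minimal sketch of the "remove the version before querying" approach, the base R snippet below strips a trailing dot-and-digits suffix and keeps a lookup table so that results can be matched back to the original versioned IDs. The helper name `strip_ensembl_version` and the `id_map` data frame are hypothetical names used for illustration; they are not part of biomaRt.

```r
# Hypothetical helper (not part of biomaRt): drop a trailing ".<digits>"
# version suffix from Ensembl stable IDs. IDs without a suffix are unchanged.
strip_ensembl_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Example IDs taken from the question
genes <- c("ENSG00000260727.1", "ENSG00000277521.1", "ENSG00000116514.16")

# Lookup table linking each versioned ID to its unversioned form
id_map <- data.frame(
  versioned_id     = genes,
  ensembl_gene_id  = strip_ensembl_version(genes),
  stringsAsFactors = FALSE
)

print(id_map)
```

The `ensembl_gene_id` column is what would be passed as `values` to `getBM()` in place of `genes`. Assuming the query result keeps an `ensembl_gene_id` column, as in the question, something like `merge(id_map, G_list, by = "ensembl_gene_id")` should then bring the original versioned IDs back alongside the annotation.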


Postscript

Instead of using Biomart it's often better/faster to use some of the existing annotation packages from Bioconductor. For example, take a look at

Maurits Evers
  • OK, thank you. In fact I saw that post, but it was written 12 months ago, so I wondered whether biomart had since been updated to handle this "error" (it is not really an error). I only started using biomart today. – Adam Bellaïche Mar 15 '19 at 12:43
  • No worries @AdamBellaïche; I agree with you in that I'd probably not call it an error; I guess it boils down to how Biomart defines an `"ensembl_gene_id"`. Anyway, for mapping between different gene naming systems (Ensembl, Hugo, Entrez, RefSeq etc.) it's often better to use the `org.*.*.db` packages from Bioconductor; Biomart is slow and often not very up-to-date. – Maurits Evers Mar 15 '19 at 13:16