I have a dataset called kidney_ensembl and I need to convert Ensembl IDs to gene names.

I'm trying the code below, but it's not working. Can somebody help me?

I know there are similar questions, but they are not helping me. Many thanks!

converting from Ensembl gene ID's to different identifier

How can I convert Ensembl ID to gene symbol in R?

library(tidyverse)
kidney <- data.frame(gene_id = c("ENSG00000000003.10","ENSG00000000005.5",
"ENSG00000000419.8","ENSG00000000457.9","ENSG00000000460.12")
)
#kidney <- read_delim("Desktop/kidney_ensembl.txt", delim = "\t")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

library("biomaRt")

mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <-  kidney$gene_id
gene_IDs <- getBM(filters= "ensembl_gene_id", attributes= c("ensembl_gene_id","hgnc_symbol"),
              values = genes, mart= mart)

kidney_final <- left_join(kidney, gene_IDs, by = NULL)
StupidWolf
  • Hi, do you mind giving a few examples of your kidney$gene_id, do head(kidney$gene_id) and paste the example in your post. It's easier to figure out what's wrong – StupidWolf Nov 15 '19 at 11:52

1 Answer

The biomaRt part worked; it's your left join that fails, because the two data frames have no common columns: gene_IDs has the Ensembl ID under "ensembl_gene_id", while your kidney data frame has it under "gene_id". You have to tell left_join which columns to match with the by argument.

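You also need to check whether the IDs come from GENCODE or Ensembl. GENCODE IDs normally carry a version suffix of the form .[number], for example ENSG00000000003.10, whereas the Ensembl database lists the same gene as ENSG00000000003. The "ensembl_gene_id" filter only matches the unversioned form, so strip the suffix before querying.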
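A quick way to tell which kind of ID you have is to test for the trailing version number before doing anything else. This is only a minimal sketch in base R: the `ids` vector is made up for illustration (the last entry is deliberately unversioned), and the pattern assumes the version is a dot followed by digits at the very end of the ID.

```r
# Illustrative IDs: two with a version suffix, one without.
ids <- c("ENSG00000000003.10", "ENSG00000000005.5", "ENSG00000000419")

# TRUE where the ID ends in ".<digits>", i.e. carries a version suffix.
has_version <- grepl("\\.[0-9]+$", ids)

# Drop the suffix; IDs without one are returned unchanged.
stripped <- sub("\\.[0-9]+$", "", ids)

print(data.frame(ids, has_version, stripped))
```

If `has_version` is TRUE for your IDs, query BioMart with the stripped values.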

library("biomaRt")
library("dplyr")

kidney <- data.frame(gene_id = 
c("ENSG00000000003.10","ENSG00000000005.5",
"ENSG00000000419.8","ENSG00000000457.9","ENSG00000000460.12"),
vals=runif(5)
)
# make sure this is a character vector; a factor can cause problems with left_join
kidney$gene_id <- as.character(kidney$gene_id)
# GENCODE IDs carry a version suffix (e.g. ".10"); strip it (this mostly works)
# plain Ensembl IDs have no suffix and are left alone
kidney$gene_id <- sub("[.][0-9]+$", "", kidney$gene_id)

mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <-  kidney$gene_id
gene_IDs <- getBM(filters= "ensembl_gene_id", attributes= c("ensembl_gene_id","hgnc_symbol"),
              values = genes, mart= mart)

left_join(kidney, gene_IDs, by = c("gene_id"="ensembl_gene_id"))

          gene_id      vals hgnc_symbol
1 ENSG00000000003 0.2298255      TSPAN6
2 ENSG00000000005 0.4662570        TNMD
3 ENSG00000000419 0.7279107        DPM1
4 ENSG00000000457 0.3240166       SCYL3
5 ENSG00000000460 0.3038986    C1orf112
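
The join problem can also be reproduced without connecting to BioMart. The sketch below is only an illustration: `lookup` is a hand-written stand-in for the `getBM()` result (two rows copied from the output above), and one ID is deliberately left out to show what an unmatched row looks like.

```r
library(dplyr)

kidney <- data.frame(
  gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the getBM() result; no network access needed.
lookup <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005"),
  hgnc_symbol = c("TSPAN6", "TNMD"),
  stringsAsFactors = FALSE
)

# The two data frames share no column name, so joining without `by` fails.
no_by <- tryCatch(left_join(kidney, lookup), error = function(e) "error")

# Map the differently named key columns explicitly.
annotated <- left_join(kidney, lookup, by = c("gene_id" = "ensembl_gene_id"))

# Rows without a match are kept and get NA as the symbol.
unmatched <- annotated$gene_id[is.na(annotated$hgnc_symbol)]
print(annotated)
print(unmatched)
```

Checking `unmatched` after the real join is a cheap way to spot IDs that BioMart did not return, for example because a version suffix was not removed.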
StupidWolf
  • when I run the code to generate the gene_IDs assignment, just before left_join, R connects to BioMart but I get 0 observations and 2 variables. – Fabiano Pinheiro Da Silva Nov 15 '19 at 11:30
  • can you dput(kidney) and paste the output? I don't know what you have for ensembl id – StupidWolf Nov 15 '19 at 11:34
  • head(kidney) # A tibble: 6 x 36 gene_id TPM TPM.1 TPM.2 TPM.3 TPM.4 TPM.5 TPM.6 TPM.7 TPM.8 TPM.9 TPM.10 TPM.11 1 ENSG00… 4.04 0.3 0 0 0 25.8 34.9 10.6 0 0 3.81 9.58 2 ENSG00… 0 0 0 0 0 0 0 0 0 0 0 0 3 ENSG00… 85.0 162. 44.3 48.4 7.45 181. 163. 125. 69.4 0 149. 101. – Fabiano Pinheiro Da Silva Nov 15 '19 at 11:55
  • kidney$gene_id [1] "ENSG00000000003.10" "ENSG00000000005.5" "ENSG00000000419.8" "ENSG00000000457.9" [5] "ENSG00000000460.12" "ENSG00000000938.8" "ENSG00000000971.11" "ENSG00000001036.9" [9] "ENSG00000001084.6" "ENSG00000001167.10" "ENSG00000001460.13" "ENSG00000001461.12" [13] "ENSG00000001497.12" "ENSG00000001561.6" "ENSG00000001617.7" "ENSG00000001626.10" – Fabiano Pinheiro Da Silva Nov 15 '19 at 11:57
  • yeah, you have gencode annotations. There is the .10 for example in ENSG00000000003.10 – StupidWolf Nov 15 '19 at 12:00
  • What does that mean? Why is there no .10 at the end of the Ensembl IDs in your BRCAs examples? – Fabiano Pinheiro Da Silva Nov 15 '19 at 12:05
  • OK, most likely whoever provided you the IDs got them from GENCODE (https://www.gencodegenes.org/), which appends this number. I have edited my post; that should solve most of your issues. – StupidWolf Nov 15 '19 at 12:07
  • If not, you have to go back to get the gencode annotation. – StupidWolf Nov 15 '19 at 12:08
  • @FabianoPinheirodaSilva, does it work now? And do the results make sense – StupidWolf Nov 15 '19 at 12:42
  • it worked and it makes sense! thank you very much! sorry for the delay, I was trying to remove the extensions of the Ensembl IDs from the kidney file to left_join... – Fabiano Pinheiro Da Silva Nov 15 '19 at 12:56
  • You are welcome! Annotations are a pain.. always ask where they come from and which version lol... – StupidWolf Nov 15 '19 at 13:02