I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but I'm having problems. Using the `getBM` function, I can see that only 22k of those have corresponding gene symbols, but the output is a vector of length 22k, and I can't tell which symbol corresponds to which probe ID in my original list. Using `getBMlist`, I get output with NA values for the probes that don't match, but the function warns that `getBMlist` isn't meant for large lists. How do I get an output of 90k gene symbols, with NA values for the unmatched probes?
What do you get when you set `uniqueRows = FALSE`, i.e. `getBM(attributes = ..., uniqueRows = FALSE)`? – agstudy Mar 14 '13 at 18:26
I get repeats of the same gene symbol. It doesn't help with inserting NA values for the probes that aren't found. – user794479 Mar 15 '13 at 04:02
It is not clear to me what you are trying to do. Can you please add your `getBM` request to the OP, along with the result you get? From a quick read of the documentation, you should get a data.frame with 2 columns... – agstudy Mar 15 '13 at 04:13
1 Answer
To get the mappings between probe ID and gene symbol, you need to include the probe ID in the biomaRt attributes.
Here's how I did it for some of my work using Agilent microarrays:
```r
library(biomaRt)
genes <- c("A_23_P10060", "A_23_P10091", "A_23_P103951", "A_23_P10525",
           "A_23_P105732", "A_23_P10605", "NM_005325")
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Request the probe ID as an attribute so each symbol stays paired with its probe
# (attribute names can change between Ensembl releases; check listAttributes(ensembl))
agilent.df <- getBM(attributes = c("hgnc_symbol", "efg_agilent_wholegenome_4x44k_v1"),
                    filters = "efg_agilent_wholegenome_4x44k_v1",
                    values = genes, mart = ensembl)
# Left join on the probe ID: every input ID is kept, unmatched ones get NA
genes <- merge(x = data.frame(genes = genes), y = agilent.df, all.x = TRUE,
               by.x = "genes", by.y = "efg_agilent_wholegenome_4x44k_v1")
```
There is a very good biomaRt tutorial that walks you through the same process. If you run this code, you'll notice that one probe has "" for hgnc_symbol; that's because it exists in the Ensembl mart but has no designated gene symbol. Probes that aren't found in the mart at all get NA from the left join.
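Note that `merge()` sorts its output by the key column, and a probe that maps to several symbols produces several rows, so the result is not guaranteed to line up one-to-one with the original 90k list. The sketch below is one way to get exactly one entry per input probe, in input order, with NA for unmatched probes. It assumes the `getBM()` result has two columns renamed to `probe` and `hgnc_symbol`; the data frame here is a made-up stand-in with placeholder symbols (`GENE_A` etc.) so the snippet runs offline without querying Ensembl.

```r
# Input probe IDs, in the order we want the output
probes <- c("A_23_P10060", "A_23_P10091", "A_23_P103951", "NM_005325")

# Hypothetical stand-in for a getBM() result (one row per probe/symbol pair);
# the symbols are placeholders, not real annotations
bm <- data.frame(
  probe       = c("A_23_P10060", "A_23_P10091", "A_23_P10091", "A_23_P103951"),
  hgnc_symbol = c("GENE_A", "GENE_B", "GENE_C", ""),
  stringsAsFactors = FALSE
)

# "" means the probe is in Ensembl but has no HGNC symbol: treat it as missing
bm$hgnc_symbol[bm$hgnc_symbol == ""] <- NA
bm <- bm[!is.na(bm$hgnc_symbol), ]

# Collapse probes that map to several symbols into a single ";"-separated string
collapsed <- tapply(bm$hgnc_symbol, bm$probe,
                    function(s) paste(unique(s), collapse = ";"))

# match() returns the position of each input probe in `collapsed`, or NA if absent,
# so the result has the same length and order as `probes`
symbols <- as.character(collapsed[match(probes, names(collapsed))])
result <- data.frame(probe = probes, hgnc_symbol = symbols,
                     stringsAsFactors = FALSE)
print(result)
```

With real data, `probes` would be the full 90k vector and `bm` the renamed `getBM()` output; `nrow(result)` then equals `length(probes)` regardless of how many probes matched. Collapsing with `";"` is just one choice; keeping only the first symbol per probe would work the same way.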

emilliman5
Sorry, how can I convert a Rosetta-generated unique probe identifier like 174996658 to a gene symbol? – Angel Oct 09 '19 at 15:35
You'll need to provide more information. I would suggest you post a separate question to https://bioinformatics.stackexchange.com/ and create a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – emilliman5 Oct 09 '19 at 15:54