I am trying to get, in R, the names of the data files from NCBI or PubMed that are related or attached to hundreds of unique DOIs or PMIDs. For example, I have PMID 19122651 and I want to get the names of the three GSEs connected to it: GSE12781, GSE12782, and GSE12783.
I have searched various sources and packages to no avail.
Appreciate your assistance.

Shawn
2 Answers
You can do this using the rentrez package.
The required function is entrez_link.
Example:
library(rentrez)
results <- entrez_link(dbfrom = 'pubmed', id = 19122651, db = 'gds')
results$links$pubmed_gds
[1] "200012783" "200012782" "200012781"
The 3 results are the IDs for the associated GEO Dataset records. You can convert them to GSE accessions using `entrez_summary`.
Here's a somewhat ugly `sapply` that may serve as the basis for a function:
sapply(results$links$pubmed_gds, function (id) entrez_summary("gds", id)$accession,
USE.NAMES = FALSE)
[1] "GSE12783" "GSE12782" "GSE12781"
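For hundreds of PMIDs, the `sapply` above could be wrapped roughly as follows. This is only a sketch under stated assumptions: `pmid_to_gse` and `pmids_to_gse` are hypothetical helper names (not part of rentrez), the `delay` pause is an assumed way to stay within NCBI's request limits, and the `^GSE` filter assumes only series accessions are wanted.

```r
# Hypothetical helper (not part of rentrez): GSE accessions linked to one PMID.
pmid_to_gse <- function(pmid) {
  links <- rentrez::entrez_link(dbfrom = "pubmed", id = pmid, db = "gds")
  gds_ids <- links$links$pubmed_gds
  # No linked GEO records: return an empty character vector.
  if (is.null(gds_ids)) return(character(0))
  # always_return_list = TRUE keeps the result a list even for a single ID.
  summaries <- rentrez::entrez_summary(db = "gds", id = gds_ids,
                                       always_return_list = TRUE)
  accessions <- vapply(summaries, function(s) s$accession, character(1),
                       USE.NAMES = FALSE)
  # Assumption: keep only series (GSE) accessions.
  accessions[grepl("^GSE", accessions)]
}

# Hypothetical wrapper: apply the helper to many PMIDs, pausing between them.
pmids_to_gse <- function(pmids, delay = 0.5) {
  result <- lapply(pmids, function(p) {
    Sys.sleep(delay)  # assumed pause to avoid exceeding NCBI rate limits
    pmid_to_gse(p)
  })
  setNames(result, pmids)
}
```

Calling `pmids_to_gse(c("19122651"))` should then return a list with one element, named after the PMID, holding the accessions shown above. Setting an NCBI API key with `set_entrez_key()` may allow a shorter delay.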

neilfws
- This is terrific! Thank you very much. I have been racking my brain and scouring the internet, and nothing was this simple or straightforward! I appreciate your time and assistance very much. – Shawn Mar 28 '19 at 03:01
- No problem. `rentrez` is a great package, well worth getting to know. Please accept the answer if it solved the issue. – neilfws Mar 28 '19 at 03:05
- @neilfws: do you have a typo in the last line of code with `sapply`? It gives an error with `result$links$pubmed_gds` and was probably meant to be `results$links$pubmed_gds`. – Oka Mar 28 '19 at 11:12