I need to automate PubMed article harvesting. The only examples I have found either download PubMed articles by term query or download a single article by PMID. What I want is to download a list of PubMed IDs by date range (from-to), or all of them, as with OAI.
Is your problem solved or still open? – Maximilian Peters Jul 01 '16 at 13:00
1 Answer
You can use Biopython for this. The following code snippet returns links for the PubMed articles in a given date range. PMC articles can be downloaded directly; for other articles the DOI is provided, but the location of the PDF is publisher-specific and cannot be predicted for all articles. Note that Entrez.esearch returns only the first 20 IDs unless you pass retmax, so raise that limit if you expect more hits.
from Bio import Entrez


def article_links(start_date, end_date='3000'):
    """
    start_date, end_date = 'YYYY/MM/DD'
    returns a list of PubMedCentral links and a 2nd list of DOI links
    """
    Entrez.email = "Your.Name.Here@example.org"

    # get all articles in a certain date range; with the default end_date
    # this includes articles whose publication date lies in the future
    handle = Entrez.esearch(db="pubmed", term='("%s"[Date - Publication] : "%s"[Date - Publication]) ' % (start_date, end_date))
    records = Entrez.read(handle)
    handle.close()

    # nothing matched, so there is nothing to fetch
    if not records['IdList']:
        return [], []

    # get a comma-separated list of PubMed IDs for all articles
    idlist = ','.join(records['IdList'])
    handle = Entrez.efetch(db="pubmed", id=idlist, retmode="xml")
    records = Entrez.parse(handle)

    pmc_articles = []
    doi = []
    for record in records:
        # get all PMC articles
        if record.get('MedlineCitation'):
            if record['MedlineCitation'].get('OtherID'):
                for other_id in record['MedlineCitation']['OtherID']:
                    if other_id.upper().startswith('PMC'):
                        pmc_articles.append('http://www.ncbi.nlm.nih.gov/pmc/articles/%s/pdf/' % other_id.upper())
        # get all DOIs
        if record.get('PubmedData'):
            if record['PubmedData'].get('ArticleIdList'):
                for other_id in record['PubmedData']['ArticleIdList']:
                    if 'doi' in other_id.attributes.values():
                        # str() keeps the DOI as given; title() would change its capitalisation
                        doi.append('http://dx.doi.org/' + str(other_id))
    handle.close()
    return pmc_articles, doi


if __name__ == '__main__':
    print(article_links('2016/12/20'))
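
Because a single esearch call returns only one page of IDs, harvesting a whole date range means paging with the retstart and retmax parameters. The following is a minimal sketch of that idea, not part of the original answer: the helper names date_range_term, collect_ids and entrez_search are hypothetical, and the page size of 500 is an arbitrary assumption. The paging loop takes the search function as an argument so it can be tried without network access.

```python
def date_range_term(start_date, end_date="3000"):
    """Build the same publication-date range query used in the snippet above."""
    return '("%s"[Date - Publication] : "%s"[Date - Publication])' % (start_date, end_date)


def collect_ids(search, term, page_size=500):
    """Page through search results with retstart/retmax until all IDs are read.

    `search` is any callable taking (term, retstart, retmax) and returning a
    dict with 'Count' and 'IdList', the shape Entrez.read gives for esearch.
    """
    ids = []
    retstart = 0
    while True:
        result = search(term, retstart, page_size)
        page = list(result["IdList"])
        ids.extend(page)
        retstart += len(page)
        # stop on an empty page or once every reported hit has been read
        if not page or retstart >= int(result["Count"]):
            return ids


def entrez_search(term, retstart, retmax):
    """Hypothetical adapter around Biopython's Entrez.esearch (needs network access)."""
    from Bio import Entrez  # imported lazily so the helpers above work without Biopython

    Entrez.email = "Your.Name.Here@example.org"
    handle = Entrez.esearch(db="pubmed", term=term, retstart=retstart, retmax=retmax)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

With Biopython installed, collect_ids(entrez_search, date_range_term('2016/01/01', '2016/01/31')) should return the PMIDs for that month. As far as I know, NCBI caps how many PubMed records esearch will page through for a single query (10,000 at the time of writing), so for a large harvest it is safer to split the range into smaller date windows and to fetch the records in batches rather than in one efetch call.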

Maximilian Peters