I'm searching PubMed using the rentrez package in R and would like to get the results sorted by relevance. Currently they are sorted by publication date.

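library(rentrez)

# Search PubMed for papers with this phrase in the title
query = 'regression to the mean[TITL]'
# Store the result under a name that does not shadow the entrez_search() function
search_results = entrez_search(db="pubmed", term=query, retmax=30)
# Fetch summary records for the returned IDs, then pull out the publication dates
paper_data = entrez_summary(db="pubmed", id=search_results$ids)
dates = extract_from_esummary(paper_data, c("pubdate"))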
zx8754
agbarnett

2 Answers

3

As I understand it, the "relevance" information is associated with a given search, not with the record summaries or complete records that might be downloaded later, and the data returned by entrez_search contains no score or similar measure of how relevant a given result is.

On the other hand, I think the sort="relevance" argument is doing something. If you send the same default search twice, the IDs come back in the same order:

default_search = entrez_search(db="pubmed", term=query, retmax=30)
default_search_again = entrez_search(db="pubmed", term=query, retmax=30)
all(default_search$ids == default_search_again$ids)

[1] TRUE

Whereas setting the order to relevance changes the order:

rel_search = entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")
default_search$ids == rel_search$ids

 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[25] FALSE FALSE  TRUE  TRUE FALSE FALSE

Later calls to the summary, fetch and link functions should maintain this order, so this might be the easiest (only?) way to keep track of the relevance information?
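One way to keep track of that order is to record each ID's position in the relevance-sorted search and attach it to whatever is pulled from the summaries. The following is only a minimal sketch: the helper names order_by_search and relevance_ranked_titles are hypothetical (not part of rentrez), and it matches records on the uid field rather than assuming the summaries come back in search order.

```r
# Hypothetical helper: put per-record values into the order of the search IDs.
# `search_ids` are the IDs in the order returned by the search; `uids` and
# `values` are parallel vectors taken from the summary records.
order_by_search <- function(search_ids, uids, values) {
  idx <- match(search_ids, uids)  # position of each search ID in the summaries
  data.frame(
    rank = seq_along(search_ids),  # 1 = first ID returned by the search
    pmid = search_ids,
    value = unname(values[idx]),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper (needs the rentrez package and network access).
relevance_ranked_titles <- function(query, retmax = 30) {
  res <- rentrez::entrez_search(db = "pubmed", term = query,
                                retmax = retmax, sort = "relevance")
  summaries <- rentrez::entrez_summary(db = "pubmed", id = res$ids)
  uids <- rentrez::extract_from_esummary(summaries, "uid")
  titles <- rentrez::extract_from_esummary(summaries, "title")
  order_by_search(res$ids, uids, titles)
}
```

Calling relevance_ranked_titles('regression to the mean[TITL]') would then give one row per paper, with rank reflecting the position in the relevance-sorted search rather than an actual relevance score.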

david w
1

extract_from_esummary pulls a named element out of each record in the paper_data esummary. In your case, that element is pubdate.

When you examine the structure of paper_data, e.g. with str(paper_data), you will see the elements that you could pass as the second argument to extract_from_esummary. For example, to extract the ISSN (which you could then sort by):

issn <- extract_from_esummary(paper_data, c("issn"))

Unfortunately for you, I can't see anything that resembles relevance.
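To check this without reading through the str() output, the field names can be collected across all records. This is a rough sketch only: summary_fields and pubmed_summary_fields are hypothetical helper names, and it assumes more than one ID was requested so that entrez_summary returns a list of records.

```r
# Hypothetical helper: collect every field name that appears in a list of
# summary records (each record is assumed to be a named list).
summary_fields <- function(records) {
  sort(unique(unlist(lapply(records, names))))
}

# Hypothetical wrapper (needs the rentrez package and network access).
# Assumes `ids` holds more than one ID, so a list of records comes back.
pubmed_summary_fields <- function(ids) {
  summaries <- rentrez::entrez_summary(db = "pubmed", id = ids)
  summary_fields(summaries)
}
```

With the objects from the question, "relevance" %in% summary_fields(paper_data) would then report whether any record carries such a field.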

epo3
  • I can add the sort='relevance' option to the entrez_search command (as explained here http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ESearch_), but that does not work. – agbarnett Jul 07 '16 at 09:47
  • As I previously mentioned, I can't see a 'relevance' element in the info pulled from PubMed. I used `ent_search = entrez_search(db="pubmed", term=query, retmax=30, sort="relevance")` , then `paper_data = entrez_summary(db="pubmed", id=ent_search$ids)` and `paper_data[1]` which returned 43 items (uid, pubdate, epubdate, source, authors, lastauthor, title etc.) but nothing that resembles 'relevance'. – epo3 Jul 07 '16 at 10:19