Questions tagged [rentrez]

Provides an R interface to the NCBI's EUtils API allowing users to search databases like GenBank and PubMed, process the results of those searches and pull data into their R sessions.

38 questions
4 votes, 2 answers

Sort pubmed searches from rentrez by relevance

I'm searching PubMed using the rentrez package in R and would like to get the results sorted by relevance. Currently they are sorted by publication date.
library(rentrez)
query = 'regression to the mean[TITL]'
entrez_search =…
agbarnett
3 votes, 3 answers

Using rentrez to parse out author and affiliation from pubmed

My overall goal is to build a co-author network graph. I have a list of PubMed IDs, and these are the only publications I am interested in for graphing the co-author network. I can't figure out how to get both the author names and respective…
Shirley
3 votes, 1 answer

xpathApply: How to pass multiple paths or nodes?

# parse PubMed data
library(XML)      # xpath
library(rentrez)  # entrez_fetch
pmids <- c("25506969","25032371","24983039","24983034","24983032","24983031","26386083",
           "26273372","26066373","25837167","25466451","25013473","23733758")
#…
user5249203
1 vote, 0 answers

How do I download a large number of GenBank sequences using entrez_fetch in R?

I am trying to download sequence data from 1283 records in GenBank using rentrez. I'm using the following code, first to search for records fitting my criteria, then linking across databases, and finally fetching the sequence data: # Search for…
1 vote, 1 answer

Rentrez is pulling the wrong data from NCBI in R?

I am trying to download sequence data from E. coli samples within the state of Washington; it's about 1283 sequences, which I know is a lot. The problem I am running into is that entrez_search and/or entrez_fetch seem to be pulling the wrong…
1 vote, 1 answer

NotXMLError: Failed to parse the XML data

I'm trying to use the Entrez module from Biopython to retrieve full-text articles from PubMed Central. This is my code for doing so:
import urllib3
import json
import requests
from Bio import Entrez
from Bio.Entrez import efetch,…
AnonymousMe
1 vote, 1 answer

Obtaining data from NCBI gene database with R

Rentrez package: I was exploring the rentrez package in RStudio (version 1.1.442) on a lab computer running Linux (Ubuntu 20.04.2), following this manual. However, when I later wanted to run the same code on my laptop in Windows 8 Pro (RStudio 2021.09.0…
Eugene Bu
1 vote, 0 answers

rentrez entrez_summary premature EOF

Trying to move on from my troubles with RISmed (see Problems with RISmed and large(ish) data sets), I decided to use rentrez and entrez_summary to retrieve a large list of PubMed titles from a query: set_entrez_key("######") #I did provide my real…
1 vote, 0 answers

Problems extracting metadata from NCBI in R

I am trying to extract some information (metadata) from GenBank using the R package "rentrez" and the example I found here https://ajrominger.github.io/2018/05/21/gettingDNA.html. Specifically, for a particular group of organisms, I search for all…
1 vote, 1 answer

Does system() open up a connection?

I have an entrez command that I'm passing through a loop in R, and it seems to work just fine for a while but I eventually get an error I'm having a hard time figuring out. Error in system(command = paste0(, : cannot…
Nick
1 vote, 1 answer

How do I find the nucleotide sequence of a protein using Biopython?

I have proteins for which I would like to find their corresponding nucleotide sequences. I also have the genome in which the protein is found. In the genome, I have found the corresponding Gene ID for the protein. However, I am having trouble…
Cindy Fang
1 vote, 1 answer

How to keep Protein ID when retrieving coding sequences with rentrez

I have a bunch of protein IDs and I need to retrieve the corresponding coding sequences (CDSs). I have managed to retrieve the CDSs, but the name of each sequence changes from XP* to XM*, and I need to retain the XP* header for each sequence.…
Santiago
1 vote, 1 answer

How to track which protein ID is linked to which gene ID with rentrez

I have a bunch of protein IDs and I want to fetch the corresponding coding sequences (CDSs) without losing the protein ID. I have managed to download the corresponding CDSs, but unfortunately, CDS IDs are very different from protein IDs in NCBI. I…
Santiago
1 vote, 2 answers

Calculating number of xmlchildren under each parent node for a list in R

I am querying PubMed with a long list of PMIDs using R. Because entrez_fetch can only handle a certain number at a time, I have broken my ~2000 PMIDs down into one list with several vectors (each about 500 in length). When I query PubMed, I am…
Shirley
1 vote, 1 answer

.attrs and repetitive entries in an R list

I am trying to fetch some info from NCBI using this R script:
require(rentrez)
require(magrittr)
rs = "rs16891982"
rss = c("rs16891982", "rs12203592", "rs1408799", "rs10756819", "rs35264875", "rs1393350", "rs12821256", "rs17128291", "rs1800407",…
qed