3

My overall goal is to build a co-author network graph. I have a list of PubMed IDs, and these are the only publications I am interested in for the network. I can't figure out how to get the author names and their respective affiliations together in one query using rentrez. I can retrieve both, but my list of affiliations is about 300 entries shorter than my list of authors, so evidently some authors did not provide affiliations, and I can't tell which ones. Is there any way to search for author and affiliation combined? [When I requested both in my entrez_fetch, it just gave me a list of authors and a list of affiliations separately, so I still can't tell which affiliations belong to which authors.]

library(tidyverse)
library(rentrez)
library(XML)

trial <- entrez_fetch(db = "pubmed", id = pub.list$PMID, rettype = "xml", parsed = TRUE)
affiliations <- xpathSApply(trial, "//Affiliation", xmlValue)
first.names <- xpathSApply(trial, "//Author/ForeName", xmlValue)

This all runs fine, but I can't tell which authors go with which affiliations, since the two vectors have different lengths.

Any help would be greatly appreciated. Thanks!

zx8754
Shirley

3 Answers

1

You could try something like:

xpathSApply(trial, "//Author", function(x) {
  author_name <- xmlValue(x[["LastName"]])
  author_affiliation <- xmlValue(x[["AffiliationInfo"]][["Affiliation"]])
  c(author_name, author_affiliation)
})

This visits each //Author node and pulls both values from that same node, so they stay paired. The result is a matrix with one column per author: the first row holds the last names and the second row holds the affiliations.
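To see the pairing in action without querying PubMed, here is a minimal sketch that runs the same function on a small hand-written XML string. The XML, the names in it, and the `toy_xml` and `res` variables are made up for illustration; only the element names follow the PubMed layout used above.

```r
library(XML)

# Toy stand-in for the parsed entrez_fetch() result (not real PubMed data).
toy_xml <- '<PubmedArticleSet><PubmedArticle><MedlineCitation><Article><AuthorList>
  <Author><LastName>Smith</LastName>
    <AffiliationInfo><Affiliation>University A</Affiliation></AffiliationInfo></Author>
  <Author><LastName>Jones</LastName></Author>
  <Author><LastName>Lee</LastName>
    <AffiliationInfo><Affiliation>Institute B</Affiliation></AffiliationInfo></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>'
trial <- xmlParse(toy_xml, asText = TRUE)

# Same per-author traversal as above: one column per //Author node.
res <- xpathSApply(trial, "//Author", function(x) {
  c(xmlValue(x[["LastName"]]),
    xmlValue(x[["AffiliationInfo"]][["Affiliation"]]))
})

dim(res)          # rows: name / affiliation, columns: authors
is.na(res[2, ])   # flags the authors with no affiliation
```

Because the second author has no AffiliationInfo child, that column should carry a missing value in the affiliation row rather than shifting the later affiliations left.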

NicE
  • Thank you! When I ran the code as you had written it, I ended up getting a weird format where only numbers were showing up. I split the code into 2 parts (one for authors and one for affiliations, using the exact same format you have) and then combined them and this time it shows where the NAs for affiliations show up! – Shirley Feb 22 '17 at 18:49
  • Great, the output is a matrix. If you want a more readable format, you can store the matrix in a variable, for example `data`, and then call `as.data.frame(t(data))` to get a data frame with one author per row, the last name in the first column and the affiliation in the second. – NicE Feb 22 '17 at 18:57
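A small sketch of that transposition, using a hand-built matrix in place of the real query result. The values and the column names `last_name` and `affiliation` are illustrative assumptions, not output from PubMed.

```r
# Hypothetical stand-in for the 2-row matrix returned by xpathSApply():
# row 1 = last names, row 2 = affiliations (NA where none was given).
data <- rbind(
  c("Smith", "Jones", "Lee"),
  c("University A", NA, "Institute B")
)

# Transpose so that each author becomes a row, then name the columns.
authors <- as.data.frame(t(data), stringsAsFactors = FALSE)
names(authors) <- c("last_name", "affiliation")

# Authors with no affiliation recorded
authors[is.na(authors$affiliation), ]
```

Setting `stringsAsFactors = FALSE` keeps both columns as character vectors, which tends to be easier to work with when matching names later.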
0
last.name <- xpathSApply(trial, "//Author", function(x) {
  xmlValue(x[["LastName"]])
})

affiliation <- xpathSApply(trial, "//Author", function(x) {
  xmlValue(x[["AffiliationInfo"]][["Affiliation"]])
})

This is what I ended up using, following NicE's format, and it worked: I can now see where the NAs for affiliations are.

Shirley
0

I took @NicE's code and @Shirley's comments and wrote this code:

# One row per //Author node; named columns instead of the default X1/X2.
lastname_affiliation <- data.frame(
    last_name = xpathSApply(trial, "//Author", function(x) {
        xmlValue(x[["LastName"]])
    }),
    affiliation = xpathSApply(trial, "//Author", function(x) {
        xmlValue(x[["AffiliationInfo"]][["Affiliation"]])
    }),
    stringsAsFactors = FALSE
)

Thanks for putting me on the right path.
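Since the end goal is a co-author network, it may also help to keep each author's PMID and to keep every affiliation when an author lists more than one. The following is only a sketch on made-up XML: `value_or_na` is a hypothetical helper (not part of XML or rentrez), and the PMIDs and names are invented.

```r
library(XML)

# Toy stand-in for the parsed entrez_fetch() result (not real PubMed data).
toy_xml <- '<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID>111</PMID><Article><AuthorList>
  <Author><LastName>Smith</LastName>
    <AffiliationInfo><Affiliation>University A</Affiliation></AffiliationInfo>
    <AffiliationInfo><Affiliation>Institute B</Affiliation></AffiliationInfo></Author>
  <Author><LastName>Jones</LastName></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle>
<PubmedArticle><MedlineCitation><PMID>222</PMID><Article><AuthorList>
  <Author><LastName>Lee</LastName>
    <AffiliationInfo><Affiliation>Institute B</Affiliation></AffiliationInfo></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'
trial <- xmlParse(toy_xml, asText = TRUE)

# Hypothetical helper: values matched by an XPath relative to `node`,
# joined with "; ", or NA when nothing matches.
value_or_na <- function(node, path) {
  vals <- xpathSApply(node, path, xmlValue)
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = "; ")
}

author_nodes <- getNodeSet(trial, "//Author")

authors <- data.frame(
  pmid        = sapply(author_nodes, value_or_na,
                       path = "ancestor::PubmedArticle/MedlineCitation/PMID"),
  last_name   = sapply(author_nodes, value_or_na, path = "./LastName"),
  affiliation = sapply(author_nodes, value_or_na,
                       path = "./AffiliationInfo/Affiliation"),
  stringsAsFactors = FALSE
)

# Authors grouped by paper, a possible starting point for co-author edges.
split(authors$last_name, authors$pmid)
```

Last names alone are ambiguous across papers, so a real network would probably need ForeName or Initials as well; that choice is left open here.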

Josh