How to extract values by looping over nested .nxml files in R

Asked Sep 26 '16 at 19:35

Active Sep 27 '16 at 18:02

I am using PubMed data in .nxml format. I have several folders categorized by topic (each contains 100-300 .nxml files). I wrote the following code to extract the abstract from a single file and save it as a data frame:

library(XML)
doc <- xmlParse("Genetics_2011_Aug_188(4)_799-808.nxml")
plant.df <- as.data.frame(t(xpathSApply(doc,"//abstract",function(x) xmlSApply(x,xmlValue))))

which works for one file.

My question is when I use:

files <- (list.files(pattern = "\\.nxml$"))

to loop over the files in one folder, the result is a character vector, so I couldn't use xmlParse on it because of the type. (I got: Error: XML content does not seem to be XML:)

How can I loop over the files, or in other words, automate the process? Thanks.
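A minimal sketch of one likely cause, assuming the whole `files` vector was handed to `xmlParse()` in a single call: `list.files()` returns a character vector of file names, while `xmlParse()` expects one file per call, so the vector has to be walked element by element. The two tiny `.nxml` files and the `parse_abstract()` helper are made up for illustration and are not part of the original code.

```r
library(XML)

# Hypothetical stand-ins for real PubMed files, written to a temp folder
tmp <- file.path(tempdir(), "nxml_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines("<article><abstract><p>First abstract.</p></abstract></article>",
           file.path(tmp, "a.nxml"))
writeLines("<article><abstract><p>Second abstract.</p></abstract></article>",
           file.path(tmp, "b.nxml"))

# A character vector with one path per file
files <- list.files(tmp, pattern = "\\.nxml$", full.names = TRUE)

# Passing the whole vector at once is expected to fail; record the outcome
# instead of stopping the script
whole_vector <- tryCatch(xmlParse(files), error = function(e) "error")

# parse_abstract() is an illustrative helper: one path in, abstract text out
parse_abstract <- function(path) {
  doc <- xmlParse(path)
  xpathSApply(doc, "//abstract", xmlValue)
}

# Apply the helper to each path separately
abstracts <- vapply(files, parse_abstract, character(1), USE.NAMES = FALSE)
print(abstracts)
```

Using `vapply()` with `character(1)` assumes every file has exactly one abstract node; files with none or several would need `lapply()` or an explicit check instead.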

Updated:

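library(XML)
# full.names = TRUE returns paths that xmlParse() can open as files
files <- list.files(pattern = "\\.nxml$", full.names = TRUE)
# One list slot per file, so results are not overwritten on each iteration
abstracts <- vector("list", length(files))
names(abstracts) <- basename(files)
for (i in seq_along(files)) {
  doc <- xmlParse(files[i])
  abstracts[[i]] <- as.data.frame(t(xpathSApply(doc, "//abstract", function(x) xmlSApply(x, xmlValue))))
}
print(abstracts)

# Write the collected abstracts to a text file
sink("outtext.txt")
print(abstracts)
sink()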
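For the nested layout described in the question (one folder per topic), a sketch along these lines may help, assuming the topic folders all sit under one parent directory. `recursive = TRUE` makes `list.files()` descend into subfolders. The folder names, file contents, and the `topic` and `file` columns are invented for the example.

```r
library(XML)

# Hypothetical layout: <root>/<topic>/<file>.nxml
root <- file.path(tempdir(), "pubmed_demo")
for (topic in c("genetics", "plants")) {
  dir.create(file.path(root, topic), recursive = TRUE, showWarnings = FALSE)
  writeLines(sprintf("<article><abstract><p>About %s.</p></abstract></article>", topic),
             file.path(root, topic, "paper.nxml"))
}

# recursive = TRUE walks every topic folder under root
files <- list.files(root, pattern = "\\.nxml$", recursive = TRUE, full.names = TRUE)

# Build one row per file, then stack the rows into a single data frame
rows <- lapply(files, function(path) {
  doc <- xmlParse(path)
  abstract <- xpathSApply(doc, "//abstract", xmlValue)
  data.frame(topic = basename(dirname(path)),
             file = basename(path),
             # join multiple abstract nodes; NA when the file has none
             abstract = if (length(abstract)) paste(abstract, collapse = " ") else NA_character_,
             stringsAsFactors = FALSE)
})
abstracts.df <- do.call(rbind, rows)
print(abstracts.df)
```

Keeping the folder name as a column preserves the topic grouping after the files are combined, and the result could be saved with `write.csv(abstracts.df, "abstracts.csv", row.names = FALSE)` rather than `sink()`.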

edited Sep 27 '16 at 18:02

asked Sep 26 '16 at 19:35

Sheida Soleimani

I think `xmlParse()` may be thinking the string content is actually XML content. Try using the `full.names=TRUE` parameter to `list.files()` to see if that helps `xmlParse()` disambiguate the content. – hrbrmstr Sep 26 '16 at 19:54
Thanks for the suggestion but it didn't solve the problem. – Sheida Soleimani Sep 26 '16 at 23:07
You should paste your actual code being used for the looping. – hrbrmstr Sep 27 '16 at 10:33
@hrbrmstr I found the solution, Thanks for the hint. – Sheida Soleimani Sep 27 '16 at 13:56
You should prbly add it as an answer so others can learn from it or possibly delete the question if it was a simple mistake that others are unlikely to make. – hrbrmstr Sep 27 '16 at 14:11

0 Answers