I am using PubMed data in .nxml format.
I have several folders categorized by topic (each contains 100-300 .nxml files). I wrote the following code to extract the abstract from a single file and save it as a data frame:
library(XML)
doc <- xmlParse("Genetics_2011_Aug_188(4)_799-808.nxml")
plant.df <- as.data.frame(t(xpathSApply(doc,"//abstract",function(x) xmlSApply(x,xmlValue))))
This works for one file.
My problem arises when I use:
files <- list.files(pattern = "\\.nxml$")
to loop over the files in one folder: the file names are stored as a character vector, and passing them to xmlParse fails with: Error: XML content does not seem to be XML:
How can I loop over the files, or in other words, automate the process? Thanks.
Updated:
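library(XML)
# full.names = TRUE returns each file name with its directory path prepended
files <- list.files(pattern = "\\.nxml$", full.names = TRUE)
#print(typeof(files))
for (i in files) {
  allfiles <- xmlParse(i)
  # note: abstract.df is reassigned on every pass, so after the loop
  # it holds only the abstract of the last file
  abstract.df <- as.data.frame(t(xpathSApply(allfiles, "//abstract", function(x) xmlSApply(x, xmlValue))))
}
print(abstract.df)
# redirect printed output to a text file
sink("outtext.txt")
lapply(abstract.df, print)
sink()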
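The loop above assigns abstract.df anew on each iteration, so only the last file's abstract is left once it finishes. One possible way to keep a result for every file is sketched below. The helper names read_abstract and extract_abstracts are hypothetical (they are not part of the XML package), and the sketch assumes that collapsing all abstract text in a file into one string per file is acceptable.

```r
library(XML)

# Hypothetical helper: text of all <abstract> nodes in one file,
# collapsed into a single string (NA if the file has no abstract).
read_abstract <- function(path) {
  doc <- xmlParse(path)
  txt <- xpathSApply(doc, "//abstract", xmlValue)
  if (length(txt) == 0) NA_character_ else paste(txt, collapse = " ")
}

# Hypothetical helper: build a data frame with one row per file.
extract_abstracts <- function(files) {
  data.frame(
    file = basename(files),
    abstract = vapply(files, read_abstract, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# full.names = TRUE so that each entry is a path xmlParse can open
files <- list.files(pattern = "\\.nxml$", full.names = TRUE)
abstract.df <- extract_abstracts(files)
```

Because the result has one row per file, it could then be written out in one step, for example with write.csv(abstract.df, "abstracts.csv", row.names = FALSE), instead of printing through sink().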