I have the following document, which is an XMLNodeSet. I would like to extract the first XML value from each node; if a node has no values, the result should be NA. The result I want is: "11557040" "23667301" "NA"

But after using `sapply(doc, xmlValue)`, the result looks like this:

[1] "115570409626101208908"  "2366730130010285360545" "\n\t"     

Any help is appreciated.

The code:
library(reutils)
doc <- esummary(c("17398008", "7847378", "17397364"), db = "pcsubstance")
doc <- doc["//CompoundIDList"]
doc


> doc
[[1]]
<CompoundIDList>
<int>11557040</int>
<int>962</int>
<int>6101</int>
<int>208908</int>
</CompoundIDList> 

[[2]]
<CompoundIDList>
<int>23667301</int>
<int>3001028</int>
<int>5360545</int>
</CompoundIDList> 

[[3]]
<CompoundIDList>
</CompoundIDList> 

attr(,"class")
[1] "XMLNodeSet"
  • It would be easier to help if you provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with data that can be copied and pasted into R. It's difficult to reconstruct objects from the printed output you supplied. Returning an NA value where there is no match isn't that easy with XPath-style expressions. Typically you'd have to use `getNodeSet()` to find all the nodes where you want to force a match, then check whether they have children and extract from there. – MrFlick Jan 15 '15 at 19:43
  • @MrFlick. Thanks. I have added the reproducible example. The answer from @Jthorpe basically gives me what I want. After getting the result from Jthorpe's code, I can replace "\n\t" with NA, which is fine. This one, `sapply(doc, function(x) xmlValue(xmlChildren(x)[[1]]))`, is OK. – BioChemoinformatics Jan 15 '15 at 21:05

1 Answer

sapply(doc, function(x) if (length(x)) xmlValue(xmlChildren(x)[[1]]) else NA)
Jthorpe
  • Thanks @Jthorpe. It works. Although it does not return NA, I can replace those values with NA. `> sapply(doc, function(x) if(length(x)) xmlValue(xmlChildren(x)[[1]]) else NA) [1] "11557040" "23667301" "\n\t"` – BioChemoinformatics Jan 15 '15 at 20:54
  • `sapply(doc, function(x) xmlValue(xmlChildren(x)[[1]]))` is good enough. Here the if...else cannot produce NA, so I just replace "\n\t" with NA. Thanks. – BioChemoinformatics Jan 15 '15 at 21:08
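The "\n\t" in the output above suggests that the empty `<CompoundIDList>` still holds a whitespace-only text child, so `length(x)` is nonzero and the `else NA` branch never runs. A minimal sketch of one way to get a real NA is to look only at the `<int>` children, which follows the "check if they have children" idea from the comment on the question. It assumes the XML package and uses an inline XML string as a stand-in for the `esummary()` result; `xml_text`, `first_id`, and the `<root>` wrapper are hypothetical names introduced only for illustration.

```r
library(XML)

# Hypothetical stand-in for the esummary() result, so the example runs offline.
xml_text <- "<root>
<CompoundIDList>
	<int>11557040</int>
	<int>962</int>
	<int>6101</int>
	<int>208908</int>
</CompoundIDList>
<CompoundIDList>
	<int>23667301</int>
	<int>3001028</int>
	<int>5360545</int>
</CompoundIDList>
<CompoundIDList>
	</CompoundIDList>
</root>"

parsed <- xmlParse(xml_text, asText = TRUE)
doc <- getNodeSet(parsed, "//CompoundIDList")

# Return the value of the first <int> child, or NA if the node has none.
# Filtering by child name skips any whitespace-only text nodes.
first_id <- function(node) {
  kids <- xmlChildren(node)
  ints <- kids[names(kids) == "int"]
  if (length(ints) > 0) xmlValue(ints[[1]]) else NA_character_
}

ids <- sapply(doc, first_id)
print(ids)
```

Under these assumptions, `ids` should hold "11557040", "23667301", and a genuine `NA`, with no need to replace "\n\t" afterwards. The same `first_id` function should also apply to the `doc` obtained from `doc["//CompoundIDList"]` in the question, provided that object is a set of XML nodes as its printed class indicates.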