I have a custom XML file that validates with xmllint, and jQuery parses it without problems. However, I am trying to write an XML parser in R, and I am stuck getting {xml_nodeset (0)} back from an XPath query.

Here's a shorter version of the XML:

<?xml version="1.0" encoding="UTF-8"?>
<Sequences>
   <Sequence ID="sp|O95256|I18RA_HUMAN" Status="Solenoid" MeanLength="26.0">
      <Description>sp|O95256|I18RA_HUMAN Interleukin-18 receptor accessory protein OS=Homo sapiens GN=IL18RAP PE=1 SV=1 [HUMAN]</Description>
      <Unit Boundaries="[53, 70]" Regularity="-9.0" Sequence="SHFCHRNRLSPKQVPEH" />
      <Unit Boundaries="[141, 163]" Regularity="-4.0" Sequence="KMILEVKPQTNASCEYSASHKQ" />
      <Unit Boundaries="[228, 261]" Regularity="7.0" Sequence="VSSWTVRAVVQVRTIVGDTKLKPDILDPVEDTL" />
      <Unit Boundaries="[308, 334]" Regularity="0.0" Sequence="KSTLKDEIIERNIILEKVTQRDLRRK" />
      <Unit Boundaries="[334, 359]" Regularity="-1.0" Sequence="FVCFVQNSIGNTTQSVQLKEKRGVV" />
      <Unit Boundaries="[359, 393]" Regularity="8.0" Sequence="LLYILLGTIGTLVAVLAASALLYRHWIEIVLLYR" />
      <Unit Boundaries="[393, 416]" Regularity="-3.0" Sequence="TYQSKDQTLGDKKDFDAFVSYAK" />
   </Sequence>
</Sequences>

When I run:

library(xml2)

TLR <- read_xml("theXML.xml")
Sequence <- xml_find_all(TLR, ".//Sequence")
xml_name(xml_children(TLR))

I get

> Sequence
{xml_nodeset (1)}

> Sequence[0]
{xml_nodeset (0)}

I do not know how to access the attributes of Sequence or the attributes of its children:

> xml_children(Sequence[0])
{xml_nodeset (0)}

Is there something wrong with my XML? jQuery parses it well though...

  • 1
    Possible duplicate of [How to parse XML to R data frame](https://stackoverflow.com/questions/17198658/how-to-parse-xml-to-r-data-frame) – lbusett Jun 06 '17 at 17:38
  • 1
    R vectors are indexed from 1, not 0; `Sequence[1]` returns what you expect – GGamba Jun 06 '17 at 18:11
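
Following up on the indexing comment, here is a minimal sketch of how the attributes of Sequence and its Unit children could be read with xml2. It assumes the xml2 package is installed and uses an inline, trimmed copy of the XML from the question (only two Unit nodes) rather than theXML.xml. The names `doc`, `sequences`, `first_seq`, `units` and `unit_table` are illustrative and not from the original post.

```r
library(xml2)

# Trimmed inline copy of the XML from the question (assumption: two Unit nodes
# are enough to illustrate; the real file would be read with read_xml("theXML.xml")).
doc <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<Sequences>
   <Sequence ID="sp|O95256|I18RA_HUMAN" Status="Solenoid" MeanLength="26.0">
      <Unit Boundaries="[53, 70]" Regularity="-9.0" Sequence="SHFCHRNRLSPKQVPEH" />
      <Unit Boundaries="[141, 163]" Regularity="-4.0" Sequence="KMILEVKPQTNASCEYSASHKQ" />
   </Sequence>
</Sequences>')

sequences <- xml_find_all(doc, ".//Sequence")

# R indexes from 1, so index 0 selects nothing and yields an empty nodeset.
empty <- sequences[0]

# [[1]] extracts the first match as a single node.
first_seq <- sequences[[1]]

# All attributes of the node as a named character vector, or one by name.
seq_attrs <- xml_attrs(first_seq)
seq_id <- xml_attr(first_seq, "ID")

# xml_children() returns the element children (the Unit nodes);
# xml_attr() is vectorised over the nodeset.
units <- xml_children(first_seq)
unit_table <- data.frame(
  Boundaries = xml_attr(units, "Boundaries"),
  Regularity = as.numeric(xml_attr(units, "Regularity")),
  Sequence   = xml_attr(units, "Sequence"),
  stringsAsFactors = FALSE
)

print(seq_attrs)
print(unit_table)
```

With this approach the XML itself needs no changes; the empty nodeset in the question comes from the `[0]` index, not from the file.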
