
I have an XML file with USER_DEFINED parameters that I'm trying to parse out. Here is an excerpt of the XML document:

         <userDefinedParameters>
           <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
           <USER_DEFINED parameter="P2">RIGHT</USER_DEFINED>
           <USER_DEFINED parameter="P3">1234</USER_DEFINED>
           <USER_DEFINED parameter="P4">5678</USER_DEFINED>
         </userDefinedParameters>
       </data>
     </segment>
   </body>
</head>

I am able to parse out all data from this file using the XML package and xpathApply. However, I can't pull out the USER_DEFINED parameter values this way.

Since there are several records in the XML, I'd like to get all P1s, P2s, etc., just as I get the other fields using xpathApply. The document states that all USER_DEFINED parameters are given as a 'parameter' and a 'value', so I think I need to pull them as c('parameter', 'value'), but I don't know how to do this with the XML package.

I have looked at this SO page; it helped a lot, but it doesn't answer this question.

Thanks for any/all help.

UPDATE: added the desired output and how I'm trying to get the data. Note that the code below doesn't work as desired.

The current xpathApply usage gets all USER_DEFINED rows within the userDefinedParameters section. If I change it to xpathApply(data, "//USER_DEFINED", xmlValue), I get all the values but no relation to the parameter name. I need something like xpathApply(data, "//USER_DEFINED/P1", xmlValue), but obviously this doesn't work.

library(XML)
fileName <- "./file.xml"
data     <- xmlParse(fileName)
xml_data <- xmlToList(data)
p1 <- xpathApply(data, "//USER_DEFINED")
p2 <- xpathApply(data, "//USER_DEFINED")

# View(p1)
#     "P1"
#     LEFT
#     LEFT
#    RIGHT

# View(p2)
#     "P2"
#    RIGHT
#    RIGHT
#     LEFT
# ...
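A hedged sketch, not part of the original question: in XPath, `//USER_DEFINED/P1` looks for a child *element* named P1, which doesn't exist here. Selecting elements by an attribute's value uses a predicate such as `[@parameter='P1']`. The two-record document below, including its `root`/`data` wrapper, is made up for illustration because the real file's outer structure isn't shown.

```r
library(XML)

# Hypothetical two-record document; the real file's outer structure is assumed
xml_string <- '<root>
  <data><userDefinedParameters>
    <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
    <USER_DEFINED parameter="P2">RIGHT</USER_DEFINED>
  </userDefinedParameters></data>
  <data><userDefinedParameters>
    <USER_DEFINED parameter="P1">RIGHT</USER_DEFINED>
    <USER_DEFINED parameter="P2">LEFT</USER_DEFINED>
  </userDefinedParameters></data>
</root>'

doc <- xmlParse(xml_string, asText = TRUE)

# [@parameter='P1'] keeps only USER_DEFINED elements whose parameter attribute is "P1"
p1 <- xpathSApply(doc, "//USER_DEFINED[@parameter='P1']", xmlValue)
p2 <- xpathSApply(doc, "//USER_DEFINED[@parameter='P2']", xmlValue)

p1  # contains "LEFT" and "RIGHT", in document order
p2  # contains "RIGHT" and "LEFT", in document order
```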
marc_s
dave

2 Answers


Using the xml2 package, you can get the value of the parameter attribute from each node with xml_attr().

Something like this:

library(xml2)
library(magrittr)  # provides the %>% pipe used below

x <- read_xml('<userDefinedParameters>
       <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
       <USER_DEFINED parameter="P2">right</USER_DEFINED>
       <USER_DEFINED parameter="P3">1234</USER_DEFINED>
       <USER_DEFINED parameter="P4">5678</USER_DEFINED>
     </userDefinedParameters>')

dataset <- data.frame(user_defined = x %>% 
                                       xml_find_all("//USER_DEFINED") %>%
                                       xml_text(),
                      parameter = x %>% 
                                    xml_find_all("//USER_DEFINED") %>%
                                    xml_attr("parameter"))

Result in dataset:

  user_defined parameter
1         LEFT        P1
2        right        P2
3         1234        P3
4         5678        P4
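The question mentions several records and asks for all the P1s, P2s, etc. A minimal sketch extending the xml2 approach to one row per record, assuming each record has its own `userDefinedParameters` element (the `root`/`data` wrapper below is hypothetical):

```r
library(xml2)

# Hypothetical document with two records; the real file's outer structure is assumed
x <- read_xml('<root>
  <data><userDefinedParameters>
    <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
    <USER_DEFINED parameter="P2">RIGHT</USER_DEFINED>
  </userDefinedParameters></data>
  <data><userDefinedParameters>
    <USER_DEFINED parameter="P1">RIGHT</USER_DEFINED>
    <USER_DEFINED parameter="P2">LEFT</USER_DEFINED>
  </userDefinedParameters></data>
</root>')

# One node per record
records <- xml_find_all(x, "//userDefinedParameters")

# For each record: a named vector, names = parameter attribute, values = text
rows <- lapply(records, function(rec) {
  nodes <- xml_find_all(rec, "./USER_DEFINED")  # relative to this record only
  setNames(xml_text(nodes), xml_attr(nodes, "parameter"))
})

# Every parameter name seen in any record becomes a column;
# a parameter missing from a record ends up as NA
params <- unique(unlist(lapply(rows, names)))
mat <- matrix(unlist(lapply(rows, function(r) unname(r[params]))),
              ncol = length(params), byrow = TRUE,
              dimnames = list(NULL, params))
wide <- as.data.frame(mat, stringsAsFactors = FALSE)
```

Indexing each row by `params` keeps the columns aligned even if the parameters appear in a different order in different records.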
neilfws
  • I'm trying to stay with the XML package if I can (closed system), though necessity may make me move to xml2. – dave Nov 30 '21 at 00:24

If you'd like to stick with the XML package, you can use the xmlAttrs() function inside sapply():

text <-' <head> <body> <segment>
 <data>
 <userDefinedParameters>
           <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
           <USER_DEFINED parameter="P2">right</USER_DEFINED>
           <USER_DEFINED parameter="P3">1234</USER_DEFINED>
           <USER_DEFINED parameter="P4">5678</USER_DEFINED>
         </userDefinedParameters>
       </data>
     </segment>
   </body>
</head>'

library(XML)
#read the document
doc <- xmlRoot(xmlParse(text))

#parse out the USER Defined nodes
# in this example there are 4 nodes
nodes<-xpathApply(doc, ".//userDefinedParameters/USER_DEFINED")

#step through each of the found nodes
# xmlAttrs is not a vectorized function thus requiring a loop
attributes <- sapply(nodes, function(n) {
   #extract the attribute from each node
   # if there was more than 1 attribute this will need updating
   xmlAttrs(unlist(n)) })

#get values from each node
values<-xmlValue(nodes)

data.frame(attributes, values)
#   attributes values
# 1         P1   LEFT
# 2         P2  right
# 3         P3   1234
# 4         P4   5678
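A possible simplification, not from the original answer: `xpathSApply()` applies a function to every matched node and passes extra arguments through to it, and `xmlGetAttr()` returns a single named attribute, so the anonymous function and the "more than 1 attribute" caveat can be avoided. A sketch using the same four parameters:

```r
library(XML)

text <- '<userDefinedParameters>
  <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
  <USER_DEFINED parameter="P2">right</USER_DEFINED>
  <USER_DEFINED parameter="P3">1234</USER_DEFINED>
  <USER_DEFINED parameter="P4">5678</USER_DEFINED>
</userDefinedParameters>'

doc <- xmlParse(text, asText = TRUE)
path <- "//userDefinedParameters/USER_DEFINED"

# xmlGetAttr returns only the named attribute, even if a node carries several
parameter <- xpathSApply(doc, path, xmlGetAttr, name = "parameter")
# xmlValue returns the text content of each node
value <- xpathSApply(doc, path, xmlValue)

# unname() guards against any names on the vectors turning into row names
result <- data.frame(parameter = unname(parameter), value = unname(value))
```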
Dave2e
  • This gets me the parameter names, only. I updated my post to show how I'm trying to get the data; I hope that helps. – dave Nov 30 '21 at 00:26
  • This works quite well, thank you @Dave2e. I don't understand the dot in the `".//userDefinedParameters/USER_DEFINED"` or what is happening in the in-line function. Can you provide an explanation of these? – dave Dec 01 '21 at 00:12
  • The leading dot tells XPath to start searching from the current node rather than from the document root. It's probably not needed here, but it's safer. See the comments in the code for an explanation. – Dave2e Dec 01 '21 at 00:28
  • Thank you for the comments - helps a lot. – dave Dec 01 '21 at 01:56
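To illustrate the leading-dot point: when an XPath query starts from a node rather than from the document, `//` still searches from the document root, while `.//` only searches below that node. The two-record document below is hypothetical:

```r
library(XML)

# Hypothetical document with two records, used only to show the difference
doc <- xmlParse('<root>
  <data><userDefinedParameters>
    <USER_DEFINED parameter="P1">LEFT</USER_DEFINED>
  </userDefinedParameters></data>
  <data><userDefinedParameters>
    <USER_DEFINED parameter="P1">RIGHT</USER_DEFINED>
  </userDefinedParameters></data>
</root>', asText = TRUE)

# Use the first record as the starting (context) node
first_record <- getNodeSet(doc, "//data")[[1]]

# "//" restarts at the document root, so it still finds nodes in both records
global <- xpathSApply(first_record, "//USER_DEFINED", xmlValue)
# ".//" searches only below the context node
local <- xpathSApply(first_record, ".//USER_DEFINED", xmlValue)
```

In the answer, `xmlRoot()` returns the top-level node, which contains everything, so both forms match the same four nodes there.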