I am parsing an XML file with HXT and I am trying to break up some of the node extraction into modular pieces (I have been using this as my guide). Unfortunately, I cannot figure out how to apply some of the selectors once I do the first-level parsing.
import Text.XML.HXT.Core
let node tag = multi (hasName tag)
xml <- readFile "test.xml"
let doc = readString [withValidate yes, withParseHTML no, withWarnings no] xml
books <- runX $ doc >>> node "book"
I see that books has the type [XmlTree]:
:t books
books :: [XmlTree]
Now I would like to get the first element of books and then extract some values inside the sub-tree.
let b = head books
runX $ b >>> node "cost"
Couldn't match type ‘Data.Tree.NTree.TypeDefs.NTree’
with ‘IOSLA (XIOState ()) XmlTree’
Expected type: IOSLA (XIOState ()) XmlTree XNode
Actual type: XmlTree
In the first argument of ‘(>>>)’, namely ‘b’
In the second argument of ‘($)’, namely ‘b >>> node "cost"’
I cannot find selectors that work once I have an XmlTree; the incorrect usage above is only meant to illustrate what I would like to do. I know I can do this:
runX $ doc >>> node "book" >>> node "cost" /> getText
["55.9","95.0"]
But I am not only interested in cost but also in many more elements inside book. The XML file is pretty deep, so I don't want to nest everything with <+>; I would much rather extract the chunk I want and then extract the sub-elements in a separate function.
Example (made-up) XML File:
<?xml version="1.0" encoding="UTF-8"?><start xmlns="http://www.example.com/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<books>
<book>
<author>
<name>
<first>Joe</first>
<last>Smith</last>
</name>
<city>New York City</city>
</author>
<released>1990-11-15</released>
<isbn>1234567890</isbn>
<publisher>X Publisher</publisher>
<cost>55.9</cost>
</book>
<book>
<author>
<name>
<first>Jane</first>
<last>Jones</last>
</name>
<city>San Francisco</city>
</author>
<released>1999-01-19</released>
<isbn>0987654321</isbn>
<publisher>Y Publisher</publisher>
<cost>95.0</cost>
</book>
</books>
</start>
Can someone help me understand how to extract the sub-elements of book? Ideally with something as nice as >>> and node, so I can define my own functions such as getCost, getName, etc., each of which will roughly have the signature XmlTree -> [String].
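
To make the desired shape concrete, here is a minimal sketch of the kind of helpers I have in mind. It assumes that HXT's pure list arrow LA, run with runLA, can be applied directly to a single XmlTree; I am not sure this is the idiomatic approach. getCost and getName are hypothetical names, and the inline sample parsed with xread is a cut-down stand-in for test.xml.

```haskell
import Text.XML.HXT.Core

-- Select every descendant element with the given tag name.
-- Kept polymorphic so it can be used with runX (IO) and runLA (pure).
node :: ArrowXml a => String -> a XmlTree XmlTree
node tag = multi (hasName tag)

-- Hypothetical helper: run a pure list arrow on one extracted subtree.
getCost :: XmlTree -> [String]
getCost = runLA (node "cost" /> getText)

-- Hypothetical helper: first and last name inside the <name> element.
getName :: XmlTree -> [String]
getName = runLA (node "name" >>> (node "first" <+> node "last") /> getText)

-- Cut-down sample document (an assumption, standing in for test.xml).
sample :: String
sample =
  "<books><book><author><name><first>Joe</first><last>Smith</last></name>"
    ++ "</author><cost>55.9</cost></book></books>"

main :: IO ()
main = do
  -- Parse the fragment and pull out each <book> subtree.
  let books = runLA (xread >>> node "book") sample
  mapM_ (print . getCost) books
  mapM_ (print . getName) books
```

If the arrow has to stay in IO instead, I assume something like runX (constA b >>> node "cost" /> getText) would lift the tree back into an arrow, but I would like confirmation of which form is preferred.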