7

I'm trying to come to terms with Haskell's XML Toolbox (HXT) and I'm hitting a wall somewhere, because I don't seem to fully grasp arrows as a computational tool.

Here's my problem, which I hoped to illustrate a little better using a GHCi session:

> let parse p = runLA (xread >>> p) "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> :t parse
parse :: LA XmlTree b -> [b]

So Parse is a small helper function that applies whatever arrow I give it to the trivial XML document

<root>
  <a>foo</a>
  <b>bar</b>
  <c>baz</c>
</root>

I define another helper function, this time to extract the text below a node with a given name:

> let extract s = getChildren >>> isElem >>> hasName s >>> getChildren >>> getText 
> :t extract
extract :: (ArrowXml cat) =>
   String -> cat (Data.Tree.NTree.TypeDefs.NTree XNode) String
> parse (extract "a" &&& extract "b") -- extract two nodes' content.
[("foo","bar")]

With the help of this function, it's easy to use the &&& combinator to pair up the text of two different nodes, and then, say, pass it to a constructor, like this:

> parse (extract "a" &&& extract "b" >>^ arr (\(a,b) -> (b,a))) 
[("bar","foo")]

Now comes the part I don't understand: I want to left-factor! extract calls getChildren on the root-node twice. Instead, I'd like it to only call it once! So I first get the child of the root node

> let extract' s = hasName s >>> getChildren >>> getText
> :t extract'
extract' :: (ArrowXml cat) => String -> cat XmlTree String
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[]

Note, that I've tried to re-order the calls to, say, isElem, etc. in order to find out if that's the issue. But as it stands, I just don't have any idea why this isn't working. There is an arrow 'tutorial' on the Haskell wiki and the way I understood it, it should be possible to do what I want to do that way — namely use &&& in order to pair up the results of two computations.

It does work, too — but only at the start of the arrow-chain, not mid-way trough, when I have some results already, that I want to keep 'shared.' I have the feeling that I'm just not being able to wrap my head around a difference in ideas between normal function composition and arrow notation. I'd be very appreciative of any pointers! (Even if it is just to some generic arrow-tutorial that goes a little more in-depth than the on the Haskell-wiki.)

Thank you!

Aleksandar Dimitrov
  • 9,275
  • 3
  • 41
  • 48
  • I think you mean `xread` instead of `readx` in the definition of `parse`? – Travis Brown Nov 18 '10 at 23:33
  • Yes, thank you. I always make that mistake, dunno why. I'm gonna edit it. – Aleksandar Dimitrov Nov 18 '10 at 23:36
  • I don't have a definitive answer (and am myself somewhat interested to see whatever it turns out to be), but I suspect the `extract' "a" &&& extract' "b"` is constraining both branches - i.e., the set of acceptable parses is the set for which both branches has a result, which is the empty set. – mokus Nov 19 '10 at 02:57
  • @mokus: Yes, that's what's happening here (try `arr show &&& extract' "b"`, for example). What I'd be curious to see is an explanation of *why* that's what `(&&&)` does for list arrows, when the behavior described by @Aleksandar Dimitrov seems more natural. – Travis Brown Nov 19 '10 at 12:56
  • See my update below. This seems to be specifically a problem (maybe a bug?) with `runLA` and `xread` and doesn't happen when using `runX`. – Travis Brown Nov 19 '10 at 14:12
  • 1
    I would actually call this a bug. I'll see if I can contact the maintainer and ask him is thinking. Thanks for clearing it up, Travis! – Aleksandar Dimitrov Nov 19 '10 at 21:19

1 Answers1

2

If you convert the arrow to (and then from) a deterministic version this works as expected:

> let extract' s = unlistA >>> hasName s >>> getChildren >>> getText
> parse (listA (getChildren >>> isElem) >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]

This isn't really satisfactory, though, and I can't remember off the top of my head why (&&&) behaves this way with a nondeterministic arrow (I'd personally use the proc/do notation for anything much more complicated than this).


UPDATE: There seems to be something weird going on here with runLA and xread. If you use runX and readString everything works as expected:

> let xml = "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> let parse p = runX (readString [] xml >>> p)
> let extract' s = getChildren >>> hasName s >>> getChildren >>> getText
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]

This means you have to run the parser in the IO monad, but there are advantages to using runX anyway (better error messages, etc.).

Travis Brown
  • 138,631
  • 12
  • 375
  • 680
  • 1
    WHAT? That is interesting. Thank you for reply! I'm a little disconcerted, but I can definitely live with it. Come to think of it, I do think this might be a bug. Ideally, the different run* functions should really just affect the Arrow something runs in (and thus its capabilities,) and not the way the arrow combinators work within the computation! – Aleksandar Dimitrov Nov 19 '10 at 21:18