HXT: Left-Factoring Nondeterministic Arrows?

Question

I'm trying to come to terms with Haskell's XML Toolbox (HXT) and I'm hitting a wall somewhere, because I don't seem to fully grasp arrows as a computational tool.

Here's my problem, which I hoped to illustrate a little better using a GHCi session:

> let parse p = runLA (xread >>> p) "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> :t parse
parse :: LA XmlTree b -> [b]

So Parse is a small helper function that applies whatever arrow I give it to the trivial XML document

<root>
  <a>foo</a>
  <b>bar</b>
  <c>baz</c>
</root>

I define another helper function, this time to extract the text below a node with a given name:

> let extract s = getChildren >>> isElem >>> hasName s >>> getChildren >>> getText 
> :t extract
extract :: (ArrowXml cat) =>
   String -> cat (Data.Tree.NTree.TypeDefs.NTree XNode) String
> parse (extract "a" &&& extract "b") -- extract two nodes' content.
[("foo","bar")]

With the help of this function, it's easy to use the &&& combinator to pair up the text of two different nodes, and then, say, pass it to a constructor, like this:

> parse (extract "a" &&& extract "b" >>^ arr (\(a,b) -> (b,a))) 
[("bar","foo")]

Now comes the part I don't understand: I want to left-factor! extract calls getChildren on the root-node twice. Instead, I'd like it to only call it once! So I first get the child of the root node

> let extract' s = hasName s >>> getChildren >>> getText
> :t extract'
extract' :: (ArrowXml cat) => String -> cat XmlTree String
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[]

Note, that I've tried to re-order the calls to, say, isElem, etc. in order to find out if that's the issue. But as it stands, I just don't have any idea why this isn't working. There is an arrow 'tutorial' on the Haskell wiki and the way I understood it, it should be possible to do what I want to do that way — namely use &&& in order to pair up the results of two computations.

It does work, too — but only at the start of the arrow-chain, not mid-way trough, when I have some results already, that I want to keep 'shared.' I have the feeling that I'm just not being able to wrap my head around a difference in ideas between normal function composition and arrow notation. I'd be very appreciative of any pointers! (Even if it is just to some generic arrow-tutorial that goes a little more in-depth than the on the Haskell-wiki.)

Thank you!

I think you mean `xread` instead of `readx` in the definition of `parse`? — Travis Brown, Nov 18 '10 at 23:33
Yes, thank you. I always make that mistake, dunno why. I'm gonna edit it. — Aleksandar Dimitrov, Nov 18 '10 at 23:36
I don't have a definitive answer (and am myself somewhat interested to see whatever it turns out to be), but I suspect the `extract' "a" &&& extract' "b"` is constraining both branches - i.e., the set of acceptable parses is the set for which both branches has a result, which is the empty set. — mokus, Nov 19 '10 at 02:57
@mokus: Yes, that's what's happening here (try `arr show &&& extract' "b"`, for example). What I'd be curious to see is an explanation of *why* that's what `(&&&)` does for list arrows, when the behavior described by @Aleksandar Dimitrov seems more natural. — Travis Brown, Nov 19 '10 at 12:56
See my update below. This seems to be specifically a problem (maybe a bug?) with `runLA` and `xread` and doesn't happen when using `runX`. — Travis Brown, Nov 19 '10 at 14:12
I would actually call this a bug. I'll see if I can contact the maintainer and ask him is thinking. Thanks for clearing it up, Travis! — Aleksandar Dimitrov, Nov 19 '10 at 21:19

Travis Brown · Accepted Answer · 2010-11-19T14:10:31.390

If you convert the arrow to (and then from) a deterministic version this works as expected:

> let extract' s = unlistA >>> hasName s >>> getChildren >>> getText
> parse (listA (getChildren >>> isElem) >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]

This isn't really satisfactory, though, and I can't remember off the top of my head why (&&&) behaves this way with a nondeterministic arrow (I'd personally use the proc/do notation for anything much more complicated than this).

UPDATE: There seems to be something weird going on here with runLA and xread. If you use runX and readString everything works as expected:

> let xml = "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> let parse p = runX (readString [] xml >>> p)
> let extract' s = getChildren >>> hasName s >>> getChildren >>> getText
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]

This means you have to run the parser in the IO monad, but there are advantages to using runX anyway (better error messages, etc.).

WHAT? That is interesting. Thank you for reply! I'm a little disconcerted, but I can definitely live with it. Come to think of it, I do think this might be a bug. Ideally, the different run* functions should really just affect the Arrow something runs in (and thus its capabilities,) and not the way the arrow combinators work within the computation! — Aleksandar Dimitrov, Nov 19 '10 at 21:18

HXT: Left-Factoring Nondeterministic Arrows?

1 Answers1