Generating a parser with `inverse`, with constraints on the grammar

Question

I recently worked through A Taste of Curry, and afterwards decided to put the trivial arithmetic parser example to the test by writing a somewhat more substantial parser: a primitive but correct and functional HTML parser.

I ended up with a working node2string function that operates on Node (with attributes and children), which I then inverted to obtain a parse function, as exemplified in the article.
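
A minimal sketch of that inversion idea, on a toy renderer rather than on HTML. The names bit2char, bits2string and string2bits are made up for this illustration and do not come from the article:

```curry
-- Render a single bit. It is defined by pattern matching so that a
-- free variable argument can be narrowed to False or True.
bit2char :: Bool -> Char
bit2char False = '0'
bit2char True  = '1'

-- The "printer": turn a list of bits into a string of '0'/'1'.
bits2string :: [Bool] -> String
bits2string = map bit2char

-- The "parser" is the inverse: search for a list bs that renders to s.
string2bits :: String -> [Bool]
string2bits s | bits2string bs =:= s = bs
  where bs free
```

Here string2bits "101" should yield [True,False,True], while an input such as "10x" has no preimage under bits2string and therefore yields no value at all.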

My first naive implementation had a flaw: only the trivial <input/> HTML snippet parsed into exactly one Node representation. Anything else, such as <input type="submit"/>, nondeterministically yielded several results, including unintended ones:

Node { name = "input", attrs = [Attr "type" "submit"] }
Node { name = "input type=\"submit\"", attrs = [] }

and so on.

After some initial naive attempts to fix that from within node2string, I realized the point, which I believe all seasoned logic programmers see instantaneously: parse = inverse node2string was more right and more insightful about the situation than I was. The 2 parse results of <input type="submit"/> above were indeed exactly the 2 valid and constructible values of Node that lead to that HTML representation.
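
The ambiguity can be reproduced without any search. The following sketch uses a simplified naive model (plain String names, no children, no escaping) written only for illustration; it is not the original first implementation:

```curry
-- Naive model: names are arbitrary strings.
data Attr = Attr String String
data Node = Node String [Attr]

attr2string :: Attr -> String
attr2string (Attr n v) = n ++ "=\"" ++ v ++ "\""

node2string :: Node -> String
node2string (Node n as) =
  "<" ++ n ++ concatMap ((" " ++) . attr2string) as ++ "/>"

-- Two different values ...
intended :: Node
intended = Node "input" [Attr "type" "submit"]

degenerate :: Node
degenerate = Node "input type=\"submit\"" []

-- ... that render to the same string.
sameRendering :: Bool
sameRendering = node2string intended == node2string degenerate
```

Since sameRendering evaluates to True, any faithful inverse of this node2string has to offer both values as results for <input type="submit"/>.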

I realized I had to constrain Node so that it only accepts alphabetic names (not really, but let's keep it simple), and of course the same for Attr. In a less fundamental setting than a logic program (such as regular Haskell, with much more hand-written and "instructional" as opposed to purely declarative programming), I would simply have hidden the Node constructor behind a smart constructor such as mkNode, but I have the feeling this wouldn't work well in Curry due to how the inference engine or constraint solver works (I might be wrong on this, and in fact I hope I am).

So I ended up with the following instead. I think Curry metaprogramming (or Template Haskell, if Curry supported it) could be used to clean up the manual boilerplate, but dealing with it cosmetically is only one way out of the situation.

data Name = Name [NameChar] -- newtype crashes the compiler
data NameChar = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

name2char :: NameChar -> Char
name2char c = case c of A -> 'a'; B -> 'b'; C -> 'c'; D -> 'd'; E -> 'e'; F -> 'f'; G -> 'g'; H -> 'h'; I -> 'i'; J -> 'j'; K -> 'k'; L -> 'l'; M -> 'm'; N -> 'n'; O -> 'o'; P -> 'p'; Q -> 'q'; R -> 'r'; S -> 's'; T -> 't'; U -> 'u'; V -> 'v'; W -> 'w'; X -> 'x'; Y -> 'y'; Z -> 'z'

name2string :: Name -> String
name2string (Name s) = map name2char s

-- for "string literal" support
nameFromString :: String -> Name
nameFromString = inverse name2string

data Node = Node { nodeName :: Name, attrs :: [Attr], children :: [Node] }
data Attr = Attr { attrName :: Name, value :: String }

attr2string :: Attr -> String
attr2string (Attr name value) = name2string name ++ "=\"" ++ escape value ++ "\""
  where escape = concatMap (\c -> if c == '"' then "\\\"" else [c])

node2string :: Node -> String
node2string (Node name attrs children) | null children = "<" ++ name' ++ attrs' ++ "/>"
                                       | otherwise     = "<" ++ name' ++ attrs' ++ ">" ++ children' ++ "</" ++ name' ++ ">"
  where name'     = name2string name
        attrs'    = (concatMap ((" " ++) . attr2string) attrs)
        children' = intercalate "" $ map (node2string) children

inverse :: (a -> b) -> (b -> a)
inverse f y | f x =:= y = x where x free

parse :: String -> Node
parse = inverse node2string

This, in fact, works perfectly (in my judgement):

Parser> parse "<input type=\"submit\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit")] [])

Parser> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit"),(Attr [N,A,M,E] "btn1")] [])

(Curry doesn't have type classes so I wouldn't know yet how to make [NameChar] print more nicely)

However, my question is:

Is there a way to use something like isAlpha (or a function more true to the actual HTML spec, of course) to achieve a result equivalent to this, without having to go through the verbose boilerplate of NameChar and its "supporting members"? There seems to be no way to even place the "functional restriction" anywhere within the ADT.

In a Dependently Typed Functional Logic Programming language, I would just express the constraint at the type level and let the inference engine or constraint solver deal with it, but here I seem to be at a loss.

ichistmeinname · Accepted Answer · 2015-11-10T08:27:45.023

You can achieve the same results using just Char. As you've already pointed out, you can use isAlpha to define name2char as a partial identity. I changed the following lines of your code.

type NameChar = Char

name2char :: NameChar -> Char
name2char c | isAlpha c = c

The two example expressions then evaluate as follows.

test> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node (Name "input") [(Attr (Name "type") "submit"),(Attr (Name "name") "btn1")] [])

test> parse "<input type=\"submit\"/>"
(Node (Name "input") [(Attr (Name "type") "submit")] [])

As a side effect, nameFromString silently fails (it yields no value) for names that contain non-alphabetic characters.

test> nameFromString "input "

Edit: Since you seem to be a fan of function patterns, you can define generators for Nodes and Attrs and use them in your conversion function.

-- Smart constructors. Assumption: in this variant, Node and Attr hold plain String names.
attr :: String -> String -> Attr
attr name val
  | name `elem` ["type", "src", "alt", "name"] = Attr name val

node :: String -> [Attr] -> [Node] -> Node
node name [] nodes
  | name `elem` ["a", "p"] = Node name [] nodes
node name attrPairs@(_:_) nodes
  | name `elem` ["img", "input"] = Node name attrPairs nodes

-- node and attr are used as function patterns on the left-hand sides below.
node2string :: Node -> String
node2string (node name attrs children)
  | null children = "<" ++ name ++ attrs' ++ "/>"
  | otherwise     = "<" ++ name ++ attrs' ++ ">"
                  ++ children' ++ "</" ++ name ++ ">"
 where
  attrs'    = concatMap ((" " ++) . attr2string) attrs
  children' = concatMap node2string children

attr2string :: Attr -> String
attr2string (attr name val) = name ++ "=\"" ++ escape val ++ "\""
 where
  escape = concatMap (\c -> if c == '"' then "\\\"" else [c])

This approach has its disadvantages; it works quite well for a specific set of valid names, but fails miserably when you use a predicate like before (e.g., all isAlpha name).
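
For readers who have not met function patterns before, the mechanism behind `node2string (node name attrs children)` can be seen in isolation in the classic "last element" sketch below. The name lastInt is made up here, and the type is kept monomorphic for simplicity:

```curry
-- A function pattern: the argument is matched against every value that
-- the expression `_ ++ [x]` can evaluate to, i.e. any list that consists
-- of some prefix followed by exactly one final element x.
lastInt :: [Int] -> Int
lastInt (_ ++ [x]) = x
```

lastInt [1,2,3] evaluates to 3, because [1,2,3] matches the pattern with the anonymous prefix bound to [1,2] and x bound to 3. lastInt [] has no value, since no prefix and element append to the empty list.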

Edit2: Besides being considerably "prettier" than your verbose solution, the version with the isAlpha condition is also defined in a declarative way. Without your comments, it is not immediately clear that your NameChar data type encodes alphabetic characters. The isAlpha condition, on the other hand, is a good example of a declarative specification of the desired property. Does this answer your question? I'm not sure what you are aiming at.

this is awesome. I cannot understand how I didn't come to this realization myself, but that's possibly just because I gave up too early, thinking I'd fail :) — Erik Kaplun, Nov 09 '15 at 10:27
so basically you're still not restricting the _set of legal constructions_ of `Node`, but you are restricting the _"mapping pipe"_ between `Node` and `String`, essentially. — Erik Kaplun, Nov 09 '15 at 10:34
getting a bit chatty here, but just one more request for insight: are there any advantages to how it's currently designed in the code? aside from more by-design correctness from a type theoretical point of view? — Erik Kaplun, Nov 09 '15 at 10:56
I added a code example that limits the set of _legal constructions_. — ichistmeinname, Nov 09 '15 at 13:52
OK, after Edit2, I think I see how this works; I think I was taking it too fundamentally and assuming Curry was, too. My understanding essentially boils down to "Make Illegal States Unrepresentable", but I can see now that this isn't so important for getting the logic programming slice of the problem working, and pertains more to correctness and program structure. Thank you very much for your time and efforts! — Erik Kaplun, Nov 10 '15 at 10:27
You're welcome ; ) In order to reach a greater audience of Curry programmers, you can always ask your question on the Curry mailing list (https://www.informatik.uni-kiel.de/~curry/listarchive/) or maybe stick to SO and cross-link the question in a post on the mailing list (because I don't mind the SO-points ; )) Your last question got the attention from my colleagues, because I mailed them your question; so in general, the Curry community is (better) available on the mailing list. — ichistmeinname, Nov 10 '15 at 13:24
OK! I will do exactly as you said: post here and cross link there, to bring them here, make it publicly useful and get you some rep points :) I'm extremely happy this is happening and I hope Curry will start getting more of the exposure it deserves. — Erik Kaplun, Nov 10 '15 at 14:28
