I recently followed through A Taste of Curry, and afterwards decided to put the trivial arithmetic parser example to test, by writing a somewhat more substantial parser: a primitive but correct and functional HTML parser.
I ended up with a working node2string
function to operate on Node
(with attributes and children), which I then inverse
d to obtain a parse
function, as exemplified in the article.
The first naive implementation had the mistake that it parsed anything but e.g. the trivial <input/>
HTML snippet into exactly one Node
representation; everything else nondeterministically yielded invalid things like
Node { name = "input", attrs = [Attr "type" "submit"] }
Node { name = "input type=\"submit\"", attrs = [] }
and so on.
After some initial naive attempts to fix that from within node2string
, I realized the point, which I believe all seasoned logic programmers see instantaneously, that parse = inverse node2string
was right more right and insightful about the sitatution than I was: the above 2 parse results of <input type="submit"/>
indeed were exactly the 2 valid and constructible values of Node
that would lead to HTML representations.
I realized I had to constrain Node
to only allow passing in alphabetic — well not really but let's keep it simple — names (and of course same for Attr
). In a less fundamental setting than a logic program (such as regular Haskell with much more hand written and "instructional" as opposed to purely declarative programming), I would simply have hidden the Node
constructor behind e.g. a mkNode
sentinel function, but I have the feeling this wouldn't work well in Curry due to how the inference engine or constraint solver work (I might be wrong on this, and in fact I hope I am).
So I ended up instead with the following. I think Curry metaprogramming (or Template Haskell, if Curry supported it) could be used ot clean up the manual boielrplate, but cosmetically dealing is only one way out of the situation.
data Name = Name [NameChar] -- newtype crashes the compiler
data NameChar = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
name2char :: NameChar -> Char
name2char c = case c of A -> 'a'; B -> 'b'; C -> 'c'; D -> 'd'; E -> 'e'; F -> 'f'; G -> 'g'; H -> 'h'; I -> 'i'; J -> 'j'; K -> 'k'; L -> 'l'; M -> 'm'; N -> 'n'; O -> 'o'; P -> 'p'; Q -> 'q'; R -> 'r'; S -> 's'; T -> 't'; U -> 'u'; V -> 'v'; W -> 'w'; X -> 'x'; Y -> 'y'; Z -> 'z'
name2string :: Name -> String
name2string (Name s) = map name2char s
-- for "string literal" support
nameFromString :: String -> Name
nameFromString = inverse name2string
data Node = Node { nodeName :: Name, attrs :: [Attr], children :: [Node] }
data Attr = Attr { attrName :: Name, value :: String }
attr2string :: Attr -> String
attr2string (Attr name value) = name2string name ++ "=\"" ++ escape value ++ "\""
where escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
node2string :: Node -> String
node2string (Node name attrs children) | null children = "<" ++ name' ++ attrs' ++ "/>"
| otherwise = "<" ++ name' ++ attrs' ++ ">" ++ children' ++ "</" ++ name' ++ ">"
where name' = name2string name
attrs' = (concatMap ((" " ++) . attr2string) attrs)
children' = intercalate "" $ map (node2string) children
inverse :: (a -> b) -> (b -> a)
inverse f y | f x =:= y = x where x free
parse :: String -> Node
parse = inverse node2string
This, in fact, works perfectly (in my judgement):
Parser> parse "<input type=\"submit\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit")] [])
Parser> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit"),(Attr [N,A,M,E] "btn1")] [])
(Curry doesn't have type classes so I wouldn't know yet how to make [NameChar]
print more nicely)
However, my question is:
is there a way to use something like isAlpha
(or a function more true to the actual HTML spec, of course) to achieve a result equivalent to this, without having to go through the verbose boilerplate that NameChar
and its "supporting members" are? There seems to be no way to even place the "functional restriction" anywhere within the ADT.
In a Dependently Typed Functional Logic Programming language, I would just express the constraint at the type level and let the inference engine or constraint solver deal with it, but here I seem to be at a loss.