I'm having a bit of trouble figuring out why HXT is replacing my DTDs. First, here is the input file to be parsed:
<!DOCTYPE html>
<html>
<head>
<title>foo</title>
</head>
<body>
<h1>foo</h1>
</body>
</html>
and this is the output that I get:
<?xml version="1.0" encoding="US-ASCII"?>
<html>
<head>
<title>foo</title>
</head>
<body>
<h1>foo</h1>
</body>
</html>
Finally, here is a simplified version of the arrows I'm using:
start (App src dest) = runX $
    readDocument [ withValidate no
                 , withSubstDTDEntities no
                 , withParseHTML yes
                 --, withTagSoup
                 ]
                 src
    >>>
    this
    >>>
    writeDocument [ withIndent yes
                  , withSubstDTDEntities no
                  , withOutputHTML
                  --, withOutputEncoding "UTF-8"
                  ]
                  dest
I apologize for the commented-out lines; I've been toying with different combinations of configs. I just can't seem to get HXT to leave the DTD alone, even with withSubstDTDEntities no, withValidate no, etc. I do get a warning saying that HXT is ignoring my doctype declaration, but that's the only bit of insight I have. Can anyone please lend me a hand? Thank you in advance!
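In case it makes this easier to reproduce, here is a minimal self-contained sketch of the same pipeline. It is an approximation, not my actual program: it assumes readString and writeDocumentToString from Text.XML.HXT.Core can stand in for readDocument and writeDocument, and the input value is just a hypothetical inline copy of the file above. I would expect it to behave the same way as the file-based version, but treat that as an assumption.

```haskell
module Main where

import Text.XML.HXT.Core

-- Hypothetical inline copy of the input file shown above,
-- so the example does not depend on any files on disk.
input :: String
input = unlines
  [ "<!DOCTYPE html>"
  , "<html>"
  , "<head>"
  , "<title>foo</title>"
  , "</head>"
  , "<body>"
  , "<h1>foo</h1>"
  , "</body>"
  , "</html>"
  ]

main :: IO ()
main = do
  -- Same read options as in the question, but parsing from a String.
  out <- runX $
    readString [ withValidate no
               , withSubstDTDEntities no
               , withParseHTML yes
               ]
               input
    >>>
    -- Serialise back to a String instead of writing to a file.
    writeDocumentToString [ withIndent yes
                          , withOutputHTML
                          ]
  mapM_ putStrLn out
```

Running this should print the serialised document to stdout, which makes it easy to check whether the DOCTYPE line survives a given combination of options.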