I am using DocumentBuilder to convert XHTML (XML) fetched from the internet into an org.w3c.dom.Document. The markup contains "--" inside a comment. Is there any way to bypass this? I have already called setIgnoringComments(true) and setValidating(false).

I know that "--" is not permitted within comments in XML according to the W3C specification (see related posts).

Any suggestions for preprocessing the XML before conversion?

public static Document convertXmlStrToDocument(String xml) throws ParserConfigurationException, SAXException, IOException{
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    documentBuilderFactory.setIgnoringComments(true);
    documentBuilderFactory.setValidating(false);
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
    return document;
}

It throws the following exception:

org.xml.sax.SAXParseException; lineNumber: 914; columnNumber: 17; The string "--" is not permitted within comments.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at com.techoffice.util.XmlUtil.convertXmlStrToDocument(XmlUtil.java:41)
    at com.techoffice.util.XmlUtil.evaluateXpath(XmlUtil.java:46)
    at com.techoffice.jc.horse.service.web.ResultWebService.raceDateSelect(ResultWebService.java:41)
    at com.techoffice.jc.horse.service.web.ResultWebServiceTest.retrieveXml(ResultWebServiceTest.java:35)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:252)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:94)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:191)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Ben Cheng
  • Thanks. I got the definite answer: no. I would still like to know of any way to get past it, including preprocessing the XML content. – Ben Cheng Dec 13 '16 at 00:36
  • I found that HTML Tidy is a console application and a C library, but my application is written in Java. – Ben Cheng Dec 13 '16 at 00:55
  • Then look at the Java version of HTML Tidy (answer updated), but note that this question seems to be morphing into a tool/library request, which is offtopic here. – kjhughes Dec 13 '16 at 01:22

2 Answers

No, the string "--" must not appear within an XML comment:

For compatibility, the string "--" (double-hyphen) must not occur within comments.

This is not configurable. Anything's hackable, but you'll be going against the grain and without XML parser support. Not recommended.

Try HTML Tidy to clean up the HTML first. There is also a Java version of HTML Tidy (JTidy).
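
If a full tidy pass is more than you need, here is a rough sketch of a preprocessing step. Since the comments are being ignored anyway, it strips them from the string before handing it to the parser. The names CommentStripper, stripComments and parseWithoutComments are made up for illustration, and the regex assumes that "<!--" never appears inside a CDATA section or script text, which may not hold for arbitrary pages.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class CommentStripper {
    // Non-greedy match from "<!--" to the first "-->"; DOTALL lets it span lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Removes every comment, including ones that contain "--".
    static String stripComments(String xml) {
        return COMMENT.matcher(xml).replaceAll("");
    }

    // Parses the comment-free text with a default DocumentBuilder.
    static Document parseWithoutComments(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(stripComments(xml))));
    }
}
```

Parsing from a StringReader rather than xml.getBytes() also avoids depending on the platform default charset. This only addresses the comment error; other malformed markup would still need something like HTML Tidy.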

kjhughes

If this is the situation:

function escape(input) {
    input = input.replace(/->/g, '_');

    return '<!-- ' + input + ' -->';
}    

and you want to break out of the HTML comment through the input, then use

--!>

After this, you can write whatever you want, since HTML parsers also treat --!> as the end of a comment.

Michcio