
I'm trying to write a Java program that compares two HTML files. I have gone through a lot of sources on the internet, but every attempt stops at the same point for me: I get the exception below.

Exception in thread "main" java.lang.NullPointerException
at net.sf.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:279)
at org.outerj.daisy.diff.html.HtmlSaxDiffOutput.generateOutput(Unknown Source)
at org.outerj.daisy.diff.html.HTMLDiffer.diff(Unknown Source)
at com.interac.api.emt.noti.DaizyDiff.main(DaizyDiff.java:63)

My Full Code:

public class DaizyDiff {

    static String html1 = "<html class='foobar'>Hello</html>";
    static String html2 = "<html>Bye</html>";

    public static void main(String args[]) throws TransformerConfigurationException, IOException, SAXException {

        final StringWriter finalResult = new StringWriter();
        final SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();

        final TransformerHandler result = tf.newTransformerHandler();
        result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        result.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        result.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        result.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        result.setResult(new StreamResult(finalResult));

        final ContentHandler postProcess = result;

        final Locale locale = Locale.getDefault();
        final String prefix = "diff";

        final NekoHtmlParser cleaner = new NekoHtmlParser();

        final InputSource oldSource = new InputSource(new StringReader(html1));
        final InputSource newSource = new InputSource(new StringReader(html2));

        final DomTreeBuilder oldHandler = new DomTreeBuilder();
        cleaner.parse(oldSource, oldHandler);
        final TextNodeComparator leftComparator = new TextNodeComparator(oldHandler, locale);

        final DomTreeBuilder newHandler = new DomTreeBuilder();
        cleaner.parse(newSource, newHandler);
        final TextNodeComparator rightComparator = new TextNodeComparator(newHandler, locale);

        final HtmlSaxDiffOutput output = new HtmlSaxDiffOutput(postProcess, prefix);

        final HTMLDiffer differ = new HTMLDiffer(output);
        differ.diff(leftComparator, rightComparator);

        System.out.println(finalResult.toString());
    }
}
ArunBharath
  • Can you share your exact code? Exception you provided is useless without knowing what is the code behind it. – Shaq Apr 08 '19 at 20:39
  • I edited my question with my exact code. – ArunBharath Apr 08 '19 at 20:58
  • Have you looked at this example? https://www.javatips.net/api/daisydiff-master/src/main/java/org/outerj/daisy/diff/DaisyDiff.java – Shaq Apr 08 '19 at 21:21
  • This seems to be exactly the same question you asked before. I know it doesn't really have an answer, but there's no point in asking exactly the same question twice. – Michael Kay Apr 08 '19 at 22:19
  • There is an accepted answer to the question at https://stackoverflow.com/questions/54598683/getting-null-pointer-exception-net-sf-saxon-event-receivingcontenthandler-starte -- though it isn't really a solution to the problem. That answer asks you to produce a stack trace to assist with diagnosis. In future please follow up the original question rather than asking the same question again. – Michael Kay Apr 08 '19 at 22:31

1 Answer

Which Saxon release are you using? In the current release (9.9) the method ReceivingContentHandler.startElement() is nowhere near line 279, which suggests you are using a rather old release.

The chances are, however, that DaisyDiff is not calling Saxon's ContentHandler in the way that it expects to be called. Unfortunately, the sequence of calls made to a ContentHandler by an XML parser depends on the way the XML parser is configured, and a typical ContentHandler implementation (like Saxon's) requires the XML parser (or other sender of ContentHandler events) to be configured in a particular way.

One reason for this is that the ContentHandler for typical Saxon use cases is a very performance-critical interface, and it would be a significant overhead for the startElement() method to do full validation of the supplied arguments on each call; it has to trust the caller.

Unless you're prepared to dive into the DaisyDiff and Saxon source code to work out why there's a mismatch (and perhaps write a filter to sit between them and resolve it), you're probably best off serializing the DaisyDiff output to lexical XML, and then reparsing that XML to send it to Saxon.
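As an illustration of what such a filter could look like, here is a minimal sketch. It assumes, without this having been confirmed, that the mismatch is a sender that never calls startDocument() or endDocument(). The class name DocumentBracketingFilter and its demo() method are hypothetical; only the standard org.xml.sax classes are real. Whether this actually cures the NullPointerException has not been tested against DaisyDiff or Saxon.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical adapter: makes sure the target sees startDocument()/endDocument(). */
public class DocumentBracketingFilter extends XMLFilterImpl {
    private boolean started;
    private boolean ended;

    public DocumentBracketingFilter(ContentHandler target) {
        setContentHandler(target); // XMLFilterImpl forwards every event to this handler
    }

    @Override
    public void startDocument() throws SAXException {
        if (!started) {            // forward at most once
            started = true;
            super.startDocument();
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        startDocument();           // no-op if the sender already opened the document
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endDocument() throws SAXException {
        if (started && !ended) {   // forward at most once, and only after a start
            ended = true;
            super.endDocument();
        }
    }

    /** Self-check: sends element events only and records what the target receives. */
    public static String demo() {
        StringBuilder log = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override public void startDocument() { log.append("startDocument "); }
            @Override public void startElement(String u, String l, String q, Attributes a) {
                log.append("start:").append(q).append(' ');
            }
            @Override public void endElement(String u, String l, String q) {
                log.append("end:").append(q).append(' ');
            }
            @Override public void endDocument() { log.append("endDocument"); }
        };
        DocumentBracketingFilter filter = new DocumentBracketingFilter(recorder);
        try {
            filter.startElement("", "p", "p", new AttributesImpl());
            filter.endElement("", "p", "p");
            filter.endDocument();  // the caller closes the document explicitly
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the question's code this would mean passing new DocumentBracketingFilter(result) to the HtmlSaxDiffOutput constructor instead of result, and calling endDocument() on the filter after differ.diff(...) returns.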

Looking at it further, you're actually using the TransformerHandler simply as an XML serializer. DaisyDiff (looking at the source on GitHub) makes all sorts of assumptions about the TransformerHandler/ContentHandler that it writes to (for example, it doesn't seem to make any calls on startDocument() or endDocument()). My guess is that it has probably only been tested with the TransformerHandler implementation that comes with the JDK, and it might well work fine with that one.

I don't think you're doing anything here that actually needs Saxon; I think you're only picking it up because it happens to be on the classpath. Your best way forward might be to ensure that your call on TransformerFactory.newInstance() picks up the JDK transformer factory rather than Saxon. To do that, use the two-argument version of newInstance(), which expects a factory class name as its first argument (and a ClassLoader, which may be null, as its second), supplying "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl".
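A minimal sketch of that suggestion follows. The class JdkFactoryExample and its jdkFactory() helper are made up for illustration. The factory class name used here is the one the JDK's bundled XSLT implementation is assumed to have; it is an internal name, so verify it against your own JDK.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;

public class JdkFactoryExample {
    // Assumption: internal class name of the JDK's built-in transformer factory.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    /** Requests the JDK factory by name, bypassing the classpath lookup that finds Saxon. */
    static SAXTransformerFactory jdkFactory() {
        // A null class loader means the current thread's context class loader is used.
        return (SAXTransformerFactory) TransformerFactory.newInstance(JDK_FACTORY, null);
    }

    public static void main(String[] args) {
        // Prints the implementation class that was actually selected.
        System.out.println(jdkFactory().getClass().getName());
    }
}
```

In the question's code, the line that assigns tf would then become final SAXTransformerFactory tf = JdkFactoryExample.jdkFactory(); with everything else unchanged. On Java 9 and later, TransformerFactory.newDefaultInstance() is an alternative that returns the built-in implementation without naming an internal class.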

Michael Kay
  • Hi Michael Kay, thanks for your time. I'm new to this and don't know how to proceed. Could you show me where and what to change in my code? It would be really helpful. Thanks. – ArunBharath Apr 09 '19 at 13:43
  • No, sorry, I can try and answer specific questions but I can't give you a private training course in Java programming. – Michael Kay Apr 09 '19 at 14:06
  • I had the same issue. Thanks to Michael's answer, I was able to solve it by "simply" instantiating ```org.apache.xalan.processor.TransformerFactoryImpl``` directly, e.g. ```final SAXTransformerFactory tf = new TransformerFactoryImpl();```. There is no need (at least in the question's example) to go through ```TransformerFactory```. – wprogLK Oct 12 '22 at 12:27