Java CSS + HTML to PDF conversion exception: Invalid nested tag head found, expected closing tag link

Question

package sandbox.xmlworker;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFile;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class D04_ParseHtmlCss {

    public static final String SRC = "/home/xxx/workspace/DemoTransformer/src/data/result.html";
    public static final String CSS = "/home/xxx/workspace/DemoTransformer/src/data/beyanname.css";
    public static final String DEST = "/home/xxx/workspace/DemoTransformer/src/data/resultpdffileson.pdf";

    public void createPdf(File file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();

        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        writer.setInitialLeading(12.5f);

        // step 3
        document.open();

        // step 4

        // CSS
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream(CSS));
        cssResolver.addCss(cssFile);

        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        p.parse(new FileInputStream(SRC));

        // step 5
        document.close();
    }

    /**
     * Main method
     */
    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new D04_ParseHtmlCss().createPdf(new File(DEST));
    }
}

This code is from the iText sandbox: http://developers.itextpdf.com/examples/xml-worker-itext5/xml-worker-examples

I am using itextpdf 5.4.5 and xmlworker 5.4.5.

It throws the following error, and I can't figure out how to fix it:

Exception in thread "main" com.itextpdf.tool.xml.exceptions.RuntimeWorkerException: Invalid nested tag head found, expected closing tag link.
    at com.itextpdf.tool.xml.XMLWorker.endElement(XMLWorker.java:134)
    at com.itextpdf.tool.xml.parser.XMLParser.endElement(XMLParser.java:395)
    at com.itextpdf.tool.xml.parser.state.ClosingTagState.process(ClosingTagState.java:70)
    at com.itextpdf.tool.xml.parser.XMLParser.parseWithReader(XMLParser.java:235)
    at com.itextpdf.tool.xml.parser.XMLParser.parse(XMLParser.java:213)
    at com.itextpdf.tool.xml.parser.XMLParser.parse(XMLParser.java:174)
    at sandbox.xmlworker.D04_ParseHtmlCss.createPdf(D04_ParseHtmlCss.java:59)
    at sandbox.xmlworker.D04_ParseHtmlCss.main(D04_ParseHtmlCss.java:71)

The head of the HTML file is this:

<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>KDV1</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="beyanname.css">
</head>

I generated this HTML from XML and XSLT files using iText.

The problem seems to be in the HTML, can you post it? Or at least the full `<head>` – litelite, Jul 18 '17 at 12:31
The full head is this: 'code' KDV1 'code' I generated this code from XML and XSLT files with another piece of Java code – Hazal Buruk, Jul 18 '17 at 12:43
Edit your question to add the code instead of putting it in a comment. As it is, it is unreadable – litelite, Jul 18 '17 at 12:55

score 0 · Answer 1 · answered Jul 18 '17 at 13:02

Your tool uses an XML parser to parse HTML. While the two look quite alike, they are not the same. The error is caused by an unclosed <link> tag, which is valid in HTML but not in XML, so the parser throws an exception. The easiest solution is either to replace XMLParser with an HTML parser, or to make sure that your HTML file is XHTML, which is XML-compliant.
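One way to try the XHTML route is to self-close void elements before the markup reaches XMLParser. The following is only a minimal, regex-based sketch and is not part of iText: the class name VoidTagCloser, the method selfCloseVoidTags and the list of tags are assumptions made for illustration. It does not handle a `>` inside an attribute value or any other well-formedness problem, so a proper HTML-to-XHTML cleaner would be more robust.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of iText: self-closes common HTML void elements. */
final class VoidTagCloser {

    // Illustrative list of void elements that HTML allows without a closing slash.
    private static final Pattern VOID_TAG = Pattern.compile(
            "<(meta|link|br|hr|img|input)\\b([^>]*?)\\s*/?>",
            Pattern.CASE_INSENSITIVE);

    static String selfCloseVoidTags(String html) {
        // Re-emit each matched tag with its attributes and an explicit " />".
        return VOID_TAG.matcher(html).replaceAll("<$1$2 />");
    }

    public static void main(String[] args) {
        String head = "<head>\n"
                + "<title>KDV1</title>\n"
                + "<link rel=\"stylesheet\" type=\"text/css\" href=\"beyanname.css\">\n"
                + "</head>";
        System.out.println(selfCloseVoidTags(head));
    }
}
```

The returned string could then be wrapped in a ByteArrayInputStream and passed to p.parse(...) in place of the FileInputStream for SRC.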


litelite


Is there an easier way to combine CSS, XML and XSLT files into a PDF file? – Hazal Buruk Jul 18 '17 at 13:07
In your example you do not have any `xml` or `xslt`. But if you want to do it with pure XML, you would need to read it and then generate the PDF yourself (using a PDF library, of course), since XML is meant only for data. Or you could use a reporting framework that supports exporting to PDF – litelite Jul 18 '17 at 13:12
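Since the asker says the HTML is generated from XML and XSLT, another option might be to have the transformation emit well-formed markup in the first place. The uppercase `<META http-equiv=...>` line in the posted head looks like what the XSLT `html` output method inserts, and that method also writes `<link>` without a closing slash. The sketch below assumes the transformation is run through the JDK's javax.xml.transform API, which the question does not confirm. The class name XhtmlFromXslt, its method names and the inline sample stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

/** Hypothetical helper: runs an XSLT transformation with the "xml" output method. */
final class XhtmlFromXslt {

    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            // Ask for XML serialization so empty elements are written as <link ... />.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    /** Returns true if the markup parses as XML, which is what XMLParser needs. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up stylesheet and input, loosely modelled on the head in the question.
        String xslt = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'><html><head>"
                + "<title><xsl:value-of select='/doc/title'/></title>"
                + "<link rel='stylesheet' type='text/css' href='beyanname.css'/>"
                + "</head><body/></html></xsl:template></xsl:stylesheet>";
        String result = transform("<doc><title>KDV1</title></doc>", xslt);
        System.out.println(result);
        System.out.println("well-formed: " + isWellFormed(result));
    }
}
```

Setting the same method in the stylesheet itself with `<xsl:output method="xml"/>` should have an equivalent effect; the well-formedness check is only there to catch problems before the markup is handed to XML Worker.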
