
I am using Java and iText 5 to produce a PDF. One of my input fields comes from a WYSIWYG editor and contains HTML with base64 images embedded in it (i.e., the image data itself, not a link to the image). The WYSIWYG content can have zero to many images.

The WYSIWYG editor contains:

[Screenshot: the description as shown in the WYSIWYG editor]

This "Description" field is processed by the following code:

```java
Document document = new Document(PageSize.A4, 72f, 72f, 72f, 72f);
PdfWriter.getInstance(document, resourceImage);
document.open();

String ppDescription = "";
if (activityDtl.getPPDescription() != null && !activityDtl.getPPDescription().isEmpty()) {
    // Clean the HTML to be correct XHTML (XMLWorker requires well-formed XHTML)
    String cleanDesc = cleanHTML(activityDtl.getPPDescription());
    InputStream inputStream1 = new ByteArrayInputStream(cleanDesc.getBytes("UTF-8"));
    ByteArrayOutputStream baos1 = new ByteArrayOutputStream();
    Tidy tidy1 = new Tidy();
    tidy1.setXHTML(true);
    tidy1.setQuiet(true);
    tidy1.setShowWarnings(false);
    // Read and write UTF-8 so that non-ASCII text survives the round trip
    tidy1.setInputEncoding("UTF-8");
    tidy1.setOutputEncoding("UTF-8");
    tidy1.parseDOM(inputStream1, baos1);
    // Decode with the same charset Tidy wrote, not the platform default
    ppDescription = baos1.toString("UTF-8");
}

p6.add(new Chunk("Description:   ", smallBold));
if (!ppDescription.isEmpty()) {
    // Convert the XHTML into iText elements using XMLWorker's default tag processors
    ElementList list1 = XMLWorkerHelper.parseToElementList(ppDescription, null);
    for (Element element : list1) {
        p6.add(element);
    }
}
cell.addElement(p6);
```

This is the content received for this field (Description):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net" />
<title></title>
</head>
<body>
<p>Cooking instructions:</p>
<p><img
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAggAAAC .... H3BNquwQYUAAAAASUVORK5CYII="
 alt="" /></p>
<p>Cook the fish.</p>
</body>
</html>
```

And this is what is in the PDF:

[Screenshot: the resulting PDF]

What I would like is for the PDF to show the same content as the WYSIWYG editor in the first screenshot (i.e., the image between the two instruction lines).

Glyn
  • The iText layout elements accept plain text and have setters for assorted style properties. If you want to convert HTML to such elements, use the iText `XMLWorker` to do so. – mkl Nov 15 '20 at 07:12
  • Thank you mkl. I have modified my code above and am having trouble adding the output of the conversion to my code. Also, will this cater for images in the WYSIWYG? – Glyn Nov 15 '20 at 21:46
  • I found how to display the HTML (please see above); however, it does not work if there is an image in the WYSIWYG content: in that case the whole paragraph is not displayed. For example, "ANZAC Biscuits" has an image and is not displayed, while "Tests - Copy" has no image and is displayed; if I remove the image from "ANZAC Biscuits", it displays. – Glyn Nov 16 '20 at 06:35
  • How is the image referenced from that HTML? The `XMLWorker` most likely does not support the full HTML standard; but it is extendable, probably you merely have to add some helper class. (I have to admit, though, that I don't really know the `XMLWorker` in depth as I don't have to deal with HTML-to-PDF use cases at all.) – mkl Nov 16 '20 at 08:30
  • Hi mkl, I am using the Summernote WYSIWYG editor on Bootstrap. The image is included in the text field and stored in the database (MySQL as mediumtext). There is no reference to the image. – Glyn Nov 16 '20 at 23:32
  • Please look at the HTML you have in case of an entry with an image. There must be some reference to an image in it, otherwise you wouldn't expect iText to display it based on the HTML alone, would you? – mkl Nov 17 '20 at 05:54
  • Hi mkl, within the html I have – Glyn Nov 20 '20 at 22:25
  • *"within the html I have "* - ah, base64 data URLs. Then you might have to add a custom image provider or image tag processor implementation for data URLs, see [here](https://stackoverflow.com/a/20938015/1729265) or [here](https://stackoverflow.com/a/19398426/1729265). *"any tag without an ending (e.g., br, img) causes XMLWorkerHelper to throw an error"* - the XML worker is called `XMLWorker` for a reason: It works with XML. For HTML processing it requires XHTML. Thus, you should pre-process your HTML to form it into XHTML in which all tags have ending tags – mkl Nov 22 '20 at 14:36
  • Hi mkl, I have read your links; however, I cannot figure out how to implement them in my code. Are you able to help, please? – Glyn Nov 29 '20 at 04:31
  • I have updated my question to show all the changes I have made to date. – Glyn Nov 29 '20 at 05:10
  • @mkl Almost there. All I need now is to send my output to a table using `final InputStream is = new ByteArrayInputStream (ppDescription.getBytes("UTF-8")); xmlParser.parse(is, charset);` and `Paragraph p6 = new Paragraph(""); p6.add(element); cell.addElement(p6);`. How can I implement this, please? – Glyn Nov 30 '20 at 04:41
  • What do you mean by *"send my output to a table"*? – mkl Nov 30 '20 at 08:39
  • Hi mkl, please see below. – Glyn Nov 30 '20 at 20:47
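The `src` value that mkl identifies as a base64 data URL follows the RFC 2397 layout `data:[<mediatype>][;base64],<data>`, so the image bytes are simply the Base64 text after the first comma. As a JDK-only illustration (not iText code), a minimal sketch of that split-and-decode step might look like this. The `DataUrl` class and its members are hypothetical names introduced here, and percent-encoded (non-base64) payloads are deliberately not decoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: splits an RFC 2397 data URL into its media type and bytes. */
final class DataUrl {
    private static final String PREFIX = "data:";
    private static final String BASE64_MARKER = ";base64";

    final String mediaType;
    final byte[] data;

    private DataUrl(String mediaType, byte[] data) {
        this.mediaType = mediaType;
        this.data = data;
    }

    static DataUrl parse(String src) {
        if (src == null || !src.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a data URL");
        }
        int comma = src.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Data URL has no ',' before the payload");
        }
        // Header between "data:" and the first comma, e.g. "image/png;base64"
        String header = src.substring(PREFIX.length(), comma);
        String payload = src.substring(comma + 1);

        boolean isBase64 = header.endsWith(BASE64_MARKER);
        String mediaType = isBase64
                ? header.substring(0, header.length() - BASE64_MARKER.length())
                : header;
        if (mediaType.isEmpty()) {
            mediaType = "text/plain;charset=US-ASCII"; // RFC 2397 default when omitted
        }

        byte[] data;
        if (isBase64) {
            // The MIME decoder ignores line breaks and other non-alphabet characters
            data = Base64.getMimeDecoder().decode(payload);
        } else {
            // Percent-decoding of plain data URLs is left out of this sketch
            data = payload.getBytes(StandardCharsets.US_ASCII);
        }
        return new DataUrl(mediaType, data);
    }

    boolean isImage() {
        return mediaType.startsWith("image/");
    }
}
```

For an `<img>` like the one in the question, `DataUrl.parse(src).data` yields the raw PNG bytes, which is the kind of `byte[]` that iText's `Image.getInstance(byte[])` accepts.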

1 Answer


Thanks to mkl and this link https://stackoverflow.com/a/20938015/1729265 I have:

```java
Document document = new Document(PageSize.A4, 72f, 72f, 72f, 72f);
// Keep a reference to the writer: the PdfWriterPipeline below needs it
PdfWriter pdfWriter = PdfWriter.getInstance(document, resourceImage);
document.open();

for (final PPActityDetail activityDtl : activityList) {
    PdfPCell cell = new PdfPCell();

    Paragraph p6 = new Paragraph("");
    p6.add(new Chunk("Description:   ", smallBold));

    // Replace XMLWorker's default <img> processor with the data-URL-aware ImageTagProcessor
    final TagProcessorFactory tagProcessorFactory = Tags.getHtmlTagProcessorFactory();
    tagProcessorFactory.removeProcessor(HTML.Tag.IMG);
    tagProcessorFactory.addProcessor(new ImageTagProcessor(), HTML.Tag.IMG);

    final CssFilesImpl cssFiles = new CssFilesImpl();
    cssFiles.add(XMLWorkerHelper.getInstance().getDefaultCSS());
    final StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver(cssFiles);
    final HtmlPipelineContext hpc = new HtmlPipelineContext(new CssAppliersImpl());
    hpc.setAcceptUnknown(true).autoBookmark(true).setTagFactory(tagProcessorFactory);

    // PdfWriterPipeline writes the parsed elements directly into the document
    final HtmlPipeline htmlPipeline = new HtmlPipeline(hpc, new PdfWriterPipeline(document, pdfWriter));
    final Pipeline<?> pipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
    final XMLWorker worker = new XMLWorker(pipeline, true);
    final Charset charset = Charset.forName("UTF-8");
    final XMLParser xmlParser = new XMLParser(true, worker, charset);
    // ppDescription is the Tidy-cleaned XHTML, built as in the question
    final InputStream is = new ByteArrayInputStream(ppDescription.getBytes("UTF-8"));
    xmlParser.parse(is, charset);

    cell.addElement(p6);
    // (rest of the loop body, e.g. adding the cell to the table, omitted)
}
```

Now I need to add the output of the above (`xmlParser.parse(is, charset)`) to `p6` so that it is included in the table. I tried:

```java
p6.add(xmlParser.parse(is, charset));
```

However, that gives me an error message:

```
The method add(Element) in the type Paragraph is not applicable for the arguments (void)
```
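The error is expected: `XMLParser.parse` returns `void`. The parser pushes each converted element down the pipeline, and the `PdfWriterPipeline` at the end writes it straight into `document`, so nothing comes back that could be added to `p6`. One way around this, sketched here under the assumption of iText 5's XMLWorker (5.5.x), is to end the pipeline in an `ElementHandlerPipeline` that collects the elements into an `ElementList`; as far as I can tell, this mirrors how `XMLWorkerHelper.parseToElementList` is built internally, just with the custom tag factory plugged in. The `DescriptionHtml` class and its `toElements` method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import com.itextpdf.tool.xml.ElementList;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFilesImpl;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.TagProcessorFactory;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.ElementHandlerPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

/** Hypothetical helper: parses XHTML into iText elements instead of writing them to a Document. */
final class DescriptionHtml {
    private DescriptionHtml() {
    }

    static ElementList toElements(String xhtml, TagProcessorFactory tagFactory) throws IOException {
        // Same CSS setup as in the answer
        CssFilesImpl cssFiles = new CssFilesImpl();
        cssFiles.add(XMLWorkerHelper.getInstance().getDefaultCSS());
        StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver(cssFiles);

        HtmlPipelineContext hpc = new HtmlPipelineContext(new CssAppliersImpl());
        // No bookmarks: the elements end up inside a table cell, not as document sections
        hpc.setAcceptUnknown(true).autoBookmark(false).setTagFactory(tagFactory);

        // End of the chain: collect the elements instead of writing them to the PDF
        ElementList elements = new ElementList();
        ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
        HtmlPipeline htmlPipeline = new HtmlPipeline(hpc, end);
        CssResolverPipeline cssPipeline = new CssResolverPipeline(cssResolver, htmlPipeline);

        XMLWorker worker = new XMLWorker(cssPipeline, true);
        XMLParser parser = new XMLParser(true, worker, StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, StandardCharsets.UTF_8);
        }
        return elements;
    }
}
```

Inside the loop, the `XMLParser`/`PdfWriterPipeline` lines could then be replaced with `for (Element element : DescriptionHtml.toElements(ppDescription, tagProcessorFactory)) { p6.add(element); }`, followed by `cell.addElement(p6);` as before. Because no `PdfWriterPipeline` is involved, the `pdfWriter` reference is not needed for this step.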

This is the modified class:

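```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Element;
import com.itextpdf.text.Image;
import com.itextpdf.text.log.Level;
import com.itextpdf.text.log.Logger;
import com.itextpdf.text.log.LoggerFactory;
import com.itextpdf.text.pdf.codec.Base64;
import com.itextpdf.tool.xml.NoCustomContextException;
import com.itextpdf.tool.xml.Tag;
import com.itextpdf.tool.xml.WorkerContext;
import com.itextpdf.tool.xml.exceptions.LocaleMessages;
import com.itextpdf.tool.xml.exceptions.RuntimeWorkerException;
import com.itextpdf.tool.xml.html.HTML;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

class ImageTagProcessor extends com.itextpdf.tool.xml.html.Image {

    private final Logger logger = LoggerFactory.getLogger(getClass());

    /**
     * Handles the end of an {@code <img>} tag: a base64 data URL in {@code src} is decoded
     * in memory; any other source falls back to XMLWorker's default image handling.
     */
    @Override
    public List<Element> end(final WorkerContext ctx, final Tag tag, final List<Element> currentContent) {
        final Map<String, String> attributes = tag.getAttributes();
        String src = attributes.get(HTML.Attribute.SRC);
        List<Element> elements = new ArrayList<Element>(1);
        if (null != src && src.length() > 0) {
            Image img = null;
            if (src.startsWith("data:image/")) {
                // The base64 payload is everything after the first comma
                final String base64Data = src.substring(src.indexOf(",") + 1);
                try {
                    img = Image.getInstance(Base64.decode(base64Data));
                } catch (Exception e) {
                    if (logger.isLogging(Level.ERROR)) {
                        logger.error(String.format(LocaleMessages.getInstance().getMessage(LocaleMessages.HTML_IMG_RETRIEVE_FAIL), src), e);
                    }
                }
                if (img != null) {
                    try {
                        // Apply CSS to the image and wrap it in a Chunk so it flows inline
                        final HtmlPipelineContext htmlPipelineContext = getHtmlPipelineContext(ctx);
                        elements.add(getCssAppliers().apply(new Chunk((com.itextpdf.text.Image) getCssAppliers().apply(img, tag, htmlPipelineContext), 0, 0, true), tag,
                            htmlPipelineContext));
                    } catch (NoCustomContextException e) {
                        throw new RuntimeWorkerException(e);
                    }
                }
            }

            // Not a data URL (or decoding failed): let the default processor handle it
            if (img == null) {
                elements = super.end(ctx, tag, currentContent);
            }
        }
        return elements;
    }
}
```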
Glyn