
I want to make the PDFs exported by Jasper PDF/UA-compliant, but Jasper's limitations are preventing me from doing so. Our client is pressuring us to get this done properly.

PDF/UA has a lot of requirements, including but not limited to displaying the title and language, embedding fonts, and adding alternate text to images. So far, I have set all the 508 PDF tags, set the properties to display the title and language, embedded fonts, and added alt text to images, all in Jaspersoft Studio. I have also appended the PDF/UA identifier to the output PDF (i.e. after the PDF was generated) via Apache PDFBox. We are using Jaspersoft Studio v6.6.0 with JasperReports Library v6.4.0 and Oracle for the DB. From what I've read, Jasper has limited capabilities in this regard because iText was downgraded back to v2.1.7.js6 due to licensing issues.

<jasperReport xmlns=...>
        ... // other properties
        <property name="net.sf.jasperreports.awt.ignore.missing.font" value="false"/>
        <property name="net.sf.jasperreports.export.xls.detect.cell.type" value="false"/>
        <property name="net.sf.jasperreports.export.xls.sheet.names.all" value="REPORT SHEET NAME"/>
        <property name="net.sf.jasperreports.default.pdf.font.name" value="Times-Roman"/>
        <property name="net.sf.jasperreports.export.xls.ignore.graphics" value="false"/>
        <property name="net.sf.jasperreports.default.pdf.embedded" value="true"/>
        <property name="net.sf.jasperreports.export.pdf.metadata.title" value="MY REPORT TITLE"/>
        <property name="net.sf.jasperreports.export.pdf.display.metadata.title" value="true"/>
        <property name="net.sf.jasperreports.export.pdf.tagged" value="true"/>
        <property name="net.sf.jasperreports.export.pdf.tag.language" value="EN-US"/>
        ... // parameters, stored proc call, headings, etc.
        <!-- Possible PDF 508 tags to be set on text fields -->
        <property name="net.sf.jasperreports.export.pdf.tag.table" value="start"/>
        <property name="net.sf.jasperreports.export.pdf.tag.th" value="full"/>
        <property name="net.sf.jasperreports.export.pdf.tag.tr" value="start"/>
        <property name="net.sf.jasperreports.export.pdf.tag.td" value="full"/>
        <property name="net.sf.jasperreports.export.pdf.tag.tr" value="end"/>
        <property name="net.sf.jasperreports.export.pdf.tag.table" value="end"/>
        ...
</jasperReport>
... // other imports
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.XMPSchema;
import org.apache.xmpbox.xml.XmpSerializer;
... // more imports

public class ReportResult {
   ... // other methods

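    /**
     * Attaches XMP metadata (PDF/UA identifier, title) to the document, then saves and closes it.
     *
     * @param pdf - The pdf instance created from BAOS
     * @param title - Document title, also used as the description
     * @return BAOS containing the saved PDF with metadata (UA-identifier, title)
     */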
    private ByteArrayOutputStream appendXMPMetaData(PDDocument pdf, String title) throws TransformerException, IOException {
        XMPMetadata xmp = XMPMetadata.createXMPMetadata();
        xmp.createAndAddDublinCoreSchema();
        xmp.getDublinCoreSchema().setTitle(title);
        xmp.getDublinCoreSchema().setDescription(title);
        xmp.createAndAddPDFAExtensionSchemaWithDefaultNS();
        xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema");
        xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/property#", "pdfaProperty");
        xmp.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfua/ns/id/", "pdfuaid");

        XMPSchema uaSchema = new XMPSchema(XMPMetadata.createXMPMetadata(),
                "pdfaSchema", "pdfaSchema", "pdfaSchema");
        uaSchema.setTextPropertyValue("schema", "PDF/UA Universal Accessibility Schema");
        uaSchema.setTextPropertyValue("namespaceURI", "http://www.aiim.org/pdfua/ns/id/");
        uaSchema.setTextPropertyValue("prefix", "pdfuaid");

        XMPSchema uaProp = new XMPSchema(XMPMetadata.createXMPMetadata(),"pdfaProperty", "pdfaProperty", "pdfaProperty");
        uaProp.setTextPropertyValue("name", "part");
        uaProp.setTextPropertyValue("valueType", "Integer");
        uaProp.setTextPropertyValue("category", "internal");
        uaProp.setTextPropertyValue("description", "Indicates, which part of ISO 14289 standard is followed");
        uaSchema.addUnqualifiedSequenceValue("property", uaProp);

        xmp.getPDFExtensionSchema().addBagValue("schemas", uaSchema);
        xmp.getPDFExtensionSchema().setPrefix("pdfuaid");
        xmp.getPDFExtensionSchema().setTextPropertyValue("part", "1");

        XmpSerializer serializer = new XmpSerializer();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        serializer.serialize(xmp, baos, true);

        PDMetadata metadata = new PDMetadata(pdf);
        metadata.importXMPMetadata(baos.toByteArray());
        pdf.getDocumentCatalog().setMetadata(metadata);

        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        pdf.save(byteArrayOutputStream);
        pdf.close();

        return byteArrayOutputStream;
    } 

    protected void getJasperPDFDoc(ReportConfig reportConfig) throws IOException, TransformerException {

        List<ReportParameter> reportParams = reportConfig.getReportParams();

        ... // cookies and printer config

        Map imagesMap = new HashMap();
        request.getSession(true).setAttribute("IMAGES_MAP", imagesMap);

        ByteArrayOutputStream bs = ReportAccess.Instance.getInstance().generateJasperReport(
                getCurrentUserId(), getCurrentUserName(), reportConfig, "PDF",
                 reportParams, getTmpImageUri(),
                 imagesMap, rptTemplateLoc);

        if (bs != null) {
            if (reportConfig.doPrint) {
                response.setContentType("text/html");
            } else {
                log.debug("Got PDF report data");
                String fileName = getReportFileName(reportConfig) + ".pdf";
                response.setContentType("application/pdf");
                String dispositionProperty = "attachment; filename=" + fileName;
                response.setHeader("Content-disposition", dispositionProperty);
            }

            PDDocument pdf = PDDocument.load(new ByteArrayInputStream(bs.toByteArray()));
            ByteArrayOutputStream baosWithMetaData = appendXMPMetaData(pdf, reportConfig.getDisplayName());

            response.setHeader("Content-length", Integer.toString(baosWithMetaData.size()));
            ServletOutputStream os = response.getOutputStream();
            baosWithMetaData.writeTo(os);

            os.flush();
            os.close();
        } else {
            displayError("PDF");
        }
     }

     ... // other methods
}
/* REPORT MANAGER CLASS */
private static void generatePDFDoc(JasperPrint jasperPrint, ByteArrayOutputStream f) {

        try {

            JasperPrint jr = moveTableOfContents(jasperPrint);
            JRPdfExporter exporter  = new JRPdfExporter();
            exporter.setExporterInput(new SimpleExporterInput(jr));
            exporter.setExporterOutput(new SimpleOutputStreamExporterOutput(f));

            //configuration
            SimplePdfExporterConfiguration configuration = new SimplePdfExporterConfiguration();
            configuration.setCompressed(true);
            configuration.setTagged(true);
            configuration.setTagLanguage("EN-US");

            //set configuration
            exporter.setConfiguration(configuration);

            //export to PDF
            exporter.exportReport();
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        }
    }

I noticed a handful of errors reported by Adobe's Preflight checker as well as our client, listed below:

  1. Non-standard tag present
  2. Circular Role Map
  3. Unknown Anchor cell appended to top-left of every page
  4. Table is not properly recognized in table-editor view
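
For context on error 2: a PDF role map is a dictionary that maps non-standard structure types to standard ones, and it is circular when following the mappings from a type leads back to a type already visited. Below is a minimal sketch of that check on a plain Java Map. It does not read the role map out of a PDF, and the class name, method name, and the entries in main are hypothetical, not taken from my actual output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RoleMapCheck {

    /**
     * Follows role-map entries starting at the given structure type.
     * Returns the chain of visited types (ending with the repeated type)
     * if the chain loops, or an empty list if it ends at an unmapped type.
     */
    public static List<String> findCycle(Map<String, String> roleMap, String start) {
        Set<String> seen = new LinkedHashSet<>();
        String current = start;
        // Stop when we reach an unmapped type (null) or revisit a type.
        while (current != null && seen.add(current)) {
            current = roleMap.get(current);
        }
        if (current == null) {
            return new ArrayList<>();
        }
        List<String> chain = new ArrayList<>(seen);
        chain.add(current);
        return chain;
    }

    public static void main(String[] args) {
        // Hypothetical role-map entries, for illustration only.
        Map<String, String> roleMap = new LinkedHashMap<>();
        roleMap.put("Anchor", "Span");
        roleMap.put("Span", "Anchor");
        roleMap.put("Cell", "TD");

        for (String type : roleMap.keySet()) {
            List<String> cycle = findCycle(roleMap, type);
            if (!cycle.isEmpty()) {
                System.out.println("Circular mapping: " + String.join(" -> ", cycle));
            }
        }
    }
}
```

If the real role map looks anything like the hypothetical one above, removing or correcting the entry that maps a standard type back to a custom one should break the loop.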

Images showing my problem(s). Any help in this regard is kindly appreciated.

  • I have removed the PDFBox tag. While you did use PDFBox to fill the XMP metadata, your problems are not related to that. All four errors you mention indicate problems with the structure tree usage. Whatever goes wrong is either in Jasper or in iText. – Tilman Hausherr Jun 13 '19 at 10:50
  • The iText version used in JR, 2.1.7, is from a time when there was no big interest in correctly tagged PDFs. So even if it contains some tagging support, I doubt it's properly tested and hardened. – mkl Jun 13 '19 at 12:20
  • Is there any way that I can add cell and associated header IDs? – vicious_koala Jun 13 '19 at 14:15
  • See here and the links in the comments: https://stackoverflow.com/questions/56231681/ But it is a real pain to do. I think that the answers only scratch the surface. – Tilman Hausherr Jun 13 '19 at 18:48
  • So we'd have to rewrite our reporting engine completely in order to achieve this? – vicious_koala Jun 20 '19 at 01:16

1 Answer


If you are open to a simpler but different approach, PD4ML v4 can be an option. There is a minimalistic sample on this page: https://pd4ml.tech/pdf-ua/

It uses available structure and meta info from input HTML/CSS to produce a valid tagged PDF/UA.

If the goal is only to pass PDF/UA file format validation (e.g. by Adobe's Preflight checker), it is sufficient to choose Constants.PDFUA as the output format.

pd4ml.writePDF(fos, Constants.PDFUA);

If the goal is to produce Matterhorn Protocol-compliant PDFs (and pass validation by PAC3 https://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html), you would most probably also need to adjust your input HTML: add TITLE, ALT and LANG attributes, make sure that table structures and the heading hierarchy are consistent, etc.
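
As a rough illustration of the kind of input checks meant here, the sketch below scans an HTML string for a lang attribute on the html element, a non-empty title element, and an alt attribute on each img. It is a crude regex-based check written for this answer, not part of PD4ML, and the class and method names are hypothetical. A real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlAccessibilityLint {

    // <html ...> carrying a lang attribute (xml:lang alone does not count here).
    private static final Pattern HTML_LANG =
            Pattern.compile("<html\\b[^>]*\\slang\\s*=", Pattern.CASE_INSENSITIVE);
    // <title> followed by at least one non-whitespace character that is not '<'.
    private static final Pattern TITLE =
            Pattern.compile("<title\\b[^>]*>\\s*[^<\\s]", Pattern.CASE_INSENSITIVE);
    private static final Pattern IMG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ALT =
            Pattern.compile("\\salt\\s*=", Pattern.CASE_INSENSITIVE);

    /** Returns a list of human-readable problems; empty if none were found. */
    public static List<String> check(String html) {
        List<String> problems = new ArrayList<>();
        if (!HTML_LANG.matcher(html).find()) {
            problems.add("missing lang attribute on <html>");
        }
        if (!TITLE.matcher(html).find()) {
            problems.add("missing or empty <title>");
        }
        Matcher img = IMG.matcher(html);
        while (img.find()) {
            if (!ALT.matcher(img.group()).find()) {
                problems.add("image without alt attribute: " + img.group());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String html = "<html><head><title></title></head>"
                + "<body><img src=\"logo.png\"></body></html>";
        for (String problem : check(html)) {
            System.out.println(problem);
        }
    }
}
```

Table structure and heading hierarchy are not covered by this sketch. Those need a structural check rather than pattern matching.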
