When creating a simple graphic with either the base pdf() device or CairoPDF(),

> library("Cairo")
> pdf(file="pdf1.pdf")
> plot( 1:10, 1:10, type="b")
> dev.off()
null device 
          1 
> library("Cairo")
> CairoPDF(file="cairo1.pdf")
> plot( 1:10, 1:10, type="b")
> dev.off()
null device 
          1 

veraPDF reports

$ verapdf --verbose --format text pdf1.pdf    
FAIL ./pdf1.pdf
  FAIL 6.1.3-1
  FAIL 6.7.3-6
  FAIL 6.3.4-1
  FAIL 6.7.2-1
  FAIL 6.7.3-2
  FAIL 6.7.3-7
  FAIL 6.1.7-2

$ verapdf --verbose --format text cairo1.pdf
FAIL ./cairo1.pdf
  FAIL 6.1.3-1
  FAIL 6.7.2-1
  FAIL 6.2.3.3-1
  FAIL 6.4-3
  FAIL 6.7.3-7

These are references to the failed requirements listed at https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules. (However, veraPDF is known to have occasional bugs, too.)
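To see which rules differ between the two reports, the text output can be filtered and compared. The following is only a sketch: the helper names `failed_rules` and `only_in_first` are made up for illustration, and it assumes the report layout is exactly as shown above (`FAIL` followed by a rule ID).

```sh
# Extract the unique failed rule IDs (e.g. 6.7.3-6) from a veraPDF text
# report read on stdin. Lines such as "FAIL ./pdf1.pdf" are skipped because
# their second field does not look like a rule ID.
failed_rules() {
  awk '$1 == "FAIL" && $2 ~ /^[0-9]+(\.[0-9]+)*-[0-9]+$/ { print $2 }' | sort -u
}

# Print the rule IDs that fail in report file $1 but not in report file $2.
only_in_first() {
  a=$(mktemp) && b=$(mktemp) || return 1
  failed_rules < "$1" > "$a"
  failed_rules < "$2" > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Possible usage (report file names are placeholders):
#   verapdf --verbose --format text pdf1.pdf   > pdf1.txt
#   verapdf --verbose --format text cairo1.pdf > cairo1.txt
#   only_in_first pdf1.txt cairo1.txt
```

Swapping the two arguments lists the rules that only the Cairo output fails.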

Given this, I wonder whether R now has an output device that produces PDF/A-compliant graphics. (I noted before that any PDF file that includes R Cairo graphics thus loses its own PDF/A compliance.) How do other people maintain PDF/A-compliant files using R graphics?


update 1: `ggsave` does not do it, either.


update 2: kdp in particular fails content that includes transparencies. A bad crutch for Cairo output is `gs -o $out -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 $in`, which can fix the transparency problem [6.4-3] but adds new ones: 6.3.7-3, 6.1.8-1, 6.1.7-2. It also explodes file sizes, presumably because it rasterizes the content. An option to completely suppress transparencies in CairoPDF would go a long way. The pdf() device output does not add transparencies, but then requires different handling of fonts and other things...sigh.


update 3 (best solution so far):

# gs -dPDFA -dNOPAUSE -dBATCH -sOutputFile="out.pdf" \
-sDEVICE=pdfwrite -dCompatibilityLevel=1.3 in.pdf

gets rid of most of the problems, leaving only the 6.2.3.3-1 complaint. Unfortunately, some files look great, while others seem rasterized and are much bigger. Not sure yet why.
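One rough way to spot the files that balloon is to compare sizes before and after conversion. This is a sketch only: `size_percent` and `convert_and_report` are hypothetical helper names, and a large increase merely hints at rasterization; it does not prove it.

```sh
# Print the size of file $2 as a percentage of the size of file $1,
# using integer arithmetic (e.g. 250 means 2.5 times as large).
size_percent() {
  in_bytes=$(wc -c < "$1" | tr -d ' ')
  out_bytes=$(wc -c < "$2" | tr -d ' ')
  [ "$in_bytes" -gt 0 ] || return 1
  echo $(( out_bytes * 100 / in_bytes ))
}

# Run the Ghostscript command from update 3 on $1, writing $2,
# then report how much the file grew or shrank.
convert_and_report() {
  gs -dPDFA -dNOPAUSE -dBATCH -sOutputFile="$2" \
     -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 "$1" || return 1
  pct=$(size_percent "$1" "$2") || return 1
  echo "$2 is ${pct}% of the size of $1"
}

# Possible usage:
#   convert_and_report in.pdf out.pdf
```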

ivo Welch

1 Answer

For PDF/A-1B compliance

The minimal requirements above and beyond your baseline PDF are an embedded plain-text XMP metadata block (the 6.7.#-# rules) and an embedded colour strategy. Both drastically increase the size of a small file. However, you can trim their content to reduce the collateral damage, as many XMP blocks are treble what is essentially required.

Both these can be added with

"gs or gswin64.exe" -sDEVICE=pdfwrite -dPDFA -dPDFACompatibilityPolicy=1 -sProcessColorModel=DeviceRGB -sColorConversionStrategy=UseDeviceIndependentColor -o out.pdf in.pdf 

Other options for quality can be added; these are just for the PDF/A requirements. Generally these settings should not affect vectors or images much, but you need to check that colours and scalars are not affected beyond expectations. If they are, you may need to add other switches or colour profiles.
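Since the executable name differs between platforms, a small wrapper can pick whichever one is on the PATH before applying the switches above. This is a sketch under assumptions: `find_gs` and `to_pdfa` are made-up names, and `gswin64c` (the console build on Windows) is added to the candidate list as a guess about what may be installed.

```sh
# Print the name of the first Ghostscript executable found on PATH.
# Returns 1 if none of the candidates is available.
find_gs() {
  for candidate in gs gswin64c gswin64; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Apply the PDF/A-related switches from the command above to $1, writing $2.
to_pdfa() {
  gs_bin=$(find_gs) || { echo "Ghostscript not found" >&2; return 1; }
  "$gs_bin" -sDEVICE=pdfwrite -dPDFA -dPDFACompatibilityPolicy=1 \
    -sProcessColorModel=DeviceRGB \
    -sColorConversionStrategy=UseDeviceIndependentColor \
    -o "$2" "$1"
}

# Possible usage:
#   to_pdfa in.pdf out.pdf
```

Re-running veraPDF on the output remains the real check; the wrapper only saves retyping the switches.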

Part of XMP

<rdf:Description rdf:about="" xmlns:pdfaid='http://www.aiim.org/pdfa/ns/id/' pdfaid:part='1' pdfaid:conformance='B'/></rdf:RDF>  

or alternate method (overall more compact text)

8 0 obj
<</Type/Metadata/Length 677/Subtype/XML>>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description rdf:about="" xmp:CreateDate="2023-04-15T02:36:42-04:00" xmp:ModifyDate="2023-04-16T00:15:25-04:00" xmlns:xmp="http://ns.adobe.com/xap/1.0/"><xmp:MetadataDate>2023-04-16T00:15:25-04:00</xmp:MetadataDate></rdf:Description><rdf:Description rdf:about="" pdf:Producer="cairo 1.16.0 (https://cairographics.org)" xmlns:pdf="http://ns.adobe.com/pdf/1.3/"/><rdf:Description rdf:about="" pdfaid:conformance="B" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" pdfaid:part="1"/></rdf:RDF></x:xmpmeta><?xpacket end="w"?>
endstream
endobj
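Because the XMP packet is stored as plain text, a quick search can show whether a file at least claims PDF/A conformance. This is a minimal sketch with a hypothetical helper name, `pdfa_claim`; it assumes the metadata stream is uncompressed, and it only reports the claim, so it is no substitute for running veraPDF.

```sh
# Print the PDF/A part and conformance level claimed in a file's XMP
# metadata, e.g. "1B". Handles both the attribute form (pdfaid:part="1")
# and the element form (<pdfaid:part>1</pdfaid:part>).
# Prints nothing and returns 1 if no claim is found.
pdfa_claim() {
  part=$(grep -a -o -E "pdfaid:part(=[\"']|>)[0-9]+" "$1" \
         | head -n 1 | tr -dc '0-9')
  level=$(grep -a -o -E "pdfaid:conformance(=[\"']|>)[A-Za-z]" "$1" \
          | head -n 1 | sed 's/.*\(.\)$/\1/')
  [ -n "$part" ] && [ -n "$level" ] || return 1
  echo "${part}${level}"
}

# Possible usage:
#   pdfa_claim out.pdf
```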

This ADDs whatever is missing and resolves conflicts. For example, the file may need a correct ID, e.g. /ID [<4E53C48E52ED277DD2691B3D2D184C8B><4E53C48E52ED277DD2691B3D2D184C8B>], and stream length syntax errors are resolved. Other changes are made as required, such as setting /ColorSpace 10 0 R, and any Cairo /S(oftmask) transparency that might have been added by using PNG sources is removed, e.g. << /ExtGState << /a0 << /ca 1 /CA 1 >>

The result will be valid as PDF/A-1B.

K J
    does the gs pdfwrite device rasterize the vector file or just add the XMP? (PS: answer is unclear grammatically after word "ADDs". do you mean the gs fixes this, or it needs to be added oneself?) – ivo Welch Apr 17 '23 at 06:58