56

Is there a way to use ghostscript to convert PDF to PDF/A or PDF/X? I know it can be used to convert PDF to images, but I don't know if it can be used to convert PDF/A. What parameters should I use?

imgen

5 Answers

70

This is to convert a pdf document (not pdf/a) into pdf/a:

gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf
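
A minimal sketch of a wrapper around this command, assuming the adjustments suggested in the comments on this answer (no -dUseCIEColor, DeviceRGB instead of DeviceCMYK, and the compatibility policy passed with -d). The function name to_pdfa and the GS override variable are hypothetical and only here for illustration. A file produced this way may still fail a strict validator, as one of the comments reports.

```shell
#!/bin/sh
# to_pdfa: hypothetical helper wrapping the gs call discussed in this answer.
# GS is an assumed override for the Ghostscript binary (defaults to "gs");
# setting GS=echo prints the arguments instead of running anything.
to_pdfa() {
    # $1 = input PDF, $2 = output PDF/A file
    if [ "$#" -ne 2 ]; then
        echo "usage: to_pdfa input.pdf output.pdf" >&2
        return 2
    fi
    ${GS:-gs} -dPDFA -dBATCH -dNOPAUSE \
        -sProcessColorModel=DeviceRGB \
        -sDEVICE=pdfwrite \
        -dPDFACompatibilityPolicy=1 \
        "-sOutputFile=$2" "$1"
}
```

After sourcing the file, `to_pdfa input_filename.pdf output_filename.pdf` runs the conversion. Setting `GS=echo` beforehand turns the call into a dry run that only prints the arguments.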

Hope this will help someone!

Artur
  • That just made my day! Thank u king Arthur! – gabrjan Oct 30 '13 at 08:20
  • 11
    gs outputs that CIE is not recommended; CMYK made some of my pages blue. I simply used `gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf` and it worked. – Sparkler Jun 15 '17 at 20:12
  • 5
    The correct way to set the compatibility policy is `-dPDFACompatibilityPolicy=1` according to this: https://stackoverflow.com/questions/57167784/ghostscript-wont-generate-pdf-a-with-utf16be-text-string-detected-in-docinfo – Tuure Feb 04 '20 at 14:40
  • First I removed ```-dUseCIEColor```. ```gs``` recommends removing it since the release of version 9.11. – LEo Jan 12 '21 at 17:18
  • 2
    I tried to validate the generated PDF using `veraPDF` (PDF/A Conformance Checker). It found 3 validation errors: 1) `DeviceRGB may be used only if the file has a PDF/A-1 OutputIntent that uses an RGB colour space`, 2) `An annotation dictionary shall not contain the C array or the IC array unless the colour space of the DestOutputProfile in the PDF/A-1 OutputIntent dictionary, defined in 6.2.2, is RGB` and 3) `DeviceCMYK may be used only if the file has a PDF/A-1 OutputIntent that uses a CMYK colour space`. – LEo Jan 12 '21 at 17:18
26

Hope this answer helps others coming from Google with the same problem:

To convert from PDF to PDFA-1b or PDFA-2b, you can use Ghostscript. I suggest you use the latest version (9.19 today).

Install it

**In Mac OS**, you may prefer to use Homebrew:
brew install ghostscript

(UPDATE: 2023-01-23. This no longer works on a Mac with Homebrew, as versions newer than 9.19 adamantly refuse to do the conversion, no matter what I've tried.)

In Linux, some distros ship a much older version (RHEL 7 sports 9.07). To get a self-contained, single-file modern Ghostscript, download it directly from the site:

wget https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs919/ghostscript-9.19-linux-x86_64.tgz

(UPDATE: 2023-01-23: stick to that version, newer versions won't work with the method presented below.)

If the link above is broken when you try it 20 years from now, please refer to ghostscript.com and look for the download section. Download the binary version; don't go for the source unless you know what you are doing.

In Windows, I cannot help you, but if you manage to install it, the following commands will also work, if you substitute the location of files and gs executable.

Command line

(Note to future editors: please don't remove the formatting; this layout is more readable and is still a working command line.)

gs-919-linux_x86_64 \
  -dPDFA=1 \
  -dNOOUTERSAVE \
  -sProcessColorModel=DeviceRGB \
  -sDEVICE=pdfwrite \
  -dPDFACompatibilityPolicy=1 \
  -o output_file.pdf \
  /path/to/PDFA_def.ps \
  input_file.pdf

On a Mac, gs-919-linux_x86_64 will simply be gs.

Please note that output_file.pdf and input_file.pdf must be changed to the names of the output file (the converted file) and the input file (the file to be converted). /path/to/PDFA_def.ps is your copy of the file PDFA_def.ps.

-dPDFA=1 is for PDFA-1b.

-dPDFA=2 if you want PDFA-2b.
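
As a sketch, the command line above can be wrapped in a small function that takes the PDF/A level as a parameter. The name pdfa_convert and the GS override variable are hypothetical, introduced only for this example; the flags are the ones from the command line in this answer.

```shell
#!/bin/sh
# pdfa_convert: hypothetical wrapper around the command line shown above.
# GS is an assumed override for the Ghostscript binary, e.g.
# GS=./gs-919-linux_x86_64 on Linux; it defaults to plain "gs".
pdfa_convert() {
    # $1 = PDF/A level (1 or 2), $2 = path to PDFA_def.ps,
    # $3 = input PDF, $4 = output file
    if [ "$#" -ne 4 ]; then
        echo "usage: pdfa_convert LEVEL PDFA_def.ps input.pdf output.pdf" >&2
        return 2
    fi
    case $1 in
        1|2) ;;
        *) echo "level must be 1 (PDF/A-1b) or 2 (PDF/A-2b)" >&2; return 2 ;;
    esac
    ${GS:-gs} "-dPDFA=$1" \
        -dNOOUTERSAVE \
        -sProcessColorModel=DeviceRGB \
        -sDEVICE=pdfwrite \
        -dPDFACompatibilityPolicy=1 \
        -o "$4" \
        "$2" \
        "$3"
}
```

For example, `pdfa_convert 2 /path/to/PDFA_def.ps input_file.pdf output_file.pdf` asks for PDFA-2b. With `GS=echo` set, the function only prints the arguments it would pass.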

What is PDFA_def.ps?

PDFA_def.ps is some sort of template Ghostscript uses to create a PDFA file. The tricky part is that, for some reason, Ghostscript comes with a non-working file.

You'll need to edit PDFA_def.ps and include the path to a valid ICC (color profile) file. Download a good color profile from Adobe:

wget https://tutankhamon.acc.umu.se/mirror/archive/ftp.sunet.se/pub/vendor/adobe/adobe/iccprofiles/win/AdobeICCProfilesWin_end-user.zip

Inside that zip, find a file called AdobeRGB1998.icc, put it somewhere and put the path to that file INSIDE your PDFA_def.ps file. Note that the path should be absolute, with no quotes. Like:

/ICCProfile (/full/path/to/file/AdobeRGB1998.icc)   % Customize.

Here is a version of PDFA_def.ps; change PATH_TO_YOUR_ICC_FILE to the path of your AdobeRGB1998.icc.

https://gist.githubusercontent.com/weltonrodrigo/19df77833f023fbe1572168982e4b515/raw/ea86e87379d14120d7ff26f6f235ac7eeb5f5dd5/PDFA_def.ps
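
The manual edit of the /ICCProfile line can also be scripted. The following is only a sketch: set_icc_profile is a hypothetical helper name, it assumes the template has a line starting with `/ICCProfile (` at column 0, and it assumes a POSIX-style absolute path that contains no `|`, `&` or backslash characters (they would confuse sed).

```shell
#!/bin/sh
# set_icc_profile: hypothetical helper that writes a copy of PDFA_def.ps
# with the /ICCProfile line pointing at the given ICC file.
set_icc_profile() {
    # $1 = template PDFA_def.ps, $2 = absolute path to the .icc file,
    # $3 = file to write (the template itself is left untouched)
    case $2 in
        /*) ;;
        *) echo "ICC path must be absolute: $2" >&2; return 1 ;;
    esac
    # Replace only what is between the parentheses; any trailing
    # "% Customize." comment on the line is kept as it is.
    sed "s|^/ICCProfile ([^)]*)|/ICCProfile ($2)|" "$1" > "$3"
}
```

Something like `set_icc_profile PDFA_def.ps /full/path/to/file/AdobeRGB1998.icc my_PDFA_def.ps` then gives a file to pass on the gs command line. This only rewrites the path; it does nothing about the file access errors that the comments report for newer Ghostscript versions.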

motobói
  • Thanks. It's very unpleasant that the default PDFA_def.ps comes with a bug which leads to a confusing file not found error. – Juergen Schulze Jan 12 '18 at 10:43
  • Maybe report this over on [Ghostscript's bug tracker](https://bugs.ghostscript.com/). – Ouroborus Nov 30 '19 at 09:04
  • In windows: gswin64c.exe -dPDFA=1 -dNOOUTERSAVE -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -o output_file.pdf -dPDFACompatibilityPolicy=1 .\inputfile.pdf – DAme Jul 29 '21 at 17:43
  • one up for the version info, this was crucial on a legacy system. – pid Nov 15 '22 at 14:05
  • anyone know how to get PDF/A-1a instead of 1b? – MisterCat Jan 12 '23 at 06:11
  • 1
    gs 9.56.1 spits out this error: "Failed to open the supplied ICCProfile for reading. This may be due to an incorrect filename or a failure to add --permif-file-read= to the command line. This PostScript program needs to open the file and you must explicitly grant it permission to do so. PDF/A processing aborted, output may not be a PDF/A file." (yes, the error message has a typo in the flag, should read: --permit-file-read= according to the docs.) However, adding the icc file in the flag is still not working. – Max N Jan 22 '23 at 00:53
  • 1
    With gs 10: Error: /invalidfileaccess in --file-- Operand stack:--nostringval-- --nostringval-- (AdobeRGB1998.icc) (r) Execution stack:%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1974 1 3 %oparray_pop 1973 1 3 %oparray_pop 1961 1 3 %oparray_pop 1817 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- Permission denied – Max N Jan 22 '23 at 18:24
17

Please note that current answers are not completely correct. You can define which level of PDF/A you want, resulting in different behaviors of the program. This one is correct:

gs -dPDFA -dBATCH -dNOPAUSE -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=2 -sOutputFile=output_filename.pdf input_filename.pdf

Please note my change from -sPDFACompatibilityPolicy to -dPDFACompatibilityPolicy: the value is a number, so it is passed with -d. Change the number to get a different behavior; 1 is good if you don't need DOCINFO. Furthermore, we use the option UseDeviceIndependentColor to avoid validation issues.

If you change options here, you will most likely get a non-compliant PDF/A (even if the file claims otherwise). You can check your PDF/A here: https://www.pdf-online.com/osa/validate.aspx
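
If you have a whole directory to convert, the same command can be run in a loop. This is a rough sketch, not a tested production script: pdfa_batch and the GS override variable are hypothetical names, and it assumes gs exits with a non-zero status when a conversion is aborted.

```shell
#!/bin/sh
# pdfa_batch: hypothetical helper that runs the command above on every
# *.pdf in one directory and writes the results to another directory.
# GS is an assumed override for the Ghostscript binary (defaults to "gs").
pdfa_batch() {
    # $1 = directory with input PDFs, $2 = directory for the output files
    mkdir -p "$2" || return 1
    for src in "$1"/*.pdf; do
        # With no matching files the glob stays literal, so skip it.
        [ -e "$src" ] || continue
        dst=$2/$(basename "$src")
        ${GS:-gs} -dPDFA -dBATCH -dNOPAUSE \
            -sColorConversionStrategy=UseDeviceIndependentColor \
            -sDEVICE=pdfwrite \
            -dPDFACompatibilityPolicy=2 \
            "-sOutputFile=$dst" "$src" || echo "failed: $src" >&2
    done
}
```

Usage would be `pdfa_batch ./incoming ./pdfa`. Each output still has to be checked with a validator; a successful exit status alone does not prove compliance.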

Cyderic
  • 1
    This one, with -sColorConversionStrategy=UseDeviceIndependentColor, worked for me. Thanks very much! – excyberlabber Oct 22 '21 at 19:10
  • Works for me, a PDF/A-1B created and passed tests. Only thing, I would like is to use the Adobe ICC I haven't been able to get that to work working from other solutions. Am I right in thinking that converting to PDF/A-1B removes any embedded JavaScript? – AndrewC Oct 19 '22 at 18:08
  • anyone know how to get PDF/A-1a instead of 1b? – MisterCat Jan 12 '23 at 06:11
  • I'm getting this error in the conversion: The following warnings were encountered at least once while processing this file: bad trailer dictionary **** This file had errors that were repaired or ignored. **** The file was produced by: **** >>>> pdfTeX <<<< **** Please notify the author of the software that produced this **** file that it does not conform to Adobe's published PDF **** specification. – Max N Jan 22 '23 at 01:00
  • This is the first answer that gave me a PDF/A file that passes verapdf verification. Thank you! – Happynoff Jun 25 '23 at 18:41
16

@danio, @imgen: Even recently released documentation pages on PDF/X (standardized prepress requirements) and PDF/A (standardized archiving requirements) generation were quite misleading. (Your link pointed to a v8.63 release.) In the end, they suggested that running the example command lines using the sample PDF*_def.ps files would already generate valid PDF/A and PDF/X files.

But, they do not!

Here is one of the sample commands, which by itself is correct:

  gs \
    -dPDFA \
    -dBATCH \
    -dNOPAUSE \
    -dNOOUTERSAVE \
    -dUseCIEColor \
    -sDEVICE=pdfwrite \
    -sOutputFile=out-a.pdf \
     PDFA_def.ps \
     input.ps

The output file will declare itself to be PDF/A (and most PDF viewers would happily go along with this), but the output file fails all real compliance tests.
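
To illustrate what "declares itself to be PDF/A" means: the claim is just a marker in the file's XMP metadata, which you can look for with grep. This is only a sketch under assumptions: declares_pdfa is a hypothetical helper name, it assumes the metadata is stored uncompressed in the file, and it relies on the `-a` option of GNU/BSD grep. Finding the marker says nothing about real compliance.

```shell
#!/bin/sh
# declares_pdfa: hypothetical helper. Succeeds if the file contains an XMP
# "pdfaid:part" marker, i.e. if it merely CLAIMS to be PDF/A.
# This is not a compliance test; use a real validator for that.
declares_pdfa() {
    # -a treats the binary PDF as text, -q suppresses the output
    grep -a -q 'pdfaid:part' "$1"
}
```

For example, `declares_pdfa out-a.pdf && echo "claims PDF/A"` would report the claim for the file generated above, even though that file fails real compliance tests.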

The fix is easy: you need to edit your sample PDFA_def.ps (for PDF/X: your PDFX_def.ps) file to match your environment. These required edits were not clearly spelled out in older documentation versions, and the provided command suggested it would work out of the box.

Especially in the case of PDF/X, you MUST specify a valid ICC profile to use.

See also the updated documentation (current SVN trunk version) about this:

Kurt Pfeifle
  • Can you please care to answer a similar post? Thanks in advance. http://stackoverflow.com/q/28632119/1288722 – doctorate Feb 20 '15 at 15:20
  • @doctorate: Sorry, I don't have enough time right now to go into the details over there. Current Ghostscript should be able to generate PDF/X-***3*** -- though not PDF/X-***1***. If your print shop cannot accept PDF/X-***3*** they use very outdated technology. X-3 is much more modern, and more reliable for what it does than is X-1... – Kurt Pfeifle Feb 20 '15 at 16:43
  • for me it's the last-minute solution to get away with it. They are online printing company, I think PDF/X-3 will solve it, but that only relevant once you have the time to answer. Thanks again. – doctorate Feb 20 '15 at 16:47
  • the following error shows up: ```Error: /invalidfileaccess in --file-- Operand stack: --nostringval-- --nostringval-- (srgb.icc) (r)``` see full output here: https://pastebin.com/wpGyGXkU – LEo Jan 12 '21 at 17:23
  • @LEo: You didn't specify a valid ICC profile, or the one you specified (srgb.icc) is not in the path where it is meant to be. Look what your *PDFA_def.ps* defines as the *srgb.icc*. – Kurt Pfeifle Jan 12 '21 at 18:55
  • @KurtPfeifle I've made a copy of it in the local folder ```$ cp /usr/share/color/icc/ghostscript/srgb.icc .``` and it did not work. – LEo Jan 12 '21 at 18:59
  • @LEo: better change it to the absolute path in *PDFA_def.ps*. – Kurt Pfeifle Jan 12 '21 at 19:05
  • @KurtPfeifle Same result. – LEo Jan 12 '21 at 20:15
  • Sorry, no time to look more intensely. You'll have to read about the debugging flags you can pass on Ghostscript's command line and use them. Also, install the latest version (9.53.3) and test again. Make sure to read the documentation matching this version: https://ghostscript.com/doc/9.53.3/VectorDevices.htm#PDFA – Kurt Pfeifle Jan 13 '21 at 00:35
  • I am having the same issue Ghostscript will not find the ICC despite the absolute path and ICC profile being valid in my PDFA_def.ps. I use macOS and system ICCs are in the same place this includes the standard Adobe ICC /System/Library/ColorSync/Profiles/AdobeRGB1998.icc which I am using. I am using Ghostscript 10.0.0 (2022-09-21). I have also tried the original AdobeRGB1998 downloaded from adobe. – AndrewC Oct 19 '22 at 16:12
  • the error is https://pastebin.com/k2tvbapH – AndrewC Oct 19 '22 at 16:19
-2

If you're using Windows and want to create PDF/A-1b documents explicitly (PDFCreator has an output option for PDF/A-2b but not for PDF/A-1b), you can just enter the parameters Artur described above into the UI settings of PDFCreator, leaving out the ones for the document names. Start PDFCreator, choose the printer menu, then go to settings. Now choose 'Ghostscript' from the settings list on the left side. Under 'additional ghostscript settings', enter the following:

-dPDFA|-dBATCH|-dNOPAUSE|-dUseCIEColor|-sProcessColorModel=DeviceCMYK|-sDEVICE=pdfwrite|-sPDFACompatibilityPolicy=1

Click on 'Save', then print something from MS Word or any other application you want using the PDFCreator - it will be created in PDF/A-1b.
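
If you want to build that settings string from an ordinary gs argument list (for instance in Git Bash or WSL), a tiny helper can join the arguments with the `|` separator used above. The function name to_pdfcreator_args is hypothetical, and this is just a sketch of the string handling, not something PDFCreator provides.

```shell
#!/bin/sh
# to_pdfcreator_args: hypothetical helper that joins its arguments with '|',
# the separator used in the PDFCreator settings string above.
to_pdfcreator_args() {
    # "$*" joins the arguments with the first character of IFS; the
    # subshell keeps the IFS change from leaking into the caller.
    ( IFS='|'; printf '%s\n' "$*" )
}
```

Calling `to_pdfcreator_args -dPDFA -dBATCH -dNOPAUSE` prints the three flags joined by `|`, ready to paste into the settings field.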

Greetings, Fritz

  • How to invoke ghostscript from windows..none of the command mentioned above works. I have tried two options: 1. go to the path of ghostscript ( C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Ghostscript> ) and double click on ghostscript and after that try type the "gs -d..." commands as mentioned above, but it is not working. 2. I have opened the command prompt and tries to invoke ghostscript ( no idea how to invoke it , whether to use gs or type ghostscript etc ) The below is the commands I have tried on ghostscript : C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Ghostscript> – Shubhra Garg Oct 15 '18 at 08:13