I need to convert a PDF to grayscale if it contains colors. For this purpose I found a script which can determine whether the PDF is already in grayscale or not.

convert "source.pdf" -colorspace RGB -unique-colors txt:- 2> /dev/null \
   | egrep -m 2 -v "#([0-9|A-F][0-9|A-F])\1{3}" \
   | wc -l

This counts how many colors in the document have differing R, G and B values (that is, colors that are not gray).

If the PDF is not already a grayscale document, I proceed with the conversion using Ghostscript:

gs \
  -sOutputFile=temp.pdf \
  -sDEVICE=pdfwrite \
  -sColorConversionStrategy=Gray \
  -dProcessColorModel=/DeviceGray \
  -dCompatibilityLevel=1.4 \
  -dNOPAUSE \
  -dBATCH \
   source.pdf < /dev/null

If I open the output document in a PDF viewer, it correctly shows without colors. But if I run the first script on the newly generated document, it reports that the document still contains some colors. How can I convert a document to precise grayscale? I need this because if I print this document on a color printer, the printer will mix colored inks, rather than use black only, to print gray.

Kurt Pfeifle
  • Can you explain what the intention of your `egrep` statement is? – Kurt Pfeifle May 07 '12 at 06:13
  • I have used these Ghostscript switches many times to convert the RGB colorspace to grayscale, and then checked the result with Callas preflight software. Don't trust ImageMagick; these switches work without pain – Dingo May 07 '12 at 12:20

2 Answers

I value ImageMagick in general very much -- but don't trust convert to count the colors correctly with the command you're using...

May I suggest a different method to discover if a PDF page uses color? It is based on a (relatively new) Ghostscript device called inkcov (you need Ghostscript v9.05 or newer). It displays the ink coverage of CMYK for each single page (for RGB colors, it does a silent conversion to CMYK internally).

First, generate an example PDF with the help of Ghostscript:

gs \
  -o test.pdf \
  -sDEVICE=pdfwrite \
  -g5950x2105 \
  -c "/F1 {100 100 moveto /Helvetica findfont 42 scalefont setfont} def" \
  -c "F1                         (100% 'pure' black)   show showpage" \
  -c "F1 .5 .5 .5   setrgbcolor  (50% 'rich' rgbgray)  show showpage" \
  -c "F1 .5 .5 .5 0 setcmykcolor (50% 'rich' cmykgray) show showpage" \
  -c "F1 .5         setgray      (50% 'pure' gray)     show showpage"

While all the pages appear to the human eye to use no color at all, pages 2 and 3 do in fact mix their apparent gray values from color.

Now check each page's ink coverage:

gs  -o - -sDEVICE=inkcov test.pdf 
 [...]
 Page 1
 0.00000  0.00000  0.00000  0.02230 CMYK OK
 Page 2
 0.02360  0.02360  0.02360  0.02360 CMYK OK
 Page 3
 0.02525  0.02525  0.02525  0.00000 CMYK OK
 Page 4
 0.00000  0.00000  0.00000  0.01982 CMYK OK

(A value of 1.00000 maps to 100% ink coverage for the respective color channel. So 0.02230 in the first line of the result means 2.23 % of the page area is covered by black ink.) Hence the result given by Ghostscript's inkcov is exactly the expected one:

  • pages 1 + 4 don't use any of C (cyan), M (magenta), Y (yellow) colors, but only K (black).
  • pages 2 + 3 do use ink of C (cyan), M (magenta), Y (yellow) colors; page 3 uses no K (black) at all.
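
To act on that output in a script, the per-page lines can be filtered for non-zero C, M or Y values. What follows is only a sketch: color_pages is a made-up helper name, and it assumes that inkcov prints exactly one line ending in "CMYK OK" per page, in page order, as in the output shown above.

```shell
# color_pages: hypothetical helper, not part of Ghostscript.
# Reads the output of `gs -o - -sDEVICE=inkcov file.pdf` on stdin and prints
# the number of every page whose C, M or Y coverage is greater than zero.
color_pages() {
  awk '
    # Coverage lines look like: C M Y K "CMYK" "OK"; everything else is skipped.
    $5 == "CMYK" && $6 == "OK" {
      page++                              # assumes one such line per page
      if ($1 + $2 + $3 > 0) print page    # any colored ink on this page?
    }
  '
}
```

It would be called as `gs -q -o - -sDEVICE=inkcov test.pdf | color_pages`; an empty result means that no page reported any C, M or Y coverage.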

Now let's convert all pages of the original PDF to use the DeviceGray colorspace:

gs \
 -o temp.pdf \
 -sDEVICE=pdfwrite \
 -sColorConversionStrategy=Gray \
 -sProcessColorModel=DeviceGray \
  test.pdf

...and check for the ink coverage again:

gs -q  -o - -sDEVICE=inkcov temp.pdf
 0.00000  0.00000  0.00000  0.02230 CMYK OK
 0.00000  0.00000  0.00000  0.02360 CMYK OK
 0.00000  0.00000  0.00000  0.02525 CMYK OK
 0.00000  0.00000  0.00000  0.01982 CMYK OK

Again, exactly the expected result in case of successful color conversions! (BTW, your convert command returns 2 for me for both files, the [original] test.pdf as well as the [gray-converted] temp.pdf -- so this command cannot be right...)
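
If you want to keep your "convert only if needed" workflow, the inkcov check could replace the convert/egrep test. This is a rough sketch, not a tested recipe: all_gray is a name I made up, and it treats a document as gray only if every coverage line reports 0 for C, M and Y.

```shell
# all_gray: hypothetical helper, not part of Ghostscript.
# Reads inkcov output on stdin; exit status 0 if no page uses C, M or Y ink,
# exit status 1 as soon as at least one page does.
all_gray() {
  awk '
    BEGIN { colored = 0 }
    $5 == "CMYK" && ($1 + $2 + $3 > 0) { colored = 1 }
    END { exit colored }
  '
}

# Possible use (file names taken from the question):
#
#   if ! gs -q -o - -sDEVICE=inkcov source.pdf | all_gray; then
#     gs -o temp.pdf -sDEVICE=pdfwrite -sColorConversionStrategy=Gray \
#        -sProcessColorModel=DeviceGray source.pdf
#   fi
```

The exit status is used instead of a printed count so that the check fits directly into an if statement.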

Kurt Pfeifle

Maybe your document contains transparent figures. Try passing the option

-dHaveTransparency=false

to your Ghostscript conversion command. The full list of options for the pdfwrite device can be found at http://ghostscript.com/doc/current/Ps2pdf.htm#Options
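
Put together, the conversion could look roughly like the sketch below. The to_gray wrapper and the GS variable are my own illustrative names; the switches are the ones already used in this thread plus the option above.

```shell
# GS names the Ghostscript binary; setting GS=echo turns the call into a dry run.
GS=${GS:-gs}

# to_gray SRC DST: hypothetical wrapper around the grayscale conversion,
# with -dHaveTransparency=false added to the switches.
to_gray() {
  "$GS" \
    -o "$2" \
    -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray \
    -sProcessColorModel=DeviceGray \
    -dHaveTransparency=false \
    "$1"
}
```

For example, `to_gray source.pdf temp.pdf` writes the converted file to temp.pdf, which can then be checked again with the inkcov device.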

  • @KurtPfeifle I don't fully agree with you. Although I know that this question has not been active for months, and that your answer was fully adequate, I ended up on this question because my own problem was its exact title. After conversion, a few pages still contained color. Thanks to your answer, I was able to determine which pages, and narrow the problem, to the fact that some figures contained transparency, which resulted in color pages. Passing the option I provided solved this problem. Therefore I posted this answer where I would have been pleased to find it. – Vincent Nivoliers Oct 29 '12 at 21:21
  • @VincentNivoliers: Ok, you may be right. I'll try + test this with an appropriate input sample as soon as I've the time. Thanks for this additional hint. – Kurt Pfeifle Oct 29 '12 at 21:53
  • For a minimal example, a solution is to create a minimal PDF using LaTeX that includes a PNG with a transparency layer (even if no pixel is transparent). Converting it to grayscale without the above option results in an ink coverage (measured using your tools) showing color. – Vincent Nivoliers Oct 30 '12 at 18:00