
I'm compressing a PDF file with Ghostscript, and I need to handle the error it throws when the file is password-protected.

Shell script

GS_RES=`gs -sDEVICE=pdfwrite -sOutputFile=$gsoutputfile -dNOPAUSE -dBATCH $2 2>&1`

if [ "$GS_RES" != "" ]
then
    gspassmsg="This file requires a password for access"
    echo "Error message is :::::: "$GS_RES
    gspassworddoc=`awk -v a="$GS_RES" -v b="$gspassmsg" 'BEGIN{print index(a,b)}'`
    if [ $gspassworddoc -ne 0 ]
    then
        exit 3 #error code - password protected pdf
    fi
fi

After the command runs, GS_RES contains output like the following:

Error message 1:

GPL Ghostscript 9.19 (2016-03-23) Copyright (C) 2016 Artifex Software, Inc. All
rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for
details. Error: /syntaxerror in -file- Operand stack: Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2
%stopped_push --nostringval-- --nostringval-- --nostringval-- false 1
%stopped_push 1967 1 3 %oparray_pop 1966 1 3 %oparray_pop 1950 1 3 %oparray_pop
1836 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval--
--nostringval-- --nostringval-- 2 %stopped_push Dictionary stack:
--dict:1196/1684(ro)(G)-- --dict:0/20(G)-- --dict:78/200(L)-- Current
allocation mode is local Current file position is 1

Error message 2:

GPL Ghostscript 9.19 (2016-03-23) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: Cannot find a 'startxref' anywhere in the file. Output may be incorrect. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: An error occurred while reading an XREF table. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html The file has been damaged. This may have been caused gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html by a problem while converting or transfering the file. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Ghostscript will attempt to recover the data. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html However, the output may be incorrect. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Error: Trailer dictionary not found. Output may be incorrect. No pages will be processed (FirstPage > LastPage). gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html This file had errors that were repaired or ignored. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html Please notify the author of the software that produced this gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html file that it does not conform to Adobe's published PDF gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html specification. gs.pdf gsempty.pdf new_sathishks_protected.html sathishks_protected.html The rendered output from this file may be incorrect.

Running awk on Error message 2:

gspassmsg="This file requires a password for access"
gspassworddoc=`awk -v a="$GS_RES" -v b="$gspassmsg" 'BEGIN{print index(a,b)}'`

fails with the following error:

Error : awk: newline in string GPL Ghostscript 9.19... at source line 1

Error message 3:

   **** Error: Cannot find a 'startxref' anywhere in the file.
   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
   **** Error:  Trailer is not found.

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

The snippet from the answer below doesn't catch this error:

if ! gs_res=$(gs -sDEVICE=pdfwrite -sOutputFile="$gsoutputfile" -dNOPAUSE -dBATCH "$2" 2>&1 1>/dev/null); then
  echo "Error message is :::::: $gs_res" >&2
  gspassmsg='This file requires a password for access'
  [[ $gs_res == *"$gspassmsg"* ]] && exit 3 # password protected pdf
  echo "Some other error !"
fi

Please clarify the following:

  1. Why does awk behave strangely here? What am I missing?
  2. How can I search (grep) for a pattern in a string that contains special characters?
  3. Does Ghostscript have a set of predefined error messages like these? If possible, please point me to documentation I can refer to.
  4. Is it possible to compress a password-protected PDF with Ghostscript?
  5. How can I make sure the gs compression succeeded in the case above? I can't know in advance every error Ghostscript might report, so I can't simply compare my command's output against a list of known errors.

I'm quite new to shell scripting. Any help would be appreciated.

PS: I've edited my question with additional details. Please take a look, and let me know if anything else should be added.

Tom Taylor
  • @mklement0 By `grep` I meant searching the command output for a string (e.g., "This file requires a password for access"). – Tom Taylor Nov 21 '16 at 04:58
  • I'm not sure, but I guess `awk` behaves weirdly when my command output contains special characters. – Tom Taylor Nov 21 '16 at 04:59
  • The special characters are things like (`, ") in the command output, which I've added in Error message 2. Please take a look. – Tom Taylor Nov 21 '16 at 05:00
  • Neither of the error messages you quoted here is due to the file being password protected; in both cases the file is broken. So badly broken in fact that Ghostscript cannot process them. You cannot 'ensure .... success' for these files, they are defunct, broken, invalid, totally screwed etc. NB it 'looks like' from the messages you are trying to process an HTML file, that's **really** not going to work. Because you are redirecting stderr to stdout you're also getting the two interleaved, which is hard to read. Why use 'awk' and not 'grep'? Awk seems like overkill to me. – KenS Nov 21 '16 at 10:09
  • @KenS I am trying to compress my file. After I receive the command output from GS, I should be able to handle i) failure due to password protection, ii) failure due to some other kind of error like the above, and iii) the compression success case. Please help me handle this. – Tom Taylor Nov 21 '16 at 11:50
  • I have no idea how you expect to 'handle' the case where your PDF file is so broken that Ghostscript can't even read it! However, you should check the return code from Ghostscript; if it's not 0, then you can check the error log. I'd use grep myself because it's simpler. If you find "This file requires a password" then you can assume that it's password protected. If you didn't find that, then it's broken. Lastly, Ghostscript does not compress PDF files, as I stated in my answer. – KenS Nov 21 '16 at 13:21
  • @KenS: Leaving the compression aspect aside, I think the error messages are just examples, and the OP is simply looking for a robust error handling mechanism that allows distinguishing between failure due to password protection and other failures. The code breaks, because BSD `awk` (which I infer is being used) cannot handle multi-line strings with `index()`, but that's easily bypassed by using Bash's native string-matching features. – mklement0 Nov 21 '16 at 15:09
  • You are right @mklement0. That sounds good!! Super :+1: – Tom Taylor Nov 21 '16 at 16:13
  • @subramanianrasapan: Glad to hear it. I've since realized that it's not just `index()`: BSD Awk fails fundamentally when you try to pass a multiline string as a variable value, unless you `\`-escape the newlines. Details and an alternative solution are in my answer. – mklement0 Nov 21 '16 at 16:17

2 Answers


Ghostscript's error messages all follow the same pattern, however there are some gotchas:

Part of the output is a dump of the operand stack at the time of the error. Since PostScript is a programming language, the contents of the stack depend on the program and are entirely unpredictable. Even though you are dealing with PDF files, not PostScript programs, the interpreter is itself written in PostScript, so the same applies.

The error name in 'Error: /syntaxerror...' is one of a small number of possible errors, all of which are defined in the PostScript Language Reference Manual.

PostScript (but not PDF) programs can install an error handler, which can totally alter the error output, and even swallow the error altogether.

As regards 'compressing PDF files', that is absolutely not what you are doing. Please have a read here which explains what's actually happening. In short though, you are producing a new PDF file, not compressing an old one.

You can, of course, process a password-protected PDF file with Ghostscript, as long as you know the password. Look for PDFPassword in the documentation here.
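As a minimal sketch, assuming you already know the password, the switch is passed like any other `-s` option. The function name `rewrite_pdf` and its argument order are made up for illustration; only `-sPDFPassword` comes from the Ghostscript documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ghostscript): rewrite_pdf IN OUT [PASSWORD]
# Rewrites IN to OUT with the pdfwrite device. When a password is given, it is
# passed via -sPDFPassword so Ghostscript can open an encrypted input file.
rewrite_pdf() {
  local in=$1 out=$2 password=${3-}
  local -a args=(-sDEVICE=pdfwrite -sOutputFile="$out" -dNOPAUSE -dBATCH)
  if [[ -n $password ]]; then
    args+=(-sPDFPassword="$password")
  fi
  gs "${args[@]}" "$in"
}
```

Usage: `rewrite_pdf protected.pdf out.pdf 'secret'`. Keep in mind that command-line arguments are generally visible to other users on the same machine (e.g. via `ps`).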

Now the error message you quote above is not due to the file being encrypted (password protected), there's something else wrong with it. In fact given the simple command line you are using, I'd say there's something quite seriously wrong with it. Of course without seeing the file I can't tell for certain.

Now if a file is encrypted, the output from Ghostscript should read something like:

GPL Ghostscript GIT PRERELEASE 9.21 (2016-09-14) Copyright (C) 2016 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details.

**** This file requires a password for access.

Error: /invalidfileaccess in pdf_process_Encrypt

Operand stack:

Execution stack:
   %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop 1966 1 3 %oparray_pop --nostringval-- --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push

Dictionary stack:
   --dict:1199/1684(ro)(G)-- --dict:1/20(G)-- --dict:83/200(L)-- --dict:83/200(L)-- --dict:135/256(ro)(G)-- --dict:291/300(ro)(G)-- --dict:26/32(L)--

Current allocation mode is local

GPL Ghostscript GIT PRERELEASE 9.21: Unrecoverable error, exit code 1

So simply grepping for "This file requires a password" should be enough to identify encrypted files.
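Combined with a check of Ghostscript's exit status (check the return code first, and only search the log if it is nonzero), one possible bash sketch looks like this. The function name `convert_pdf` and the return code 4 for "some other error" are hypothetical; 3 mirrors the question's script. Both stdout and stderr are captured, since the sketch does not assume which stream carries the password message. `grep -F` treats the pattern as a fixed string, so characters that would be regex metacharacters are matched literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert_pdf IN OUT
# Returns 0 if gs exits with status 0, 3 if it fails and its output mentions
# the password message, and 4 (an arbitrary code) for any other failure.
convert_pdf() {
  local in=$1 out=$2 log ec
  # Capture stdout and stderr together, so the message is found on either stream.
  log=$(gs -sDEVICE=pdfwrite -sOutputFile="$out" -dNOPAUSE -dBATCH "$in" 2>&1)
  ec=$?
  if [[ $ec -eq 0 ]]; then
    return 0
  fi
  printf 'gs failed (exit code %d):\n%s\n' "$ec" "$log" >&2
  # -F: match a fixed string, not a regex; -q: print nothing, use exit status only.
  if grep -qF 'This file requires a password' <<<"$log"; then
    return 3 # password-protected PDF
  fi
  return 4 # some other error (e.g. a damaged file)
}
```

Note that this sketch only inspects the output when the exit status is nonzero; warnings printed alongside a zero exit status are treated as success.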

Now, as noted by mklement0, if you'd like to explain what it is about your actual script which is causing a problem, perhaps we can help with that too. You haven't shown the output of your script, or explained what is not working as you expect.

mklement0
KenS

KenS's helpful answer addresses your questions about Ghostscript itself.
Here's a streamlined version of your code that should work:

# Run `gs` and capture its stderr output.
gs_res=$(gs -sDEVICE=pdfwrite -sOutputFile="$gsoutputfile" -dNOPAUSE -dBATCH "$2" 2>&1 1>/dev/null)
ec=$? # Save gs's exit code.

# Assume that something went wrong, IF:
#   - gs reported a nonzero exit code
#   - but *also* if any stderr output was produced, as
#     not all problems may be reflected in a nonzero exit code.
if [[ $ec -ne 0 || -n $gs_res ]]; then
  echo "Error message is :::::: $gs_res" >&2
  gspassmsg='This file requires a password for access'
  [[ $gs_res == *"$gspassmsg"* ]] && exit 3 # password protected pdf
fi
  • I've double-quoted the variable and parameter references in your gs command.

  • I've changed your redirection from just 2>&1 to 2>&1 1>/dev/null so as to only capture stderr output.

    • 2>&1 redirects stderr (2) to the (still-original) stdout (1), so that error messages are sent to stdout and can be captured as part of the command substitution ($(...)); 1>/dev/null then redirects stdout to the null device, effectively silencing all stdout output. Note that the earlier redirection of stderr to the original stdout is not affected by this, so in effect what the overall command sends to stdout is the original stderr output only.
      If you want to know more, see this answer of mine.
  • I'm using the more modern and flexible $(...) command-substitution syntax instead of the legacy `...` form (for background information, see here).

  • I've renamed GS_RES to gs_res, because it is better not to use all-uppercase shell-variable names in order to avoid conflicts with environment variables and special shell variables.

  • I'm using simple pattern matching to find the desired substring in gs's stderr output. Given that you already have the input to test against in a variable, Bash's own string-matching features will do (which are actually quite varied), and there is no need to use an external utility such as awk.
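To see why the order of `2>&1 1>/dev/null` matters, here is a small demo. `noisy` is a hypothetical stand-in for gs that writes one line to each stream.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for gs: one line to stdout, one line to stderr.
noisy() {
  echo 'to stdout'
  echo 'to stderr' >&2
}

# 2>&1 first points stderr at the capture pipe (the current stdout),
# then 1>/dev/null points stdout at /dev/null: only stderr is captured.
only_err=$(noisy 2>&1 1>/dev/null)

# Reversed order: stdout goes to /dev/null first, then stderr is duplicated
# onto that same /dev/null, so nothing is captured.
nothing=$(noisy 1>/dev/null 2>&1)

printf '[%s] [%s]\n' "$only_err" "$nothing"
```

Redirections are processed left to right, and `2>&1` copies wherever stdout points at that moment, not wherever it ends up pointing.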


As for why your awk command failed:

It sounds like you're using BSD awk, such as the one that comes with macOS as of 10.12 (your question is tagged linux, however):

BSD awk doesn't support newlines in variable values passed via -v unless you \-escape the newlines.
With unescaped multi-line strings, your awk call fails fundamentally, before index() is ever called.

By contrast, GNU Awk and Mawk do support multi-line strings as-is passed via -v.

Read on for optional background information.


To determine which awk implementation you're using, run awk --version and examine the output:

  • awk version 20070501 -> BSD Awk

  • GNU Awk 4.1.3, API: 1.1 ... -> GNU Awk

  • mawk: not an option: --version -> Mawk

Here's a simple test to try with your Awk version:

awk -v a=$'1\n2' -v b=2 'BEGIN { print index(a, b) }'

GNU Awk and Mawk output 3, as expected, whereas BSD Awk fails with awk: newline in string 1.

Also note that \-escaping newlines works ONLY in BSD Awk (e.g.,
awk -v var=$'1\\\n2' 'BEGIN { print var }'), which unfortunately means that there is no portable way to pass multi-line variable values to Awk via -v.
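If you'd rather keep using awk, one way around -v's escape processing is to pass the values through the environment instead: POSIX awk exposes environment variables in the ENVIRON array, and those values are taken as-is, so embedded newlines are not a problem. A minimal sketch, with a made-up multi-line string standing in for the captured gs output:

```shell
#!/usr/bin/env bash
# Made-up stand-in for captured gs output: a multi-line string.
gs_res=$'GPL Ghostscript 9.19\n   **** This file requires a password for access.\nError: /invalidfileaccess'
gspassmsg='This file requires a password for access'

# Prefixing the assignments puts a and b into awk's environment only;
# ENVIRON values are not subject to -v's escape processing.
pos=$(a="$gs_res" b="$gspassmsg" awk 'BEGIN { print index(ENVIRON["a"], ENVIRON["b"]) }')

if [ "$pos" -ne 0 ]; then
  echo "password protected (match at character $pos)"
fi
```

That said, the Bash pattern match in the code above avoids the external call altogether.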

mklement0
  • Thanks for your time @mklement0. If it's not too much to ask, could you explain a bit what `2>&1 1>/dev/null` does? I have no idea about it. – Tom Taylor Nov 21 '16 at 16:44
  • Also, I use this `awk` on both Linux and CentOS. – Tom Taylor Nov 21 '16 at 16:50
  • @subramanianrasapan: Re `2>&1 1>/dev/null`: please see my update. – mklement0 Nov 21 '16 at 16:51
  • @subramanianrasapan: Which of the `awk` implementations I mentioned? What do you get when you run `awk --version`? Also remember that using `awk` to solve your problem is not necessary, as shown in my revised version of your code. – mklement0 Nov 21 '16 at 16:57
  • `awk --version` reports GNU Awk 4.0.2. – Tom Taylor Nov 21 '16 at 17:47
  • @subramanianrasapan: OK. Note that your Awk version shouldn't exhibit the multi-line problem. Does `awk -v a=$'1\n2' 'BEGIN { print a }'` work? If you're still having problems, please update the 2nd error message in your question by posting it verbatim (exactly as it is output, including newlines) using a _single, indented code block_ rather than backticks. – mklement0 Nov 21 '16 at 19:15
  • I tried the changes above, but the code still fails to capture the error message that I've added as `Error message 3` in my question. Please take a look. – Tom Taylor Nov 22 '16 at 06:26
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/128710/discussion-between-subramanian-rasapan-and-mklement0). – Tom Taylor Nov 22 '16 at 12:00