
I am trying to run pdftotext using Python's subprocess module.

import subprocess

pdf = r"path\to\file.pdf"
txt = r"path\to\out.txt"
pdftotext = r"path\to\pdftotext.exe"

cmd = [pdftotext, pdf, txt, '-enc UTF-8']
response = subprocess.check_output(cmd, 
                shell=True,
                stderr=subprocess.STDOUT)

Traceback:

CalledProcessError: Command '['path\\to\\pdftotext.exe',
'path\\to\\file.pdf', 'path\\to\\out.txt', '-enc UTF-8']'
returned non-zero exit status 99

When I remove the last argument '-enc UTF-8' from cmd, it works fine in Python.

When I run pdftotext pdf txt -enc UTF-8 in the Windows command prompt (cmd.exe), it works fine.

What am I missing?

Thanks.

Rahul

1 Answer


subprocess handles a command differently depending on whether it is passed as a list or as a string, and on whether shell=True is set. From the docs:

The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.

More details are explained in this answer.
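To see why the list form from the question probably fails, it can help to look at the command line Python builds from it on Windows. The sketch below uses `subprocess.list2cmdline`, an internal helper that is not part of the documented public API and is called here only for illustration; the paths are the placeholders from the question.

```python
import subprocess

# Placeholder paths copied from the question; they do not need to exist.
cmd = [r"path\to\pdftotext.exe", r"path\to\file.pdf", r"path\to\out.txt", "-enc UTF-8"]

# On Windows, subprocess joins a list into one command-line string with this helper.
# Any item containing a space is wrapped in double quotes.
as_string = subprocess.list2cmdline(cmd)
print(as_string)
```

Because '-enc UTF-8' contains a space, it ends up quoted as a single argument, so pdftotext most likely receives one unknown option instead of the flag -enc followed by the value UTF-8.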

So, as the docs explain, you should convert your command to a string:

cmd = r"""{} "{}" "{}" -enc UTF-8""".format('pdftotext', pdf, txt) 

Now, call subprocess as:

subprocess.call(cmd, shell=True, stderr=subprocess.STDOUT)
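An alternative that avoids the shell altogether is to keep the list but pass the flag and its value as separate items. This is a minimal sketch, assuming pdftotext accepts options after the file names as it does in the question's working command line. The helper names `build_pdftotext_cmd` and `run_pdftotext` are hypothetical and exist only for this example.

```python
import subprocess


def build_pdftotext_cmd(exe, pdf, txt, encoding="UTF-8"):
    """Return the argument list with '-enc' and its value as separate items."""
    return [exe, pdf, txt, "-enc", encoding]


def run_pdftotext(exe, pdf, txt, encoding="UTF-8"):
    """Run pdftotext without a shell and return its combined stdout/stderr as bytes."""
    cmd = build_pdftotext_cmd(exe, pdf, txt, encoding)
    # No shell=True: each list item is handed to the program as one argument.
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)


# Placeholder paths from the question; only the list is built here, nothing is executed.
print(build_pdftotext_cmd(r"path\to\pdftotext.exe", r"path\to\file.pdf", r"path\to\out.txt"))
```

With a list and no shell, paths that contain spaces need no manual quoting, which may suit dynamically generated file names.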
cs95
  • I tried `cmd = r"""{} "{}" "{}" -enc UTF-8""".format(pdftotext, pdf, txt)` because there were spaces in my path and the files are generated dynamically. – Rahul Jul 28 '17 at 08:59