
Why is it that calling an executable via subprocess.call gives different results to subprocess.run?

The output of the call method is perfect - all new lines removed, formatting of the document is exactly right, '-' characters, bullets and tables are handled perfectly.

Running exactly the same command with the run method, however, and reading the output from stdout garbles the result completely: it is full of '\n', 'Â\xad', '\x97' and '\x8f' characters, with spacing all over the place.

Here's the code I'm using:

Subprocess.CALL

result=subprocess.call(['/path_to_pdftotext','-layout','/path_to_file.pdf','-'])

Subprocess.RUN

result = subprocess.run(['/path_to_pdftotext', '-layout', '/path_to_file.pdf', '-'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, encoding='utf-8')

I don't understand why the run method doesn't parse and display the file in the same way. I'd use call, but I need to save the result of the pdftotext conversion to a variable (in the case of run: var = result.stdout).

I can go through and just identify all the unicode it's not picking up in run and strip it out but I figure there must just be some encoding / decoding settings that the run method changes.

EDIT

Having read a similarly worded question - I believe this is different in scope as I'm wanting to understand why the output is different.

lawson
  • Possible duplicate of [What's the difference between Python's subprocess.call and subprocess.run](https://stackoverflow.com/questions/40697583/whats-the-difference-between-pythons-subprocess-call-and-subprocess-run) – Valentino Jan 31 '19 at 14:14
  • Why do you set universal_newlines to True when you use run? You leave it at its default value (None) with call, which might explain the difference in output. – Viper Jan 31 '19 at 14:17
  • @Viper - I should have mentioned I've tried adding these options to see if this preserves formatting. I've run it with just the stdout=PIPE but the only real difference is that it clears up a bit more of the unicode characters with universal_newlines=True enabled. So better with but still not giving the same output as with call. – lawson Jan 31 '19 at 14:21

1 Answer

I've made some tests.

Are you printing the content to the console? Try writing the text to a text file with subprocess in both cases and see whether the results differ:

result=subprocess.call(['/path_to_pdftotext','-layout','/path_to_file.pdf','test.txt'])

result=subprocess.run(['/path_to_pdftotext','-layout','/path_to_file.pdf','test2.txt'])

and compare test.txt and test2.txt. In my case they are identical.
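The comparison itself can also be done from Python rather than by eye. The following is only a sketch: `files_identical` is a hypothetical helper name, and the two files are written by the script as stand-ins for the real `test.txt` and `test2.txt` that pdftotext would produce.

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Return True if both files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting os.stat() data
    return filecmp.cmp(path_a, path_b, shallow=False)


# Stand-in files; in the real test these would be the pdftotext outputs.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "test.txt")
    second = os.path.join(tmp, "test2.txt")
    for path in (first, second):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("Name   Qty\nTea      2\n")
    same = files_identical(first, second)

print(same)
```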

I suspect that the difference you are experiencing is not strictly related to subprocess, but to how the console represents the output in each case.

As said in the answer I linked in the comments, call():

It is equivalent to: run(...).returncode (except that the input and check parameters are not supported)

That is, your result stores an integer (the return code), and the output is printed directly to the console, which seems to show it with the correct encoding, newlines, etc.
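A small sketch of that equivalence follows. Since pdftotext may not be installed, the child process here is a stand-in (the current Python interpreter printing two lines); that substitution is an assumption made only for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the pdftotext command: a child that prints two lines.
child = [sys.executable, "-c", "print('line one'); print('line two')"]

# call() lets the child write straight to the console and returns only an int.
return_code = subprocess.call(child)

# run() without stdout=PIPE behaves the same way, but wraps the outcome in an object.
completed = subprocess.run(child)

print(type(return_code).__name__, return_code)
print(type(completed).__name__, completed.returncode)
```

In both cases the text that appears on screen was written by the child process itself; Python never handles it.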

With run(), the result is a CompletedProcess instance. The CompletedProcess.stdout attribute is:

Captured stdout from the child process. A bytes sequence, or a string if run() was called with an encoding or errors. None if stdout was not captured.
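The three cases in that description can be checked with a short sketch. Again, the child command is a stand-in for pdftotext (an assumption), and the variable names are only illustrative.

```python
import subprocess
import sys

# Hypothetical stand-in for the pdftotext command.
child = [sys.executable, "-c", "print('hello')"]

# Not captured: the child prints to the console and stdout is None.
not_captured = subprocess.run(child)

# Captured without an encoding: stdout is a bytes sequence.
as_bytes = subprocess.run(child, stdout=subprocess.PIPE)

# Captured with an encoding: stdout is decoded into a str.
as_text = subprocess.run(child, stdout=subprocess.PIPE, encoding="utf-8")

print(not_captured.stdout)
print(type(as_bytes.stdout).__name__)
print(type(as_text.stdout).__name__)
```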

So, whether it is a bytes sequence or a string, Python represents it differently when it is displayed on the console, showing escape sequences such as '\n', 'Â\xad', '\x97' and '\x8f'.
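One way this can happen, assuming the captured value was echoed in an interactive session (or otherwise shown via `repr()`) rather than passed to `print()`, is sketched below. The sample text is invented for illustration and is not real pdftotext output.

```python
# Invented sample text standing in for captured pdftotext output.
text = "Name   Qty\nTea      2\n"

# An interactive session echoes the repr of a bare expression such as
# result.stdout, so control characters appear as escape sequences.
echoed = repr(text)
print(echoed)

# print() writes the characters themselves, so the newlines are rendered.
print(text, end="")

# Undecoded bytes are always displayed in escaped form; non-ASCII bytes
# show up as \x.. sequences until they are decoded.
raw = "caf\u00e9\n".encode("utf-8")
print(repr(raw))
print(raw.decode("utf-8") == "caf\u00e9\n")
```

If the escapes only appear in the echoed form, the string itself is intact and needs no stripping.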

Valentino
  • That's a great answer, thanks @Valentino. As you suggest, I was outputting to screen with '-' option enabled so hadn't been saving to file. Both run and call when saved to file produce exactly the same document. I appreciate the description as to why this happens in the console being due to the way python represents this as a CompletedProcess. As for my use case, I need to fiddle around with the processed text in a variable so I'll need to strip out the characters in the console but at least I know why this was showing different results. Thanks again! – lawson Jan 31 '19 at 16:16
  • I didn't go so far as to test this, but if you write the `CompletedProcess.stdout` string to a text file with the proper encoding, you should get the correct text back. I see no reason why it should not work. I don't know what your use case is, but if it involves saving the text to a text file again, keeping this in mind may save you some stripping. – Valentino Jan 31 '19 at 16:26
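The suggestion in the last comment could look roughly like this. It is an untested-against-pdftotext sketch: the child command is a stand-in for the real conversion, `converted.txt` is a hypothetical file name, and UTF-8 is assumed to be the right encoding for the output.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the pdftotext command.
child = [sys.executable, "-c", "print('Name   Qty'); print('Tea      2')"]

# Capture the output as a decoded string (encoding is an assumption here).
completed = subprocess.run(child, stdout=subprocess.PIPE, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "converted.txt")  # hypothetical file name

    # Write the captured text with an explicit encoding.
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(completed.stdout)

    # Read it back to confirm nothing was lost on the way.
    with open(out_path, encoding="utf-8") as handle:
        round_trip = handle.read()

print(round_trip == completed.stdout)
```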