Python Uudecode Call Corruption

Question

I am working on extracting PDFs from SEC filings. They usually come like this:

For whatever reason, when I save the raw PDF to a .txt file and then try to run

uudecode -o output_file.pdf input_file.txt

from Python's subprocess.call() function, or any other Python function that executes command-line commands, the generated PDF files are corrupted. If I run the same command directly from the command line, there is no corruption.

When taking a closer look at the PDF file being output from the python script, it looks like the file ends prematurely. Is there some sort of output limit when executing a command line command from python?

Thanks!

what happens if you run: `python -muu -d input_file.txt output_file.pdf` from the command line? — jfs, Jul 07 '15 at 09:29
@J.F.Sebastian When I ran that from the command line, it worked. But as soon as I put it into my code like this: `subprocess.call([ "python", "-muu", "-d", input_file.txt, output_file.pdf])` I have the same issue. — Alexa Gottacatchemall Halcomb, Jul 07 '15 at 14:33
Don't run it as a subprocess inside your Python script; you could import it instead: [`import uu; uu.decode('input_file.txt', 'output_file.pdf')`](https://docs.python.org/3/library/uu.html#uu.decode) (Note: the quotes around the filenames are not optional; they create a string object in Python). Or (better) pass open binary file objects, e.g., created using `input_file = open('input_file.txt', 'rb'); output_file = open('output_file.pdf', 'wb')` — jfs, Jul 07 '15 at 18:59
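
A minimal sketch of the in-process decoding suggested in the comment above, with no subprocess involved. It swaps in `binascii.a2b_uu` for `uu.decode`, because the `uu` module was deprecated in Python 3.11 and removed in 3.13. The helper names `uudecode_lines` and `decode_file` are hypothetical, and the sketch assumes a single well-formed `begin` ... `end` block; it does not handle the malformed lines that some encoders emit.

```python
import binascii


def uudecode_lines(lines):
    """Decode one uuencoded block from an iterable of bytes lines.

    Lines before the 'begin' header are skipped; decoding stops at 'end'.
    """
    out = bytearray()
    in_body = False
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        if not in_body:
            # The header looks like: begin 644 filename
            if line.startswith(b"begin "):
                in_body = True
            continue
        if line == b"end":
            break
        if not line:
            continue
        out += binascii.a2b_uu(line)
    return bytes(out)


def decode_file(in_path, out_path):
    """Read in_path and write the decoded payload to out_path."""
    # Binary mode on both sides; the with-block closes (and flushes) both files.
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        dst.write(uudecode_lines(src))
```

Under these assumptions, `decode_file('input_file.txt', 'output_file.pdf')` would stand in for the `uudecode -o output_file.pdf input_file.txt` call.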

score 1 · Accepted Answer · answered Jul 06 '15 at 19:48

This script worked fine for me running under Python 3.4.1 on Fedora 21 x86_64 with uudecode 4.15.2:

import subprocess
subprocess.call("uudecode -o output_file.pdf input_file.txt", shell=True)

Using the linked SEC filing (length: 173,141 B; sha1: e4f7fa2cbb3422411c2f2968d954d6bb9808b884), the decoded PDF (length: 124,557 B; sha1: 1676320e1d9923e14d19451c16688198bc93ca0d) appears correct when viewed.

There may be something else in your environment causing the problem. You may want to add additional details to your question.

> Is there some sort of output limit when executing a command line command from python?

If by "output limit" you mean the size of the file being written by uudecode, then no. The only type of "output limit" you need to worry about when using the subprocess module is when you pass stdout=PIPE or stderr=PIPE when creating a child process. If the child process writes enough data to either of these streams, and your script does not regularly drain them, the child process will block (see the subprocess module documentation). In my test, uudecode wrote nothing to stdout or stderr.
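
A minimal sketch of the pipe case described above. It assumes a stand-in child process (a Python one-liner, not uudecode) that writes about 1 MB to stdout. `communicate()` reads both streams until the child exits, so the child does not block on a full pipe. The names `run_and_capture` and `CHILD` are hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd with piped stdout/stderr and drain both streams."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() keeps reading both pipes until EOF, then waits for exit.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Stand-in child (an assumption for illustration): writes 1,000,000 bytes.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
```

By contrast, calling `proc.wait()` without reading the pipes could hang once the child has written more than the operating system's pipe buffer holds.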

Thank you for your response. I am using Python 2.7.3, Ubuntu precise (12.04.5 LTS), uudecode 4.11. I may try updating to uudecode 4.15 to see if that helps my issue. edit: **I don't think python 2.7 can update to uudecode 4.1** — Alexa Gottacatchemall Halcomb, Jul 06 '15 at 21:51
Can you check the return value of `subprocess.call()` in your script? — Rusty Shackleford, Jul 07 '15 at 15:02
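
One way to check the return value, as a sketch: `subprocess.call()` returns the child's exit status, and a nonzero value usually signals failure. The helper name `call_and_report` is hypothetical.

```python
import subprocess
import sys


def call_and_report(cmd):
    """Run cmd (a list of arguments) and report a nonzero exit status."""
    status = subprocess.call(cmd)
    if status != 0:
        sys.stderr.write("command %r exited with status %d\n" % (cmd, status))
    return status
```

For the command in the question, this might be called as `call_and_report(["uudecode", "-o", "output_file.pdf", "input_file.txt"])`.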
