I have a Python 3 script that I am writing to do three things:

1) Determine which Retrosheets data files are supposed to be downloaded
2) Create wget commands to retrieve the files and download them
3) Unzip the files after they have been downloaded.

When testing each function in the Python Console, I have no problems. But, when I try to do everything automatically, I get the following output:

    Start Decade: 1930
    End Decade: 1950
    Creating wget commands...
    Commands created...
    Downloaded 3 files.
    Unzipping files...
    Traceback (most recent call last):
      File "import_pbp.py", line 54, in <module>
        unzip_data(decade_files)
      File "import_pbp.py", line 39, in unzip_data
        with zipfile.ZipFile('zip' + file, 'r') as zip_ref:
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1009, in __init__
        self.fp = io.open(file, filemode)
    FileNotFoundError: [Errno 2] No such file or directory: 'zip1930seve.zip'

The files only finish downloading after this output is printed to the console. This would seem to indicate that the unzip function is running before the files are downloaded. How do I make sure that my files are downloaded before the unzip function is called? Code below:

Download function:

    # define function to execute the download commands
    def download_data(commands):
        for command in commands:
            os.popen(command)
        print('Downloaded ' + str(len(commands)) + ' files.')

Unzip Function:

    # Unzip the data files into the 'unparsed' folder.
    def unzip_data(file_list):
        print('Unzipping files...')
        for file in file_list:
            with zipfile.ZipFile('zip' + file, 'r') as zip_ref:
                zip_ref.extractall('unparsed/')
            print(file + ' unzipped')
        print('All files unzipped...')

EDIT: I looked at the answers in the thread linked as a possible duplicate, but they didn't quite explain what I needed the way tdelaney did below. The two questions are similar but, for my purposes, different, especially since that question is 6 years old and I'm guessing there may have been significant changes to the language since then.

EDIT 2: Removed non-essential code to shorten the post.

JJAJ
  • Possible duplicate of [Python popen command. Wait until the command is finished](http://stackoverflow.com/questions/2837214/python-popen-command-wait-until-the-command-is-finished) – glibdud Jan 10 '17 at 18:01
  • The `os.popen()` command does not wait until it is finished to return. – glibdud Jan 10 '17 at 18:02

1 Answer

`os.popen` doesn't wait for the process to complete, so you launch all of the commands at once and then attempt the unzips before the downloads are done. Since you never read the stdout pipe that `os.popen` returns, you also risk the launched program hanging if that pipe fills.
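The non-blocking launch can be observed directly. The following is only an illustrative sketch: it uses `subprocess.Popen` (which the Python documentation says `os.popen` is implemented with), and the one-second sleeping child is a hypothetical stand-in for a slow `wget` call, not something from the original script.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a slow wget call: a child Python process
# that sleeps for one second before exiting.
slow_command = [sys.executable, '-c', 'import time; time.sleep(1)']

start = time.monotonic()
proc = subprocess.Popen(slow_command)
launch_seconds = time.monotonic() - start  # time until Popen() returned

# poll() returns None while the child is still running.
still_running = proc.poll() is None

proc.wait()  # block until the child has exited
total_seconds = time.monotonic() - start

print('Popen returned after %.2f s, child finished after %.2f s'
      % (launch_seconds, total_seconds))
```

`Popen` hands control back as soon as the child has been started, so any code that needs the child's results has to wait for it explicitly.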

The `subprocess` module has several functions for calling programs. Assuming you really do want all of the commands to run in parallel and just want to discard any output from them, you could reimplement that function as:

    import subprocess as subp

    # define function to execute the download commands
    def download_data(commands):
        procs = []
        # launch every command right away so the downloads run in parallel
        for command in commands:
            # subp.DEVNULL discards stdout without leaving a file handle open
            procs.append(subp.Popen(command, shell=True,
                                    stdout=subp.DEVNULL))
        # block until every command has exited before returning
        for proc in procs:
            proc.wait()
        print('Downloaded ' + str(len(commands)) + ' files.')
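If parallel downloads are not actually needed, a simpler variant is to run the commands one at a time. This is a minimal sketch rather than part of the original answer, assuming Python 3.5+ (for `subprocess.run`); the name `download_data_sequentially` is hypothetical.

```python
import subprocess


def download_data_sequentially(commands):
    """Run each shell command in turn, waiting for it to exit."""
    for command in commands:
        # run() blocks until the command exits; check=True raises
        # CalledProcessError if the command returns a non-zero status.
        subprocess.run(command, shell=True, check=True,
                       stdout=subprocess.DEVNULL)
    print('Downloaded ' + str(len(commands)) + ' files.')
```

This trades speed for simplicity: each download starts only after the previous one has finished, and a failed command stops the loop with an exception instead of being silently ignored.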
tdelaney
  • Thanks! This works precisely for my needs. Why do you need the second for loop? Does that just check to make sure that each subprocess is complete after the command is executed in the for loop above it? – JJAJ Jan 10 '17 at 19:16
  • Processes remain in the process table until their exit status is read... zombie processes. You need the call to wait for the processes to complete or you'd be back unzipping files that don't exist, but also to avoid your own personal zombie apocalypse on your system. – tdelaney Jan 10 '17 at 19:38
  • You could rework the code to start the unzipping earlier, perhaps with one thread per subprocess and a queue. Each worker would run one command, wait for completion and put the .zip file on a queue for a listener to consume. – tdelaney Jan 10 '17 at 19:40
  • Thanks for the extra explanation...this helps a ton! Also gives me some ideas for improving the code later. Cheers! – JJAJ Jan 11 '17 at 14:07
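
The thread-and-queue idea from the comments could be sketched roughly as follows. This is one possible reading of that suggestion, not code from the thread: `run_and_report`, `download_and_process`, the `(command, zip_name)` job tuples and the `handle_file` callback are all hypothetical names introduced for illustration.

```python
import queue
import subprocess
import threading


def run_and_report(command, zip_name, done_queue):
    """Worker: run one command, wait for it, then announce its file."""
    # Note: this sketch does not check the command's exit status.
    subprocess.run(command, shell=True, stdout=subprocess.DEVNULL)
    done_queue.put(zip_name)


def download_and_process(jobs, handle_file):
    """Run every (command, zip_name) job in its own thread.

    handle_file is called once per zip_name, as soon as the command
    that produces it has exited.
    """
    done_queue = queue.Queue()
    workers = [threading.Thread(target=run_and_report,
                                args=(command, zip_name, done_queue))
               for command, zip_name in jobs]
    for worker in workers:
        worker.start()
    # Listener: each get() blocks until some worker finishes, so files
    # are handled in completion order, not submission order.
    for _ in jobs:
        handle_file(done_queue.get())
    for worker in workers:
        worker.join()
```

Here `handle_file` would be whatever unzips a single archive, for example a function that opens the file with `zipfile.ZipFile` and calls `extractall`, so that extraction of early downloads overlaps with the downloads still in progress.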