0

I just found this great wget wrapper and I'd like to rewrite it as a Python script using the subprocess module. However, it turns out to be quite tricky, and I keep getting all sorts of errors.

download()
{
    local url=$1
    echo -n "    "
    wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
    sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'

    echo -ne "\b\b\b\b"
    echo " DONE"
}

Then it can be called like this:

file="patch-2.6.37.gz"
echo -n "Downloading $file:"
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"

Any ideas?

Source: http://fitnr.com/showing-file-download-progress-using-wget.html

stratis
  • You'll need to show us what you have tried in Python so that we'll be able to help you. – UltraInstinct Dec 05 '13 at 07:06
  • Basically nothing yet..! I am currently lost in the subprocess documentation..! The ideal thing to do here would be an insightful explanation of a proposed solution so that I can properly grasp the concept of the subprocess module and expand on it. – stratis Dec 05 '13 at 07:12
  • All right, so far I did this: `wgetExecutable = '/usr/bin/wget' grepExecutable = '/usr/grep' wgetParameters = ['--progress=dot', "link_to_file"] grepParameters = ['--line-buffered', "%"] wgetPopen = subprocess.Popen([wgetExecutable] + wgetParameters, stdout=subprocess.PIPE)` – stratis Dec 05 '13 at 10:22
  • `grepPopen = subprocess.Popen([grepExecutable] + grepParameters, stdin=wgetPopen.stdout)` however I get an error in `stdin=wgetPopen.stdout` OSError: [Errno 2] No such file or directory – stratis Dec 05 '13 at 10:29
  • Note that there is also an `sh` module (with that name) that can take care of the bridge between bash and python! – PascalVKooten Dec 09 '13 at 07:50
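
For readers following the attempt in the comments above, here is a minimal sketch of chaining two `Popen` objects the way a shell pipe does. The function name `run_pipeline` is hypothetical, not something from the question. It also assumes Python 3.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, filter_cmd):
    """Run `producer_cmd | filter_cmd` and return the filter's stdout as bytes."""
    # stderr is merged into stdout, like `2>&1` in the bash function.
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    consumer = subprocess.Popen(filter_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the filter exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output
```

A call such as `run_pipeline(["wget", "--progress=dot", url], ["grep", "--line-buffered", "%"])` would mirror the first two stages of the bash pipeline. Passing bare program names lets `Popen` search `PATH`. An `OSError: [Errno 2]` raised by `Popen` usually means the executable itself was not found, so the hard-coded `'/usr/grep'` is a likely culprit (grep normally lives in `/bin` or `/usr/bin`), though that is an assumption about the asker's system.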

5 Answers

5

I think you're not far off. Mainly I'm wondering, why bother with running pipes into grep and sed and awk when you can do all that internally in Python?

#! /usr/bin/env python

import re
import subprocess

TARGET_FILE = "linux-2.6.0.tar.xz"
TARGET_LINK = "http://www.kernel.org/pub/linux/kernel/v2.6/%s" % TARGET_FILE

wgetExecutable = '/usr/bin/wget'
wgetParameters = ['--progress=dot', TARGET_LINK]

wgetPopen = subprocess.Popen([wgetExecutable] + wgetParameters,
                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

for line in iter(wgetPopen.stdout.readline, b''):
    match = re.search(r'\d+%', line)
    if match:
        print '\b\b\b\b' + match.group(0),

wgetPopen.stdout.close()
wgetPopen.wait()
dfarrell07
Tim Pierce
  • It does. Try on a smaller file. Or wait a little longer. :-) – Tim Pierce Dec 09 '13 at 06:46
  • Your code seems to update on some sort of intervals and in this file for example the first progress indication is only after 25%. However I need the progress to be instantaneous from the start just like the bash script..! – stratis Dec 09 '13 at 06:54
  • On my machine the behavior of this script is identical to the behavior of the bash script you posted. They both produce line-buffered output at the same rate. I'd be happy to adjust the script to do something different but I'm not able to reproduce the behavior you're talking about. I suspect that you're just seeing different response times for different files. – Tim Pierce Dec 09 '13 at 07:06
  • Ah: I get results closer to what you describe if I use `awk -W interactive` in the bash script. I'll poke at this some more later and see if I need to do something special to force line-buffered output in `subprocess`. – Tim Pierce Dec 09 '13 at 07:10
  • I think it's fixed now. [This Stack Overflow question](http://stackoverflow.com/questions/2804543/read-subprocess-stdout-line-by-line) gave me the clue I needed. Try it again? – Tim Pierce Dec 09 '13 at 07:22
  • @Konos5: here's [`subprocess`' idiomatic way to read output line by line as soon as it is available in Python 2](http://stackoverflow.com/a/17698359/4279). If `wget` uses block buffering while its stdout is redirected to a pipe then you might need [workarounds mentioned there](http://stackoverflow.com/a/17698359/4279). – jfs Dec 09 '13 at 17:40
  • @qwrrty: call `wgetPopen.stdout.close()` after the loop. Please, use `if not line:` instead of `if line != ''` – jfs Dec 09 '13 at 17:52
  • @J.F.Sebastian You're far too kind -- in the cold light of day, that `while True:` loop was just embarrassing. Corrected it with your excellent idiomatic Python suggestion, but not sure why `if not line:` is better; isn't `file.readline()` guaranteed to return an empty line when and only when EOF is reached? And won't `wgetPopen.stdout` be closed implicitly when the object is destroyed anyway? – Tim Pierce Dec 09 '13 at 18:09
  • +1. `wgetPopen.stdout` might be destroyed (I expect so, but I don't know). As with ordinary files, it is better to close pipes explicitly (the `with` statement is used for files) rather than relying on garbage collection (which is complex and hard to reason about). `if not obj` says "if obj is empty or zero" (the test for `None` should be written as `if obj is None`) without regard to type; e.g., in Python 3 `pipe.readline()` may return `b''` or `''`, which are different types, and `if not line` works for both. It also supports both Python 2 and 3 from the same source. – jfs Dec 09 '13 at 19:05
  • In defense of the `while` loop: it might be easier to read for a Python novice who has previous experience with C-like languages. The 2-argument `iter()` requires Python knowledge. – jfs Dec 09 '13 at 19:11
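
The points raised in this thread (bytes versus text, closing the pipe explicitly, flushing each update) can be put together in a Python 3 sketch. This is not the answerer's code; `iter_percentages` and `show_progress` are hypothetical names, and the regular expression assumes the percentage appears as digits followed by `%`, as in the answer above.

```python
import re
import subprocess
import sys

PERCENT = re.compile(rb"\d+%")  # bytes pattern: the pipe yields bytes in Python 3


def iter_percentages(cmd):
    """Yield each 'NN%' token found in the child's combined stdout/stderr."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    with proc.stdout:  # closes the pipe explicitly when the loop ends
        for line in iter(proc.stdout.readline, b""):
            match = PERCENT.search(line)
            if match:
                yield match.group(0).decode("ascii")
    proc.wait()


def show_progress(cmd, width=4):
    """Overwrite the previous percentage in place, flushing every update."""
    for percent in iter_percentages(cmd):
        sys.stdout.write("\b" * width + percent.rjust(width))
        sys.stdout.flush()  # without this, updates may sit in Python's buffer
    sys.stdout.write("\n")
```

Calling `show_progress(["wget", "--progress=dot", url])` would be the intended use. Whether updates arrive promptly still depends on how the child process buffers its own output.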
2

If you are rewriting the script in Python, you could replace wget with urllib.urlretrieve() in this case:

#!/usr/bin/env python
import os
import posixpath
import sys
import urllib
import urlparse

def url2filename(url):
    """Return basename corresponding to url.

    >>> url2filename('http://example.com/path/to/file?opt=1')
    'file'
    """
    urlpath = urlparse.urlsplit(url).path  # pylint: disable=E1103
    basename = posixpath.basename(urllib.unquote(urlpath))
    if os.path.basename(basename) != basename:
        raise ValueError  # refuse 'dir%5Cbasename.ext' on Windows
    return basename

def reporthook(blocknum, blocksize, totalsize):
    """Report download progress on stderr."""
    readsofar = blocknum * blocksize
    if totalsize > 0:
        percent = readsofar * 1e2 / totalsize
        s = "\r%5.1f%% %*d / %d" % (
            percent, len(str(totalsize)), readsofar, totalsize)
        sys.stderr.write(s)
        if readsofar >= totalsize: # near the end
            sys.stderr.write("\n")
    else: # total size is unknown
        sys.stderr.write("read %d\n" % (readsofar,))

url = sys.argv[1]
filename = sys.argv[2] if len(sys.argv) > 2 else url2filename(url)
urllib.urlretrieve(url, filename, reporthook)

Example:

$ python download-file.py http://example.com/path/to/file 

It downloads the URL to a file. If no filename is given, it uses the basename from the URL.
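
The script above targets Python 2. As a rough Python 3 equivalent, assuming the same approach carries over, the functions moved to `urllib.parse` and `urllib.request`. The helpers `format_progress` and `download` are names introduced here for illustration. The cap at 100% and the empty-basename check are additions not present in the original.

```python
import os
import posixpath
import sys
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def url2filename(url):
    """Return the basename of the URL path, refusing unsafe or empty names."""
    basename = posixpath.basename(unquote(urlsplit(url).path))
    if not basename or os.path.basename(basename) != basename:
        raise ValueError("cannot derive a safe filename from %r" % url)
    return basename


def format_progress(blocknum, blocksize, totalsize):
    """Build one progress line; kept separate from I/O so it is easy to test."""
    readsofar = blocknum * blocksize
    if totalsize > 0:
        readsofar = min(readsofar, totalsize)  # last block may overshoot
        percent = readsofar * 100.0 / totalsize
        return "\r%5.1f%% %*d / %d" % (
            percent, len(str(totalsize)), readsofar, totalsize)
    return "\rread %d" % readsofar  # total size is unknown


def reporthook(blocknum, blocksize, totalsize):
    """Report download progress on stderr."""
    sys.stderr.write(format_progress(blocknum, blocksize, totalsize))
    if 0 < totalsize <= blocknum * blocksize:
        sys.stderr.write("\n")


def download(url, filename=None):
    """Fetch url into filename (default: basename taken from the URL)."""
    return urlretrieve(url, filename or url2filename(url), reporthook)
```

`download("http://example.com/path/to/file")` would then save to `file` in the current directory. Note that the Python 3 documentation describes `urlretrieve` as a legacy interface, so this is a direct port rather than a recommendation.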

You could also run wget if you need it:

#!/usr/bin/env python
import sys
from subprocess import Popen, PIPE, STDOUT

def urlretrieve(url, filename=None, width=4):
    destination = ["-O", filename] if filename is not None else []
    p = Popen(["wget"] + destination + ["--progress=dot", url],
              stdout=PIPE, stderr=STDOUT, bufsize=1) # line-buffered (out side)
    for line in iter(p.stdout.readline, b''):
        if b'%' in line: # grep "%"
            line = line.replace(b'.', b'') # sed -u -e "s,\.,,g"
            percents = line.split(None, 2)[1].decode() # awk $2
            sys.stderr.write("\b"*width + percents.rjust(width))
    p.communicate() # close stdout, wait for child's exit
    print("\b"*width + "DONE")

url = sys.argv[1]
filename = sys.argv[2] if len(sys.argv) > 2 else None
urlretrieve(url, filename)

I have not noticed any buffering issues with this code.

jfs
2

I've done something like this before, and I'd love to share my code with you :)

#!/usr/bin/python2.7
# encoding=utf-8

import sys
import os
import datetime

SHEBANG = "#!/bin/bash\n\n"

def get_cmd(editor='vim', initial_cmd=""):
    from subprocess import call
    from tempfile import NamedTemporaryFile
    # Create the initial temporary file.
    with NamedTemporaryFile(delete=False) as tf:
        tfName = tf.name
        tf.write(initial_cmd)
    # Fire up the editor.
    if call([editor, tfName], shell=False) != 0:
        return None  # Editor died or was killed.
    # Get the modified content.
    fd = open(tfName)
    res = fd.read()
    fd.close()
    os.remove(tfName)
    return res

def main():
    initial_cmd = "wget " + sys.argv[1]
    cmd  = get_cmd(editor='vim', initial_cmd=initial_cmd)
    if len(sys.argv) > 1 and sys.argv[1] == 's':
        # Keep the download information.
        t = datetime.datetime.now()
        filename = "swget_%02d%02d%02d%02d%02d" %\
                (t.month, t.day, t.hour, t.minute, t.second)
        with open(filename, 'w') as f:
            f.write(SHEBANG)
            f.write(cmd)
            f.close()
            os.chmod(filename, 0777)
    os.system(cmd)

main()


# Run this script with the optional argument 's'.
# Copy the command into the editor, then save and quit; the download
# then begins. If you used the argument 's', the script also creates
# another executable script that you can use to resume an interrupted
# download (if the server supports it).

So, basically, you just need to change the value of initial_cmd. In your case, it is:

wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
    sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'

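The script first writes the initial command to a temporary file and opens it in the editor. It then reads the edited command back and runs it with os.system(). With the 's' argument, it also saves the command to a new script file and gives that file execute permissions before running the command.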

sunus
  • i'd love to give you some feedback :) You could `call(filename)` instead of `os.system(cmd)`. To format datetime, you could use `.strftime()` method. `with`-statement closes files automatically that is the point of using it in the first place, no need to call `f.close()` by hand (unindent `chmod` in this case). If you want to make script executable by your user: `os.chmod(filename, os.stat(filename).st_mode | stat.S_IEXEC)` (or `| 0111` for `+x`). To avoid leaking files, move code inside `with Named..File() as tf:` call `tf.flush()` before `call([editor..)` then `tf.seek(0); res=tf.read()` – jfs Dec 10 '13 at 17:37
  • @J.F.Sebastian wow, thank you, man! it's a script i wrote long time ago. I was a bad python programmer back then:) thank you for pointing that out! – sunus Dec 11 '13 at 14:40
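
The feedback in the comment above could be applied roughly as follows. This is a Python 3 sketch under assumptions, not a drop-in replacement for the answer: `edit_command`, `script_name` and `make_executable` are hypothetical names, and the edited file is reopened by name instead of using `tf.seek(0)`.

```python
import os
import stat
import subprocess
from datetime import datetime
from tempfile import NamedTemporaryFile


def edit_command(initial_cmd, editor=("vim",)):
    """Let the user edit initial_cmd; return the result, or None on failure."""
    # The temp file is removed automatically when the with-block exits.
    with NamedTemporaryFile("w", suffix=".sh") as tf:
        tf.write(initial_cmd)
        tf.flush()  # make the text visible to the editor process
        if subprocess.call(list(editor) + [tf.name]) != 0:
            return None  # editor died or was killed
        # Reopen by name: some editors replace the file instead of
        # rewriting it in place, so the original handle could be stale.
        with open(tf.name) as edited:
            return edited.read()


def script_name(now=None):
    """Timestamped name in the answer's format: swget_MMDDhhmmss."""
    now = now or datetime.now()
    return now.strftime("swget_%m%d%H%M%S")


def make_executable(path):
    """Add the owner-execute bit without touching the other mode bits."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)
```

Reopening a `NamedTemporaryFile` by name while it is still open works on Unix-like systems but may not on Windows, so treat this as Unix-only.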
1

vim download.py

#!/usr/bin/env python

import subprocess
import os

sh_cmd = r"""
download()
{
    local url=$1
    echo -n "    "
    wget --progress=dot $url 2>&1 |
        grep --line-buffered "%"  |
        sed -u -e "s,\.,,g"       |
        awk '{printf("\b\b\b\b%4s", $2)}'

    echo -ne "\b\b\b\b"
    echo " DONE"
}
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"
"""

cmd = 'sh'
p = subprocess.Popen(cmd, 
    shell=True,
    stdin=subprocess.PIPE,
    env=os.environ
)
p.communicate(input=sh_cmd)

# or:
# p = subprocess.Popen(cmd,
#    shell=True,
#    stdin=subprocess.PIPE,
#    env={'file':'xx'})
# 
# p.communicate(input=sh_cmd)

# or:
# p = subprocess.Popen(cmd, shell=True,
#    stdin=subprocess.PIPE,
#    stdout=subprocess.PIPE,
#    stderr=subprocess.PIPE,
#    env=os.environ)
# stdout, stderr = p.communicate(input=sh_cmd)

Then you can call it like this:

file="xxx" python download.py
Henk Langeveld
atupal
  • Why use `sh` as the command, **and** use `shell=True`? Why not run `sh_cmd` *directly*? – Martijn Pieters Dec 09 '13 at 08:23
  • @MartijnPieters Because the sh_cmd is not a "shell command", so we use `sh` to run it. In a Linux shell, we can use `sh script.sh`, and we can also use a pipe or stdin to run some commands, such as `cat some_file | sh` or `curl http://xxx.xx | sh` and so on. As for `shell=True`, the docs say: "The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence." – atupal Dec 09 '13 at 08:39
  • If you set `shell=True` a shell is used to run the command you pass in. You quoted the documentation yourself there. – Martijn Pieters Dec 09 '13 at 09:25
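
To illustrate the point discussed in these comments, a script can be fed to a single shell on stdin without also setting `shell=True`. The sketch below assumes Python 3.5+ and an `sh` on `PATH`; `run_shell_script` is a hypothetical helper name.

```python
import os
import subprocess


def run_shell_script(script, extra_env=None, shell="sh"):
    """Feed `script` to one shell process on stdin and capture its output."""
    env = dict(os.environ)
    env.update(extra_env or {})  # e.g. {"file": "patch-2.6.37.gz"}
    # No shell=True: we start exactly one shell, the one named in `shell`.
    return subprocess.run([shell], input=script, env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
```

Because stdout is captured here, progress output would only be available after the script finishes; drop the `stdout=` argument to let it print live. The bash function from the question uses `local` and `echo -ne`, which are not guaranteed to behave the same under every `sh`, so passing `shell="bash"` may be the safer choice for that particular script.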
0

Put simply, if you have a script.sh file, you can execute it and print its return code:

import subprocess
process = subprocess.Popen('/path/to/script.sh', shell=True, stdout=subprocess.PIPE)
process.communicate()  # drain stdout and wait; wait() alone can deadlock once the pipe fills up
print(process.returncode)
securecurve
  • And ensure the `script.sh` has execute permission(`chmod +x script.sh`) or `Popen('sh /path/to/script.sh', shell=True ...)` – atupal Dec 09 '13 at 07:53
  • sure, it must have an execute permission +X, otherwise, it will give you an error, then, the above python code should work like a charm! – securecurve Dec 09 '13 at 08:03