
I need to call wget from a Python script with the subprocess.call function, but it seems the "wget" command cannot be found by the bash subprocess opened by Python.

I have added the directory where wget is installed to the PATH environment variable:

export PATH=/usr/local/bin:$PATH

to both the ~/.bashrc file and the ~/.bash_profile file on my Mac, and made sure to source them. The Python script looks like:

import subprocess as sp

cmd = 'wget'
process = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE,
                   stderr=sp.PIPE, shell=True, executable='/bin/bash')
(stdoutdata, stderrdata) = process.communicate()
print stdoutdata, stderrdata

The expected output would be something like

wget: missing URL
Usage: wget [OPTION]... [URL]...

But the result is always

/bin/bash: wget: command not found

Interestingly, I get the help output if I type wget directly in a bash terminal, but it never works from the Python script. How can that be?

PS:

If I change the command to

cmd = '/usr/local/bin/wget'

then it works. So I am sure I got wget installed.
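For reference, a quick standard-library check shows which PATH the Python process actually inherited — this is the search path subprocess uses, which may differ from the one your interactive shell uses:

```python
import os
import shutil

# The PATH this Python process inherited -- this is what subprocess searches
print(os.environ.get("PATH", ""))

# shutil.which looks a command up on that same PATH, like the shell does
print(shutil.which("wget"))  # None if wget is not on this process's PATH
```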

AsouK
  • How are you running the Python script i.e. directly or via cron (or alike)? – heemayl Jan 10 '19 at 07:22
  • Can you run wget from your shell? this code worked for me. – Skam Jan 10 '19 at 07:23
  • The code works for me too. If using wget isn't a hard requirement I'd suggest you check out `requests`: http://docs.python-requests.org/en/master/ – orangeInk Jan 10 '19 at 07:30
  • Thank you for your advice. I hope to do some ML data selection with Python first, then download the files and do some analysis on them, so I'd like to include the downloading part in the script rather than open a bash shell to do it. As for requests, I need to download TB-scale data, so requests would probably be too slow for that. – AsouK Jan 11 '19 at 03:11

1 Answer


You can pass an env= argument to the subprocess functions.

import os
import subprocess

myenv = os.environ.copy()
myenv['PATH'] = '/usr/local/bin:' + myenv['PATH']
subprocess.run(..., env=myenv)
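Here is a complete sketch of that approach; the echo command stands in for wget, just to demonstrate that the child process sees the modified PATH:

```python
import os
import subprocess as sp

# Copy the parent environment and prepend the directory that holds wget
myenv = os.environ.copy()
myenv["PATH"] = "/usr/local/bin:" + myenv.get("PATH", "")

# The child inherits the augmented PATH (echo stands in for wget here)
proc = sp.run(["sh", "-c", "echo $PATH"], env=myenv,
              stdout=sp.PIPE, universal_newlines=True)
print(proc.stdout)
```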

However, you probably want to avoid running a shell at all; instead, augment the PATH that Python itself uses to find the binary for the subprocess call.

import subprocess as sp
import os

os.environ['PATH'] = '/usr/local/bin:' + os.environ['PATH']
cmd = 'wget'
# use run instead of Popen
# don't needlessly use a shell,
# and so pass [cmd] as a list
process = sp.run([cmd], stdout=sp.PIPE, stdin=sp.PIPE,
                 stderr=sp.PIPE,
                 universal_newlines=True)
print(process.stdout, process.stderr)

Running Bash commands in Python explains the changes I made in more detail.

However, there is no good reason to use an external utility for this; Python requests does pretty much everything wget does, often more naturally and with more control over what exactly it does.
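For example, a minimal streamed download with requests might look like this (the URL and filename below are placeholders, not real endpoints):

```python
import requests

def download(url, dest):
    # Stream the response so a large file is never held in memory at once
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)

# Placeholder usage -- substitute your own URL and filename
# download("https://example.com/data.bin", "data.bin")
```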

tripleee
  • Thank you, it does solve my question. It turns out that the IPython console does not share the environment variables with bash, and I do need to add it before calling a shell. As for the requests module, I need to download thousands of picture files from a database, and wget seems much faster than the requests module (which is based on Java after all) – AsouK Jan 11 '19 at 03:15
  • `requests` is not based on Java. This is the first time I hear about speed problems with it; certainly you could switch to something like an `async` library if latency is your problem. Here's a nice blog about that: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html – tripleee Jan 11 '19 at 04:38