
I need to call wget from a Python script with the subprocess.call function, but it seems the "wget" command cannot be found by the bash subprocess opened by Python.

I have added the directory where wget is installed to the PATH environment variable:

export PATH=/usr/local/bin:$PATH

to both the ~/.bashrc file and the ~/.bash_profile file on my Mac, and made sure to source them. The Python script looks like:

import subprocess as sp

cmd = 'wget'
process = sp.Popen(cmd, stdout=sp.PIPE, stdin=sp.PIPE,
                   stderr=sp.PIPE, shell=True, executable='/bin/bash')
(stdoutdata, stderrdata) = process.communicate()
print stdoutdata, stderrdata

The expected output would be something like

wget: missing URL
Usage: wget [OPTION]... [URL]...

But the result is always

/bin/bash: wget: command not found

Interestingly, I get the help output if I type wget directly in a bash terminal, but it never works from the Python script. How can that be?

PS:

If I change the command to

cmd = '/usr/local/bin/wget'

then it works. So I am sure I got wget installed.
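For reference, a quick standard-library check shows which PATH the Python process actually inherited — this is the search path subprocess uses, which may differ from the one your interactive shell uses:

```python
import os
import shutil

# The PATH this Python process inherited -- this is what subprocess searches
print(os.environ.get("PATH", ""))

# shutil.which looks a command up on that same PATH, like the shell does
print(shutil.which("wget"))  # None if wget is not on this process's PATH
```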

AsouK
  • How are you running the Python script i.e. directly or via cron (or alike)? – heemayl Jan 10 '19 at 07:22
  • Can you run wget from your shell? this code worked for me. – Skam Jan 10 '19 at 07:23
  • The code works for me too. If using wget isn't a hard requirement I'd suggest you check out `requests`: http://docs.python-requests.org/en/master/ – orangeInk Jan 10 '19 at 07:30
  • Thank you for your advice. I hope to do some ML data selection with Python first, then download the files and do some analysis on them, so I'd like to include the downloading part in the script rather than open a bash shell to do it. As for requests, I need to download TB-scale data, so requests would probably be too slow for that. – AsouK Jan 11 '19 at 03:11

1 Answer


You can pass an env= argument to the subprocess functions.

import os
import subprocess

myenv = os.environ.copy()
myenv['PATH'] = '/usr/local/bin:' + myenv['PATH']
subprocess.run(..., env=myenv)
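Here is a complete sketch of that approach; the echo command stands in for wget, just to demonstrate that the child process sees the modified PATH:

```python
import os
import subprocess as sp

# Copy the parent environment and prepend the directory that holds wget
myenv = os.environ.copy()
myenv["PATH"] = "/usr/local/bin:" + myenv.get("PATH", "")

# The child inherits the augmented PATH (echo stands in for wget here)
proc = sp.run(["sh", "-c", "echo $PATH"], env=myenv,
              stdout=sp.PIPE, universal_newlines=True)
print(proc.stdout)
```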

However, you probably want to avoid running a shell at all; instead, augment the PATH that Python itself uses to find the binary for the subprocess call.

import subprocess as sp
import os

os.environ['PATH'] = '/usr/local/bin:' + os.environ['PATH']
cmd = 'wget'
# use run instead of Popen
# don't needlessly use a shell,
# and so pass [cmd] as a list
process = sp.run([cmd], stdout=sp.PIPE, stdin=sp.PIPE,
                 stderr=sp.PIPE,
                 universal_newlines=True)
print(process.stdout, process.stderr)

Running Bash commands in Python explains the changes I made in more detail.

However, there is no good reason to use an external utility for this; Python requests does pretty much everything wget does, often more naturally and with more control over what exactly it does.
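For example, a minimal streamed download with requests might look like this (the URL and filename below are placeholders, not real endpoints):

```python
import requests

def download(url, dest):
    # Stream the response so a large file is never held in memory at once
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)

# Placeholder usage -- substitute your own URL and filename
# download("https://example.com/data.bin", "data.bin")
```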

tripleee
  • Thank you, it does solve my question. It turns out that the IPython console does not share the environment variables with bash, and I do need to add it before calling a shell. As for the requests module, I need to download thousands of picture files from a database, and wget seems much faster than the requests module (which is based on Java after all) – AsouK Jan 11 '19 at 03:15
  • `requests` is not based on Java. This is the first time I hear about speed problems with it; certainly you could switch to something like an `async` library if latency is your problem. Here's a nice blog about that: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html – tripleee Jan 11 '19 at 04:38