0

I have a Python script that uses urllib2 to retrieve data from an external site. I am on a corporate network that requires proxy authentication.

From the command line, I can export the proxy settings in .bashrc so that the script goes out through the proxy and makes the request.

So the script does work from behind the proxy.

Here is the problem: I need to call this Python script from a PHP script on a website. I have tried several ways of calling it: exec(), popen(), and shell_exec().

I can't get the script to return any results. When tailing /var/log/httpd/error** I can see the error being generated:

urllib2.URLError: <urlopen error [Errno 110] Connection timed out>, referer:

This is the same error that I received before setting the proxy in .bashrc.

I have suPHP set up and configured to run scripts as a particular user. I have also set all files including the python script to be owned by this user, and also adjusted permissions, trying +x and also insecurely setting to 777 just for testing purposes.

I can run a PHP script from the same directory on the website and verify that Apache is running under this user with a simple:

echo exec('whoami');

I can also execute a simple Python script from this same PHP page with the same setup that only prints to stdout and I can return that value back to the webpage, so I know I can execute Python scripts with this method.

From the command line, I su to the user that Apache runs under and set the proxy in that account, but the script still fails when executed from the web page; it only works from the CLI.

As a test, I added a line to the Python script that writes the data I need to a file, thinking that I could read that file in afterwards. The file is created, but no data is written to it, because the urllib2 call times out before the write happens.

Any idea how to make my PHP script execute this Python script, which needs proxy access?

Do I need to explicitly tell urllib2 to use a proxy? The urllib2 routine that I am using is part of a Python module that is coded to just use the OS's proxy settings, and again, I know it works, since I can execute this under the Apache user from CLI.

Any help is greatly appreciated.

luskbo

2 Answers

1

To tell urllib2 to use a proxy, you can use a ProxyHandler:

proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')

It is surprising that you have to do that explicitly, since the documentation for urlopen says:

In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is default installed and makes sure the requests are handled through the proxy.

Is your http_proxy environment variable properly set in the environment the script runs in?
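One way to answer that question is to have the script itself report which proxy variables it can see, then run it once from the shell and once through PHP and compare. The following is only a minimal sketch: the helper name `find_proxy_vars` and the list of variable names are assumptions for illustration, not part of urllib2.

```python
import os
import sys

# Variable names commonly used for proxy configuration (assumed list).
PROXY_VARS = ("http_proxy", "https_proxy", "ftp_proxy", "no_proxy")


def find_proxy_vars(environ):
    """Return the non-empty proxy-related entries of `environ`.

    Names are matched case-insensitively; `find_proxy_vars` is a
    hypothetical helper written for this example.
    """
    found = {}
    for name, value in environ.items():
        if name.lower() in PROXY_VARS and value:
            found[name] = value
    return found


if __name__ == "__main__":
    found = find_proxy_vars(os.environ)
    if found:
        for name in sorted(found):
            sys.stderr.write("%s=%s\n" % (name, found[name]))
    else:
        sys.stderr.write("no proxy variables visible to this process\n")
```

Writing to stderr means the report should show up in the Apache error log when the script is launched from the web page, next to the timeout message.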


You will find more info on how to use a proxy with urllib2 in a previous question: Proxy with urllib2

Sylvain Leroux
  • Yes. Let's say my Apache user is named "bob". I have su'ed into bob and set http_proxy for "bob" in that account's .bashrc. I can also execute the script when logged in as "bob". urllib2 does detect this proxy setting and executes, but it only works this way when executing from the command line. – luskbo Jun 14 '13 at 19:05
  • @luskbo Not sure I understand. But usually CGI scripts run with a very limited set of environment variables. Maybe http_proxy is not set in that context? – Sylvain Leroux Jun 14 '13 at 19:08
  • @luskbo With Apache, there is a `SetEnv` directive that controls the environment variables of your CGI. Take a look at it: http://httpd.apache.org/docs/2.2/env.html – Sylvain Leroux Jun 14 '13 at 19:10
  • First off, what finally fixed it was explicitly setting the proxy value. Not sure why urllib2 does not take the OS-set value like it's supposed to, but something about executing it from PHP breaks this. My SetEnv has the path to both php and python in it, so I'm not sure how that is incorrect: SetEnv PATH /usr/bin:/usr/local/bin:/bin:/usr/bin/python:/usr/bin/php But explicitly setting the proxy does work in this case, so thanks!!! – luskbo Jun 14 '13 at 19:42
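The fix described in the comments (set the proxy explicitly when the environment does not provide one) could be sketched as follows. This is an illustration under assumptions, not the asker's actual code: `choose_proxies` and `install_proxy` are hypothetical helpers, `http://proxy.example.com:8080` is a placeholder address, and the import fallback is only there so the same sketch also runs on Python 3.

```python
import os

try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# Placeholder address; replace with the real corporate proxy.
FALLBACK_PROXY = "http://proxy.example.com:8080"


def choose_proxies(environ, fallback_url):
    """Prefer http_proxy from `environ`; otherwise use `fallback_url`."""
    proxy = environ.get("http_proxy") or fallback_url
    return {"http": proxy}


def install_proxy(environ, fallback_url):
    """Install a global opener that routes HTTP requests through the proxy."""
    proxies = choose_proxies(environ, fallback_url)
    opener = urllib2.build_opener(urllib2.ProxyHandler(proxies))
    urllib2.install_opener(opener)
    return proxies


if __name__ == "__main__":
    # Call this once, before any urllib2.urlopen() call in the script.
    install_proxy(os.environ, FALLBACK_PROXY)
```

With this in place the script behaves the same from the shell, where http_proxy is exported, and from PHP, where it may be missing.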
0

You could try passing explicit proxy settings to your Python script to see if that clears up the problem for you. I recently wrote a script that allows you to set proxy settings with command line arguments that might be useful for this case. The important parts of the script are below:

# Import the required libraries
from urllib import urlencode
from urllib2 import Request, urlopen, URLError, ProxyHandler, build_opener, install_opener
import argparse

# Set up our argument parser
parser = argparse.ArgumentParser(description='Does stuff through a proxy')
parser.add_argument('webAddr', type=str, help='Web address of target server')
parser.add_argument('--proxServ', metavar='SERV', type=str, help='Web address of proxy server, e.g. http://proxy.server.com:80')
parser.add_argument('--proxType', metavar='TYPE', type=str, default='http', help='Type of proxy server, e.g. http')

# Get the arguments from the parser
args = parser.parse_args()

# Define data to pass to server (could generate this from arguments as well)
values = {'name': 'data'}   # generate data to pass to server

# Define proxy settings if proxy server is input.
if args.proxServ:       # set up the proxy server support
    proxySupport = ProxyHandler({args.proxType: args.proxServ})
    opener = build_opener(proxySupport)
    install_opener(opener)

# Set up the data object
data = urlencode(values)
data = data.encode('utf-8')

# Send request to the server and receive response, with error handling!
try:
    req = Request(args.webAddr, data)

    # Read the response and print it to stdout
    response = urlopen(req)
    print response.read()

except URLError as e:
    # HTTPError is a subclass of URLError, so check for 'code' first
    if hasattr(e, 'code'):      # HTTP error case
        # HTTP error code, see section 10 of RFC 2616 for details
        print 'Error: The server could not fulfill the request.'
        print 'Error code: ', e.code
    elif hasattr(e, 'reason'):  # URL error case
        # a message string or another exception instance
        print 'Error: Failed to reach a server.'
        print 'Reason: ', e.reason

urllib2 is supposed to pick up your system proxy settings, but there seem to be cases where that does not work the way you want. Defining the settings explicitly probably can't hurt. You could check out this document for more information too.
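Since the question mentions a proxy that requires authentication, the value passed to --proxServ may need to carry credentials. A minimal sketch of building such a URL is below; `build_proxy_url` is a hypothetical helper, and the host, port, and credentials are placeholders. It assumes the proxy uses basic authentication with the user name and password embedded in the proxy URL, and percent-encodes them so characters such as `@` or `:` in a password do not break the URL.

```python
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if given."""
    if user is None:
        return "%s://%s:%d" % (scheme, host, port)
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return "%s://%s@%s:%d" % (scheme, credentials, host, port)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_proxy_url("proxy.example.com", 8080, "bob", "p@ss:word"))
```

The resulting string can be passed as the --proxServ argument, or used as the value in the dictionary given to ProxyHandler. Keep in mind that credentials passed on a command line may be visible to other users on the same machine.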

Engineero