I have a Python script that downloads JSON data over HTTP using the requests module. When I run the script from the command line, the HTTP connection succeeds and the data downloads without any issues. But when I launch the same script as a crontab job, the HTTP request times out after a while. Could anyone please tell me what is going on here? As a workaround, I am currently downloading the data in a bash script first and then running the Python script from within that bash script, but that is a clumsy solution. Thank you so much!

Using: Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]

P.S.: I haven't found any posts regarding this issue. If there is already an answer for this on some other post, then please accept my apologies.

This is an excerpt from my code. It times out when running requests.get(url):

import logging
import requests

try:
    response = requests.get(url)
    messages = response.json()["Messages"]
except requests.exceptions.Timeout:
    logging.critical("TIMEOUT received when connecting to HTTP server.")
except requests.exceptions.ConnectionError:
    logging.critical("CONNECTION ERROR received when connecting to HTTP server.")
Erik Campo
  • Could you specify URL and/or how it's constructed? Log it in both cases and ensure it lines up? The first thing I'd check is whether url is what you think it is, especially if it's constructed e.g. from environment variables. – lyngvi Nov 03 '17 at 17:51
  • Thanks for your answer! I am unsure I can specify the actual URL; I'll have to check that. Anyway, one thing I can say is that the script works flawlessly from the command line, whereas it just hangs when launched through crontab. I am pretty positive that the URL is properly constructed (I printed the result and then copied and pasted that string into the Python interpreter to test). Anyway, you just gave me an idea: test with some other URL. – Erik Campo Nov 03 '17 at 17:56
  • All the more reason to ask how it's constructed. As an example, the environment under which cron executes doesn't generally line up with the environment under which you normally execute (e.g. variables from `.bashrc`/`.bash_profile` are not generally available). – lyngvi Nov 03 '17 at 18:00
  • This is what the URL looks like (I removed sensitive info). It's just constructed as a string: url="http://domain/dir1/service1/service/gest_json.json/?access_id=1111&password=SOME_PWD&start_utc=2017-11-03 07:00:00&end_utc=2017-11-03 17:00:00&mobile_id=1111" – Erik Campo Nov 03 '17 at 18:05
  • And just to confirm - if you log the value of 'url' when running it via 'cron', you get exactly the same URL? Re: Construction: Looking more for things like `url = os.environ['HOST_ROOT'] + '/dir1/service1/'` – lyngvi Nov 03 '17 at 18:07
  • URL is the same when launched through crontab. Thanks. – Erik Campo Nov 03 '17 at 18:16
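
One way to check the environment difference raised in the comments above is to log the proxy-related variables from both the interactive shell and the cron job, then compare the two logs. The following is only a minimal sketch, assuming the proxy is configured through the conventional environment variables; `proxy_environment` and `PROXY_VARS` are hypothetical names introduced for illustration and are not part of requests.

```python
import logging
import os

# Conventional proxy variable names (assumed); matched case-insensitively below.
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy", "all_proxy")


def proxy_environment(environ):
    """Return only the proxy-related entries of an environment mapping."""
    return {
        name: value
        for name, value in environ.items()
        if name.lower() in PROXY_VARS
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Run this once from the shell and once from cron, then compare the logs.
    logging.info("Proxy settings seen by this process: %s",
                 proxy_environment(os.environ))
```

If the shell run logs a proxy URL and the cron run logs an empty dictionary, the cron job is probably trying to reach the server directly.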

1 Answer

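I just found the answer to my question: the request has to go through a proxy. Most likely my interactive shell had the proxy configured through environment variables, which requests picks up automatically, while cron runs jobs with a minimal environment that does not include them. I defined the proxy explicitly and passed it to requests like this: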

HTTP_PROXY = "http://your_proxy:proxy_port"
PROXY_DICT = {"http": HTTP_PROXY}

response = requests.get(url, proxies=PROXY_DICT)
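
A slightly fuller sketch of the same idea follows. It combines the explicit proxy with the error handling from the question and adds an explicit timeout so a blocked connection fails quickly instead of hanging. The helper names `build_proxy_dict` and `fetch_messages`, the timeout values, and the choice to route `https` through the same proxy are all assumptions made for illustration, not something the original script is known to do.

```python
import logging


def build_proxy_dict(proxy_url):
    """Map both URL schemes to the same proxy (an assumption; adjust as needed)."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_messages(url, proxy_url, timeout=(5, 30)):
    """Return the "Messages" list from the JSON reply, or None on a request error."""
    # Third-party dependency, imported here so build_proxy_dict works without it.
    import requests

    try:
        # timeout is (connect seconds, read seconds); the values are assumed.
        response = requests.get(url, proxies=build_proxy_dict(proxy_url),
                                timeout=timeout)
        response.raise_for_status()
        return response.json()["Messages"]
    except requests.exceptions.Timeout:
        logging.critical("TIMEOUT received when connecting to HTTP server.")
    except requests.exceptions.ConnectionError:
        logging.critical("CONNECTION ERROR received when connecting to HTTP server.")
    except requests.exceptions.HTTPError as err:
        logging.critical("HTTP error received from server: %s", err)
    return None
```

It could be called as `fetch_messages(url, "http://your_proxy:proxy_port")`, with the placeholder replaced by the real proxy address.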

Reference:

Proxies with Python 'Requests' module

Thank you all for your understanding. I guess I should have searched more thoroughly before posting. Sorry.

Erik Campo