
I have a website that has two servers - one is dedicated to client-facing web services, and the other is a beefier data processing server.

I currently have a process in which the web server contacts the data server for multiple requests that typically look like this:

import requests

payload = {'req_type': 'data_processing', 'sub_type': 'data_crunch', 'id_num': 12345}
r = requests.get('https://data.mywebsite.com/_api_route', params=payload)

...which has been running like clockwork for the better part of the past year. However, after creating a pandas-heavy function on the data server, I've been getting the following error (which I can't imagine has anything to do with pandas, but thought I'd throw it out there anyway):

HTTPSConnectionPool(host='data.mywebsite.com', port=443): 
    Max retries exceeded with url: /_api_route?...... 
    (Caused by <class 'httplib.BadStatusLine'>: '')

Both servers run Ubuntu with Python, using the Requests library to handle communication between them.

There is a similar question here: Max retries exceeded with URL, but the OP is asking about contacting a server over which he has no control - I can code both sides, so I'm hoping I can change something on my data server, but am not sure what it would be.

elPastor
  • Please update your question with the return statement(s) of the view on your data processing server. Unless there is an error with your code or you explicitly set the status of the response you return from that view, it should be 200. If I had to guess, there's some error in your view, but that should be very apparent in your logs. If there is no error there, I would try to send the data to the data processing server with [`curl -v`](http://stackoverflow.com/a/7173011/5854907) so you can see exactly what the response is. – Allie Fitter Mar 27 '17 at 23:42
  • @AllieFitter - thanks for the idea. We ran a `subprocess` in order to mimic the curl request (sketched just below) and that was helpful in the troubleshooting process. However, the real issue seems to be the number of concurrent requests going from the web server to the data server. A single request was fine, but multiple concurrent requests triggered the issue. I've since added a 0.5s delay in the jQuery script and everything seems to be working fine. – elPastor Mar 28 '17 at 18:13
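For reference, a minimal sketch of that subprocess-based check, assuming `curl` is installed and reusing the query parameters from the question:

import subprocess

# curl -v prints the full HTTP exchange, status line included, which makes
# an empty or malformed response from the data server immediately visible
url = ('https://data.mywebsite.com/_api_route'
       '?req_type=data_processing&sub_type=data_crunch&id_num=12345')
subprocess.call(['curl', '-v', url])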

3 Answers


Changing the number of retries will not solve your problem. `Caused by <class 'httplib.BadStatusLine'>: ''` is what you should fix: the server returned an empty HTTP status line instead of something like "200" or "500".

Stephane Martin
  • Hi Stephane - thanks for the quick response, but I'm not sure how to fix that line. Any thoughts? – elPastor Mar 27 '17 at 21:53
  • Probably a bug in the data server code. Not much more we can say with only the provided piece of information. – Stephane Martin Mar 27 '17 at 21:54
  • By the way, with the requests library the HTTP status code is available as `r.status_code`, not `r.status`. That's why you had an AttributeError... – Stephane Martin Mar 28 '17 at 18:49
  • While my above hack works frequently, it doesn't work all the time. And from what I can tell, it depends on which way the wind is blowing. I'll run it once with 25 simultaneous calls and they'll all work, run it again, and some will fail. I did use `status_code` and for those that work, I get a 200 response, for those that don't, I get the error in my original post. – elPastor Mar 28 '17 at 22:21
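To illustrate the `status_code` point above, a minimal sketch (`payload` is the dict from the question):

import requests

payload = {'req_type': 'data_processing', 'sub_type': 'data_crunch', 'id_num': 12345}
r = requests.get('https://data.mywebsite.com/_api_route', params=payload)
print(r.status_code)   # e.g. 200; note that r.status does not exist
r.raise_for_status()   # raises requests.exceptions.HTTPError on 4xx/5xx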

The solution would be, if you're not already, to use an application server such as uWSGI or Gunicorn to handle concurrency and, once again if you're not already, to put Nginx or Apache in front of it. I've used uWSGI a fair amount and its configuration is very straightforward. To create more processes to handle requests, you just need to set `processes = 2` in your `.ini` file (a sketch follows below). You can also use Nginx or Apache to spawn processes, but uWSGI is built specifically for Python and works wonderfully with Flask. I would advise you to implement this if you haven't already, and then observe memory and processor usage as you increase the number of processes until you find a good number that your server can handle.
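A minimal sketch of such a uWSGI `.ini` file, assuming a Flask app object named `app` in a module called `myapp` (both names are placeholders):

[uwsgi]
# assumption: a Flask app object named "app" inside myapp.py
module = myapp:app
master = true
# the concurrency setting mentioned above; raise it while watching CPU/memory
processes = 2
# Nginx (or Apache) proxies requests to this unix socket
socket = /tmp/myapp.sock
chmod-socket = 660
# remove the socket file on exit
vacuum = true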

EDIT: Just as a P.S., I run a Flask app on an Nginx server using uWSGI with fairly bare-bones hardware (just a 2.5 GHz dual core) and with 16 processes, I average about 40% CPU usage.

Allie Fitter
  • Thanks Allie - I think you're absolutely right, this requires a server-level fix. Unfortunately I'm clueless on how to update uWSGI, etc., but I will look into it. And yes, I am already using Apache and Flask. – elPastor Mar 29 '17 at 18:52
  • I would suggest switching to `Nginx` as it has `uWSGI` support built in. [This tutorial](https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-uwsgi-and-nginx-on-ubuntu-16-04) helped me when I was first starting with uWSGI – Allie Fitter Mar 29 '17 at 18:58

I've scoured the known internet for solutions to this problem, and I don't think I'm going to find one in the near future.

Instead, I built an internal retry loop (in Python) with a 1-second delay that looks something like this:

import time
import requests

counter   = 0
max_tries = 10

while counter < max_tries:
    try:
        # this loop runs inside a function, hence the return below
        r    = requests.get('https://data.mywebsite.com/_api_route', params=payload)
        code = r.json()['r']['code']
        res  = r.json()['r']['response']

        return code, res

    except requests.exceptions.ConnectionError:
        counter += 1
        time.sleep(1)

It's definitely not a solution as much as it is a workaround, but for now, it does just that, it works... assuming it doesn't have to retry more than 10 times.
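As an aside, the Requests library can also be told to retry at the transport level via urllib3's `Retry`. A minimal sketch (whether it catches this particular empty-status-line failure depends on where the connection drops; `payload` is the dict from the question):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# mount a transport adapter that retries failed requests with a backoff delay
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=Retry(total=10, backoff_factor=1)))
r = session.get('https://data.mywebsite.com/_api_route', params=payload)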

elPastor