
I'm writing a test where a simple mock S3 is loaded up in the test environment using http.server.HTTPServer/http.server.BaseHTTPRequestHandler, to test multipart download behaviour involving Boto's S3Transfer.

It works fine unless I specify that the server uses HTTP/1.1. In that case, it downloads two 8 MB parts of a 100 MB file and then just hangs. I would like the mock server to use HTTP/1.1, since that's what the real S3 uses (I believe).

A simplified version of the test is below. It can be run with:

pip3 install boto3
python3 test.py    

# test.py

import http.server
import re
import threading

import boto3
from botocore import (
    UNSIGNED,
)
from botocore.client import (
    Config,
)

length = 100 * 2**20

class MockS3(http.server.BaseHTTPRequestHandler):
    # If the below line is commented, the download completes
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        range_header = self.headers['Range']
        match = re.search(r'^bytes=(\d+)-(\d*)', range_header)
        start_inclusive_str, end_inclusive_str = match.group(1), match.group(2)
        start = int(start_inclusive_str)
        end = int(end_inclusive_str) + 1 if end_inclusive_str else length
        bytes_to_send = end - start

        self.send_response(206)
        self.send_header('Content-Length', str(bytes_to_send))
        self.end_headers()
        self.wfile.write(bytearray(bytes_to_send))

    def do_HEAD(self):
        self.send_response(200)
        self.send_header('Content-Length', str(length))
        self.end_headers()

server_address = ('localhost', 5678)
server = http.server.HTTPServer(server_address, MockS3)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

class Writable():
    def write(self, data):
        pass

s3_client = boto3.client('s3',
  endpoint_url='http://localhost:5678',
  config=Config(signature_version=UNSIGNED),
)

s3_client.download_fileobj(
  Bucket='some',
  Key='key',
  Fileobj=Writable(),
)
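The range arithmetic in `do_GET` is easy to get subtly wrong, so it can help to pull it out and test it on its own. This is a sketch using a hypothetical helper, `parse_range`, that mirrors the question's regex; like the original, it does not handle suffix ranges such as `bytes=-500`. Clamping `end` to the object length is my addition, following how HTTP range requests treat a last-byte position past the end of the resource.

```python
import re


def parse_range(range_header, length):
    """Hypothetical helper: turn "bytes=start-end" or "bytes=start-" into a
    half-open (start, end) pair, as the question's do_GET does inline."""
    match = re.search(r'^bytes=(\d+)-(\d*)', range_header)
    if match is None:
        raise ValueError('unsupported Range header: %r' % range_header)
    start = int(match.group(1))
    # HTTP byte ranges are inclusive, so add 1 for an exclusive end;
    # an open-ended range ("bytes=N-") runs to the end of the object.
    end = int(match.group(2)) + 1 if match.group(2) else length
    return start, min(end, length)


print(parse_range('bytes=0-8388607', 100 * 2**20))  # (0, 8388608)
```

For example, an 8 MB part at the start of the 100 MB object would be requested as `bytes=0-8388607`, which maps to the half-open range `(0, 8388608)`, i.e. exactly 8 MB to send.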

Note that Writable is deliberately not seekable: in my real code, I'm using a non-seekable file-like object.

Yes, moto can be used to make a mock S3, and I do so for other tests, but for this particular test I would like a "real" server. There are custom file objects involved, and I want to ensure that S3Transfer, and other code that isn't relevant to this question, behave together as I expect.

How can I setup a mock S3 server that uses HTTP/1.1 and that S3Transfer can download from?

Michal Charemza
  • Super interesting problem. I doubt it has anything to do with it, but can you add a Content-type header? – erip Mar 17 '18 at 20:09
  • Ah, I suspect that the web server isn't thread-safe and a thread is blocking it... but I'm not sure why it would work with `HTTP/1.0`. See [this](https://stackoverflow.com/a/17437997/2883245) potentially relevant answer. – erip Mar 17 '18 at 20:14
  • @erip Tried with a content-type of `application/octet-stream` and the same thing happens. Regarding not being thread safe, I do load up the web server in another thread, but it would be quite strange for a web server not to be thread safe in that respect, i.e. not to handle multiple concurrent requests? I was wondering if it's something to do with HTTP/1.1's persistent connections, but I'm not quite sure how. – Michal Charemza Mar 17 '18 at 20:32
  • Good question, but I think `HTTPServer` handles requests synchronously unless a mixin is added. I haven't tested this myself, but a few questions (like the one posted before and [this one](https://stackoverflow.com/questions/14088294/multithreaded-web-server-in-python)) suggest that to be true. – erip Mar 17 '18 at 20:37
  • @erip Ah yes! I realise now that the server being in a separate thread doesn't necessarily mean it handles concurrent requests safely. I have added the `ThreadingMixIn` and now it works. Happy for you to post an answer with this. – Michal Charemza Mar 17 '18 at 20:43

1 Answer


There is a bug in your threading logic. What you're currently doing is serving on a separate thread, but what you really want to do is concurrently handle requests on multiple threads.

This can be achieved by creating a very dumb HTTP server which just mixes in threading capabilities:

from socketserver import ThreadingMixIn

class ThreadingServer(ThreadingMixIn, http.server.HTTPServer):
    pass

and serving from this server instead of the base HTTPServer.
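Here is a minimal, standard-library-only sketch of the question's mock server running on a threading server. Some details are my own assumptions rather than part of the question or answer: `daemon_threads = True`, binding to port 0 to get a free port, silencing `log_message`, and using two `http.client` keep-alive connections in place of boto3's S3Transfer to exercise concurrent connections.

```python
import http.client
import http.server
import re
import socketserver
import threading

LENGTH = 100 * 2**20  # size of the fake object, as in the question


class ThreadingServer(socketserver.ThreadingMixIn, http.server.HTTPServer):
    # One thread per connection: an idle keep-alive connection no longer
    # stops the server from accepting new connections.
    daemon_threads = True  # handler threads won't block interpreter exit


class MockS3(http.server.BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # persistent (keep-alive) connections

    def do_GET(self):
        # Ranged GET: "bytes=start-end" (end inclusive) or "bytes=start-"
        match = re.search(r'^bytes=(\d+)-(\d*)', self.headers['Range'])
        start = int(match.group(1))
        end = int(match.group(2)) + 1 if match.group(2) else LENGTH
        self.send_response(206)
        self.send_header('Content-Length', str(end - start))
        self.end_headers()
        self.wfile.write(bytes(end - start))

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = ThreadingServer(('localhost', 0), MockS3)  # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Two keep-alive connections in use at once, standing in for S3Transfer's
# worker threads. With a plain HTTPServer, the second getresponse() would
# time out while `first` stays open.
first = http.client.HTTPConnection('localhost', port, timeout=5)
second = http.client.HTTPConnection('localhost', port, timeout=5)
first.request('GET', '/some/key', headers={'Range': 'bytes=0-9'})
body1 = first.getresponse().read()
second.request('GET', '/some/key', headers={'Range': 'bytes=10-19'})
body2 = second.getresponse().read()
print(len(body1), len(body2))  # 10 10

first.close()
second.close()
server.shutdown()
server.server_close()
```

On Python 3.7 and later, the standard library includes `http.server.ThreadingHTTPServer`, which is defined the same way (`ThreadingMixIn` plus `HTTPServer`, with `daemon_threads = True`), so it can replace the hand-written class.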

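As for why the original code works with HTTP/1.0: there, the server closes the connection after servicing a single request. With HTTP/1.1 the connection is kept alive, so the single-threaded `HTTPServer` stays in the handler for that one connection, waiting for its next request, and never accepts the other connections that S3Transfer's worker threads open concurrently.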
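The keep-alive explanation can be checked directly. This sketch (standard library only) starts a plain, single-threaded `HTTPServer`, leaves one client connection open after a request, as a connection pool would, and checks whether a second connection gets any response within one second. The function name `served_while_first_open` and the one-second timeout are illustrative choices, not from the original.

```python
import http.client
import http.server
import socket
import threading


class QuietHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def served_while_first_open(protocol):
    """True if a plain (single-threaded) HTTPServer answers a second
    connection while an earlier client connection is still open."""
    handler = type('Handler', (QuietHandler,), {'protocol_version': protocol})
    server = http.server.HTTPServer(('localhost', 0), handler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # First connection: one request, then left open like a pooled connection.
    first = http.client.HTTPConnection('localhost', port)
    first.request('GET', '/')
    first.getresponse().read()

    # Second connection, opened while the first is still open.
    second = socket.create_connection(('localhost', port), timeout=1)
    second.sendall(b'GET / HTTP/1.1\r\nHost: localhost\r\n'
                   b'Connection: close\r\n\r\n')
    try:
        served = bool(second.recv(1))
    except socket.timeout:
        served = False

    # Closing the first connection frees the server to answer the second;
    # drain that response so both sides shut down cleanly.
    first.close()
    second.settimeout(5)
    while second.recv(4096):
        pass
    second.close()
    server.shutdown()
    server.server_close()
    return served


print(served_while_first_open('HTTP/1.0'))  # True
print(served_while_first_open('HTTP/1.1'))  # False
```

With `HTTP/1.0` the server hangs up after the first response and is free to accept the second connection. With `HTTP/1.1` it sits waiting for a next request on the first connection, so the second one is only answered after the first is closed.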

erip