I have an image ML service running behind Flask. From what I've seen online, the usual pattern is to base64-encode the image on the client, send the payload to the service, and have the service decode the image before running its predictions. For example:
Client-Side
import base64
import requests

with open("my_image.jpeg", 'rb') as f:
    image_b64 = base64.b64encode(f.read()).decode('utf-8')
requests.post("https://my-service", json={'image_b64': image_b64})
Server-Side
import base64
from io import BytesIO

image_b64 = request.get_json().get('image_b64')  # inside the Flask handler
image = BytesIO(base64.b64decode(image_b64))
...
This approach seems to suffer from several serious sources of overhead that crop up when the model server receives many requests simultaneously:
- Base64 encoding inflates the payload by roughly 33% (every 3 bytes of image become 4 characters of text), as the measurement sketch after this list shows.
- The encoding/decoding work is CPU-bound on both client and server, which bottlenecks the rate at which the model can actually serve predictions. If it's relevant: I'm interested primarily in CPU-only serving, so no GPU is involved.
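For concreteness, here's a rough way to measure both effects on a single image (the file path is just a placeholder for one of my inputs):

import base64
import time

with open("my_image.jpeg", 'rb') as f:  # placeholder path
    raw = f.read()

t0 = time.perf_counter()
encoded = base64.b64encode(raw)
t1 = time.perf_counter()
decoded = base64.b64decode(encoded)
t2 = time.perf_counter()

print(f"raw: {len(raw)} bytes, base64: {len(encoded)} bytes "
      f"({len(encoded) / len(raw):.2%} of original size)")
print(f"encode: {(t1 - t0) * 1e3:.2f} ms, decode: {(t2 - t1) * 1e3:.2f} ms")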
What I'm wondering is: is there a better way? I'm guessing there are some smart ways of threading the decoding/encoding process (sketched below), but even then it seems woefully inefficient.
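To clarify what I mean by threading, this is the kind of thing I'm imagining on the server side — handing the decode off to a worker pool so it doesn't block the request handler (a rough sketch only; the pool size is arbitrary, and given Python's GIL a process pool might be what's actually needed for CPU-bound decoding):

import base64
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

executor = ThreadPoolExecutor(max_workers=4)  # arbitrary pool size

def decode_image(image_b64):
    # Turn the base64 payload into an in-memory file object.
    return BytesIO(base64.b64decode(image_b64))

# In the Flask handler, something like:
# image = executor.submit(decode_image, payload['image_b64']).result()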