I am new to developing REST APIs and am trying to deploy a machine learning model for image segmentation using Python and a REST API.
On the server side I am using FastAPI, while on the client side I use the Python requests library. The client already resizes the image to the model's required input size and therefore doesn't send unnecessarily large images. The server feeds the received image to the model and returns the binary segmentation mask. The image and the mask are converted from NumPy arrays to lists, which are then sent as JSON data.
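To put a number on what that list/JSON encoding costs in payload size, here is a quick sketch using a random float array of the client's input shape as a stand-in for the actual image:

```python
import json
import numpy as np

# Stand-in for the resized client image: 1024 x 1024 RGB floats.
image = np.random.rand(1024, 1024, 3)

raw_bytes = image.nbytes                      # binary size of the float64 array
json_bytes = len(json.dumps(image.tolist()))  # size after list/JSON encoding

print(f"raw:  {raw_bytes / 1e6:.1f} MB")
print(f"json: {json_bytes / 1e6:.1f} MB")
```

As I understand it, the text representation of float64 values is typically two to three times larger than the raw binary, so tens of megabytes end up in each request.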
Below is some code representing what I've just described. Since I cannot provide the model here, the server in this minimal reproducible example simply returns the same image it received.
server.py
    import uvicorn
    from fastapi import FastAPI
    import numpy as np
    from datetime import datetime

    app = FastAPI()

    @app.get('/test')
    def predict_and_process(data: dict = None):
        start = datetime.now()
        if data:
            image = np.asarray(data['image'])
            print("Time to run: ", datetime.now() - start)
            return {'prediction': np.squeeze(image).tolist()}
        else:
            return {'msg': "Model or data not available"}

    def run():
        PORT = 27010
        uvicorn.run(
            app,
            host="127.0.0.1",
            port=PORT,
        )

    if __name__ == '__main__':
        run()
client.py
    import requests
    import numpy as np
    import json
    from matplotlib.pyplot import imread
    from skimage.transform import resize
    from datetime import datetime

    def test_speed():
        path_to_img = r"path_to_some_image"
        image = imread(path_to_img)
        image = resize(image, (1024, 1024))
        img_list = image.tolist()
        data = {'image': img_list}

        start = datetime.now()
        respond = requests.get('http://127.0.0.1:27010/test', json=data)
        prediction = respond.json()['prediction']
        print("time for prediction: {}".format(datetime.now() - start))

    if __name__ == '__main__':
        test_speed()
The output from the server is:
    (cera) PS C:\Users\user_name\Desktop\MRM\REST> python .\server.py
    INFO:     Started server process [20448]
    INFO:     Waiting for application startup.
    INFO:     Application startup complete.
    INFO:     Uvicorn running on http://127.0.0.1:27010 (Press CTRL+C to quit)
    Time to run:  0:00:00.337099
    INFO:     127.0.0.1:61631 - "GET /test HTTP/1.1" 200 OK
and the output from the client is:
    (cera) PS C:\Users\user_name\Desktop\MRM\REST> python .\client.py
    time for prediction: 0:00:16.845123
Since the code running on the server takes less than a second, the time needed to transfer the image from the client to the server (or back) must be somewhere around 8 seconds each way, which is definitely too long.
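To check whether the time goes into the JSON encoding rather than the network itself, here is a rough sketch that measures the list/JSON round trip alone, again using a random array of the same shape as a stand-in for the real image:

```python
import json
import numpy as np
from datetime import datetime

# Stand-in for the resized 1024 x 1024 RGB image the client sends.
image = np.random.rand(1024, 1024, 3)

start = datetime.now()
payload = json.dumps(image.tolist())       # what the client does before sending
decoded = np.asarray(json.loads(payload))  # what the server does on arrival
print("JSON round trip:", datetime.now() - start)
```

On localhost there is essentially no network latency, so if this round trip alone already takes several seconds on your machine, the serialization, not the transfer, is the bottleneck.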
I can't send smaller images since the input size of the model needs to stay the same.
So, for a deployment/REST newbie: what would be a professional, best-practice way to get my predictions from a REST API faster? I assume there are limits since I'm using Python, but 16 seconds still seems far too long to me.
Thank you in advance!