I have a Flask API in which each endpoint calls a function like this:

    import json
    import logging
    from concurrent import futures

    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/Ads', methods=['POST'])
    def ads():
        data = request.get_json()
        return execute_action(etl_ads, data)

    def execute_action(action, *args):
        try:
            logging.info('Starting {0}: {1}'.format(str(action.__name__), __try_parse_str(*args)))
            # A new process pool is created (and torn down) on every request.
            with futures.ProcessPoolExecutor(max_workers=5) as executor:
                result = executor.submit(action, *args).result()
            logging.info('Finished {0}'.format(str(action.__name__)))
            return result
        except Exception as e:
            logging.error('Error on {0}: {1}'.format(str(action.__name__), str(e)))
            return json.dumps({'error': '{0}'.format(str(e))}), 500, {'Content-Type': 'application/json'}

The body of this request is:

    {
        "file_urls": ["blob_url"],
        "meta_data": {
            "job_creation_time": "2022-11-08T09:36:00"
        },
        "company_tenant": {
            "id": "tenant_id"
        }
    }

`etl_ads` is a function that downloads the content of the URLs (each URL is a JSON file), transforms the data and saves it to BigQuery. The process is the same for all endpoints. The only difference is how the data is transformed, which is why the function is passed as a parameter to `execute_action`.
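To make the shape of these ETL functions concrete, here is a minimal sketch of the pattern described above. It is not the real code: `run_etl`, `download_json`, `transform_ads`, the row format, and the `load` callback (which stands in for the BigQuery write) are all hypothetical names used for illustration.

```python
import json
from urllib.request import urlopen


def download_json(url):
    """Fetch one URL and parse its body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def run_etl(data, transform, download=download_json, load=print):
    """Download every file in the request, transform it, and pass the rows to `load`.

    `transform` is the only endpoint-specific part. `load` is a placeholder
    for the BigQuery write, which is an assumption and not shown here.
    """
    tenant_id = data["company_tenant"]["id"]
    rows = []
    for url in data["file_urls"]:
        rows.extend(transform(download(url), tenant_id))
    load(rows)
    return json.dumps({"rows_loaded": len(rows)})


def transform_ads(payload, tenant_id):
    """Hypothetical transform: tag each ad record with the tenant id."""
    return [{**record, "tenant_id": tenant_id} for record in payload]


def etl_ads(data):
    """Endpoint-specific entry point that would be passed to execute_action."""
    return run_etl(data, transform_ads)
```

Each endpoint would then differ only in the transform it passes to `run_etl`, while the download and load steps stay shared.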
When I tried to use Gunicorn or Waitress to host the app instead of the Flask development server, the code got slower (I'm using k6 to run the performance tests).
When I remove the `futures` executor, the problem goes away: the code runs faster under Gunicorn and Waitress. Why is this happening? What is the difference between the Flask development server and these WSGI servers that makes them slower when using `futures`?
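For clarity, the variant without the process pool presumably looks something like the sketch below. The name `execute_action_direct` is hypothetical; the only intended change is that the action runs directly in the worker that handles the request instead of being submitted to a per-request `ProcessPoolExecutor`.

```python
import json
import logging


def execute_action_direct(action, *args):
    """Run the action in the current worker, with no process pool involved."""
    try:
        logging.info('Starting {0}'.format(action.__name__))
        # Direct call: no pool start-up, no pickling of arguments or results.
        result = action(*args)
        logging.info('Finished {0}'.format(action.__name__))
        return result
    except Exception as e:
        logging.error('Error on {0}: {1}'.format(action.__name__, str(e)))
        # Same (body, status, headers) tuple shape that Flask accepts as a response.
        return json.dumps({'error': str(e)}), 500, {'Content-Type': 'application/json'}
```

Timing this version against the pooled one under the same load should show how much of the slowdown comes from creating and tearing down a pool on every request.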
I also tried to use gevent directly (based on this answer), but the result was the same.