I have a Flask app and want to use DuckDB as the database for several endpoints. My idea is to query the data and return it as a .parquet file. When I test the database with a simple Python script outside of the Flask app, it can query the data and save it as a .parquet in under a second (rough sketch of that script at the end of this post). When I bring that same approach into the Flask app, it still successfully queries the data and returns it as a .parquet file, but it takes roughly 45 seconds. Other endpoints that return a .parquet file -- ones that are pre-staged and do not need to be queried -- can do so in just a second or two. So the issue, apparently, is how DuckDB is incorporated into my Flask application. Here is a sample boilerplate of what I have:
import tempfile
import duckdb
from flask import Blueprint, send_file

test = Blueprint('test', __name__)  # assuming 'test' is the blueprint this route is registered on

@test.route('/duckdb', methods=['GET'])
def duckdb_test():
    # open the DuckDB database file
    con = duckdb.connect(database='~/flask_db/test.db')
    # temp file to hold the .parquet output
    tmp = tempfile.NamedTemporaryFile(suffix='.parquet', mode='w+b', delete=False)
    # query the data into a pandas DataFrame
    df = con.sql("SELECT * FROM tbl WHERE name = 'John'").to_df()
    # write to the temporary .parquet
    df.to_parquet(tmp.name, engine='pyarrow', index=False)
    return send_file(tmp.name, mimetype='application/octet-stream', as_attachment=True, download_name="request.parquet")
I want to save it as a temp file. I'm not really sure what's wrong here. Again, it does work, but it just takes way, way too much time. The data being returned is about 12,000 rows out of a ~5.5M-row database -- but given that the same query runs quickly outside of the Flask app, on the same VM, the size itself shouldn't be the issue.
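For reference, the standalone script I compared against is along these lines (a rough sketch; the query and database path match the route above, and the output filename is just illustrative):

import duckdb

con = duckdb.connect(database='~/flask_db/test.db')
# same query as the endpoint; outside Flask this finishes in under a second
df = con.sql("SELECT * FROM tbl WHERE name = 'John'").to_df()
df.to_parquet('request.parquet', engine='pyarrow', index=False)
con.close()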