I have a Flask app running in a container on EC2. When the container starts, docker stats reports memory usage close to 48 MB. After the first API call (which reads a 2 GB file from S3), usage rises to 5.72 GB and does not drop back down after the call completes.
Each request increases usage by roughly twice the file size, and after a few requests the server starts throwing memory errors.
Notably, running the same Flask app outside the container shows no such growth in memory usage.
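For context on the "roughly twice the file size" observation: pandas parses the CSV bytes and then holds the parsed columns, and object-dtype string columns in particular can take several times their on-disk size. A small self-contained sketch (hypothetical data, standard pandas API) that compares on-disk and in-memory size:

```python
import io

import pandas as pd

# Build a small in-memory CSV to stand in for the S3 file (hypothetical data).
csv_bytes = ("a,b\n" + "\n".join(f"{i},row-{i}" for i in range(10_000))).encode()

df = pd.read_csv(io.BytesIO(csv_bytes))

# deep=True counts the Python string objects themselves, not just pointers.
in_memory = df.memory_usage(deep=True).sum()
print(f"on-disk ~{len(csv_bytes)} bytes, in-memory ~{in_memory} bytes")
```

On my small test the in-memory footprint is already several times the raw CSV size, so a 2 GB file blowing past 4 GB resident is not surprising by itself; what I can't explain is why the memory is never returned inside the container.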
Output of "docker stats <container_id>" before hitting the API:
Output of "docker stats <container_id>" after hitting the API:
The Flask app (app.py) contains:
import os
import json

import flask
import pandas as pd

app = flask.Flask(__name__)

@app.route('/uploadData', methods=['POST'])
def test():
    json_input = flask.request.args.to_dict()
    s3_path = json_input['s3_path']
    # reading file directly from s3 - without downloading
    df = pd.read_csv(s3_path)
    print(df.head(5))
    # clearing df
    df = None
    return json_input

@app.route('/healthcheck', methods=['GET'])
def HealthCheck():
    return "Success"

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8898)
The Dockerfile contains:
FROM python:3.7.10
RUN apt-get update -y && apt-get install -y python-dev
# Copy the application code into the image
COPY . /app_abhi
WORKDIR /app_abhi
EXPOSE 8898
RUN pip3 install flask boto3 pandas fsspec s3fs
CMD [ "python", "-u", "app.py" ]
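One lead I have seen suggested (unverified in my setup) is that glibc's malloc inside the container keeps freed heap pages instead of returning them to the OS, which would match memory staying high after the request finishes. glibc exposes a trim threshold via the MALLOC_TRIM_THRESHOLD_ environment variable, and Docker can cap the container's memory. A hypothetical run command (the image name and threshold value are guesses):

```shell
# Lower glibc's trim threshold so freed heap pages are returned to the OS
# sooner, and hard-cap the container's memory as a safety net.
docker run -d -p 8898:8898 \
    -e MALLOC_TRIM_THRESHOLD_=65536 \
    --memory=4g \
    my-flask-image
```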
I tried both reading the file directly from S3 and downloading it first and then reading it locally; the memory behaviour was the same in both cases.
Any leads on getting the memory utilization back down to the initial level would be a great help!