Problem
I have a Python application running inside a Docker container. The application receives "jobs" from a queue service (RabbitMQ), does some computing tasks and uploads the results into databases (MySQL and Redis).
The issue I face is that the RAM is not properly "cleaned up" between iterations, so memory consumption rises with every iteration until the container hits OOM. Since I have implemented a MemoryError handler (see tested solutions below for more info), the container stays alive, but the memory stays exhausted (it is not freed up by a container restart).
Question
- How can I debug what is "staying" in memory so I can clean it up?
- How do I clean up the memory properly between runs?
Iteration description
An example of increasing memory utilisation; memory limit set to 3000 MiB:
- fresh container: 130 MiB
- 1st iteration: 1000 MiB
- 2nd iteration: 1500 MiB
- 3rd iteration: 1750 MiB
- 4th iteration: OOM
Note: Every run/iteration is a bit different and thus has a bit different memory requirements, but the pattern stays similar.
Below is a brief overview of one iteration, which might be helpful when determining what might be wrong (a rough code sketch follows the list):
1. Receiving job parameters from RabbitMQ
2. Loading data from a local parquet file into a dataframe (using read_parquet(filename, engine="fastparquet"))
3. Computing values using Pandas functions and other libraries (most of the load is probably here)
4. Converting the dataframe to a dictionary and computing some other values inside a loop
5. Adding some more metrics from the computed values - e.g. highest/lowest values, trends etc.
6. Storing the metrics from 5. in databases (MySQL and Redis)
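To make the flow concrete, here is a rough sketch of one iteration; the function name, the job structure and the metric logic are illustrative only, not my actual code:
import pandas as pd

def run_iteration(job):
    # 1.-2. load the parquet file referenced by the job parameters
    df = pd.read_parquet(job["filename"], engine="fastparquet")
    # 3. heavy Pandas work happens here (aggregations, rolling windows, ...)
    # 4. convert to a dictionary and compute further values in a loop
    records = df.to_dict(orient="records")
    non_empty_rows = sum(1 for row in records if any(v is not None for v in row.values()))
    # 5. add some more metrics, e.g. highest/lowest values per column
    metrics = {col: {"max": df[col].max(), "min": df[col].min()} for col in df.columns}
    metrics["non_empty_rows"] = non_empty_rows
    # 6. store the metrics in MySQL and Redis (omitted here)
    return metrics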
A selection of the tech I use:
- Python 3.10
- Pandas 1.4.4
- numpy 1.24.2
- running on AWS ECS Fargate (but results on local are similar); 1 vCPU and 8 GB of memory
Possible solutions / tried approaches
- ❌: tried; did not work
- : an idea I am going to test
- : did not completely solve the problem, but helped towards the solution
- ✅: working solution
❌ Restart container after every iteration
The most obvious one is to restart the Docker container (e.g. by calling exit() and letting the container restart itself) after every iteration. This solution is not feasible, because the restart overhead is too big (one run takes 15 - 60 seconds, so restarting would slow things down too much).
❌ Using gc.collect()
I have tried to call gc.collect() at the very beginning of each iteration, but the memory usage did not change at all.
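For reference, what I tried looks roughly like this (the callback name is illustrative):
import gc

def on_message(job_params):
    # force a full garbage collection before starting the next job
    gc.collect()
    # ... rest of the iteration (load parquet, compute, store results) ...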
✅ Test multiprocessing
I read some recommendations to use the multiprocessing module in order to improve memory efficiency, because it "drops" all resources once the subprocess finishes.
This solved the issue (see the answers below); a minimal sketch of the approach follows the link.
https://stackoverflow.com/a/1316799/12193952
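A minimal sketch, assuming a handle_job() function that performs one full iteration (the names here are illustrative, not my actual code):
import multiprocessing as mp

def handle_job(params):
    # one full iteration: load parquet, compute with Pandas, store in MySQL/Redis
    ...

def run_job_in_subprocess(params):
    # run the iteration in a child process; when it exits, all memory it
    # allocated is returned to the OS instead of lingering in the worker
    proc = mp.Process(target=handle_job, args=(params,))
    proc.start()
    proc.join()

# inside the RabbitMQ consumer callback:
# run_job_in_subprocess(job_params)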
Use explicit del on unwanted objects
The idea is to explicitly delete objects that are no longer used (e.g. the dataframe after it has been converted to a dictionary).
del my_array
del my_object
https://stackoverflow.com/a/1316793/12193952
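Applied to my case, the idea would look roughly like this (the filename is just a placeholder, and the gc.collect() call is an optional extra):
import gc
import pandas as pd

df = pd.read_parquet("data.parquet", engine="fastparquet")
records = df.to_dict(orient="records")
# the dataframe is no longer needed once converted, so drop the reference
del df
# and ask the collector to reclaim what it can
gc.collect()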
Monitor memory using psutil
import psutil

# Local imports
from utils import logger

def get_usage():
    # convert bytes to MB and log the current usage
    total = round(psutil.virtual_memory().total / 1000 / 1000, 4)
    used = round(psutil.virtual_memory().used / 1000 / 1000, 4)
    pct = round(used / total * 100, 1)
    logger.info(f"Current memory usage is: {used} / {total} MB ({pct} %)")
    return True
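It can be called between the individual steps of an iteration to see where the usage jumps, e.g.:
get_usage()  # after loading the parquet file
# ... computations ...
get_usage()  # after converting the dataframe to a dictionary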
Support except MemoryError
Thanks to this question I was able to set up a try/except pattern that catches OOM errors and keeps the container running (so the logs stay available etc.).
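The pattern looks roughly like this (run_iteration() stands in for the actual work):
import logging

logger = logging.getLogger(__name__)

def run_iteration(params):
    # the actual job: load parquet, compute, store results
    ...

def consume(params):
    try:
        run_iteration(params)
    except MemoryError:
        # the container stays alive, so the logs remain accessible
        logger.error("MemoryError while processing a job; skipping it")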
Even if I don't get any answer, I will continue testing and editing until I find a solution and hopefully help someone else.