There are several things you can do to reduce memory consumption in your Django project. The link that you shared on releasing memory highlights a few of them. However, since you're also dealing with a web framework, there are some other things you can do as well.
Read data in smaller chunks: When you read data, it is immediately loaded into memory for processing. Avoid loading everything at once; for example, pandas.read_csv accepts a chunksize argument that returns the data piece by piece.
Write data in smaller chunks: As with reading, write data in chunks rather than all at once. This reduces peak memory consumption when writing large dataframes to disk.
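The two points above can be combined into one streaming step. The following is only a minimal sketch: the function name process_in_chunks and the columns price, quantity and total are made up for illustration, and it assumes CSV input passed as open file handles.

```python
import io

import pandas as pd


def process_in_chunks(source, destination, chunk_size=1000):
    """Read CSV data chunk by chunk, transform each chunk, and write it out."""
    rows_written = 0
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of one large DataFrame.
    for i, chunk in enumerate(pd.read_csv(source, chunksize=chunk_size)):
        # Hypothetical transformation on hypothetical columns.
        chunk["total"] = chunk["price"] * chunk["quantity"]
        # Write the header only with the first chunk.
        chunk.to_csv(destination, header=(i == 0), index=False)
        rows_written += len(chunk)
    return rows_written


# Small in-memory demonstration; real code would pass open files instead.
source = io.StringIO("price,quantity\n2.5,4\n1.0,3\n4.0,2\n")
destination = io.StringIO()
rows = process_in_chunks(source, destination, chunk_size=2)
```

Only one chunk is held in memory at a time, so peak usage depends on chunk_size rather than on the size of the whole file. If you pass a file path to to_csv instead of an open handle, you would also need mode="a" so that later chunks are appended rather than overwriting earlier ones.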
Ensure correct data types: Don't rely on the default data types that pandas assigns to your data. For example, your data might easily fit into int32 or float32 while pandas has assigned int64/float64 to it. Be explicit where possible (remember the Zen of Python: explicit is better than implicit).
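As a rough illustration of the effect, the sketch below compares memory usage before and after an explicit cast. The column names and data are placeholders, and it assumes you have already checked that the values fit the narrower types.

```python
import numpy as np
import pandas as pd

# Placeholder data: small integers and floats that do not need 64 bits.
df = pd.DataFrame({
    "user_id": np.arange(1000, dtype="int64"),
    "score": np.linspace(0.0, 1.0, 1000),  # float64 by default
})
default_bytes = df.memory_usage(deep=True).sum()

# Cast explicitly once you know the value ranges fit the narrower types.
compact = df.astype({"user_id": "int32", "score": "float32"})
compact_bytes = compact.memory_usage(deep=True).sum()
```

When loading from a file, you can also pass the dtype argument to pandas.read_csv so the wider types are never allocated in the first place.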
Apply code-level optimizations: Specifically, avoid making a lot of copies of your dataframe. Where possible, transform it in place rather than re-creating a dataframe for each operation.
Use generators instead of lists: A generator does not load all data into memory; it yields each item only when it is needed.
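To make the difference concrete, here is a small sketch using only the standard library. The function names are illustrative, and sys.getsizeof measures only the container object itself, not the items it refers to.

```python
import sys


def squares_list(n):
    # Builds the whole list in memory before returning it.
    return [i * i for i in range(n)]


def squares_gen(n):
    # Yields one value at a time; nothing is stored up front.
    for i in range(n):
        yield i * i


# Both produce the same values when consumed.
total_from_list = sum(squares_list(100_000))
total_from_gen = sum(squares_gen(100_000))

# The list object grows with n; the generator object stays small.
list_size = sys.getsizeof(squares_list(100_000))
gen_size = sys.getsizeof(squares_gen(100_000))
```

The trade-off is that a generator can be consumed only once, so it suits single-pass processing such as feeding rows into an aggregation.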
Avoid manipulation in the request/response cycle: Finally, keep pandas work out of Django's request/response cycle, where a large dataframe inflates the memory of the web worker handling the request.
Use queues: Shift the data manipulation to task queues such as Celery. You can also run these queues on a managed cloud service or on a separate EC2 instance, for example. Distributed queues let you scale the system further and lower memory consumption on the web servers.
Querysets: Use Django querysets efficiently. They are evaluated lazily. Make sure you filter them (that is, read them in chunks) according to your needs.
Dataframes up for garbage collection: Use del when you're done with a dataframe. This removes the reference to the dataframe. You can then call gc.collect() to trigger the garbage collector explicitly and free up memory.
You may also use the gc.set_threshold function to further tune the garbage collector.
A small example of releasing a dataframe explicitly might look like this:
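import gc
import pandas as pd


def service_layer_function():
    # generate the dataframe (placeholder data for illustration)
    df = pd.DataFrame({"value": range(1_000)})
    # get done with it, keeping only the small result you need
    result = int(df["value"].sum())
    # explicitly release the reference to df
    del df
    # trigger the garbage collector
    gc.collect()
    return result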
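The gc.set_threshold tuning mentioned earlier could be sketched as follows. The value 5000 is an arbitrary assumption rather than a recommendation; whether raising or lowering the threshold helps depends on your workload, so measure before and after.

```python
import gc

# Remember the current thresholds so they can be restored later.
original = gc.get_threshold()

# Arbitrary illustrative value: a higher generation-0 threshold makes
# automatic collections run less often during allocation-heavy work.
gc.set_threshold(5000, original[1], original[2])
raised = gc.get_threshold()

# ... allocation-heavy work would go here ...

# Restore the previous settings afterwards.
gc.set_threshold(*original)
restored = gc.get_threshold()
```

A lower threshold does the opposite: collections run more often, which can release cyclic garbage sooner at the cost of extra CPU time.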