
I have a very large pandas DataFrame that I am attempting to insert into MongoDB. The trouble is memory management. My code is below. I am using `insert_many` to load the entire frame into the DB in a single call, and that process uses a lot of memory. Is there a way to accomplish the same goal with less memory usage?

from time import time

import pymongo

start = time()
client = pymongo.MongoClient()
db = client.test_db
collection = db.collection
# Convert the whole DataFrame to a list of dicts and insert it in one call
collection.insert_many(data.to_dict('records'))
end = time()
print("Time to Populate DB:", end - start)
Jeff Saltfist
  • You can iterate over the DataFrame and only call `.to_dict('records')` on a sub-dataframe; there are many ways to achieve this (a sketch follows these comments). See this question: http://stackoverflow.com/questions/25699439/how-to-iterate-over-consecutive-chunks-of-pandas-dataframe-efficiently – Gustavo Bezerra May 21 '17 at 08:10
  • 1
  • However, if you are having memory issues, you should think about a strategy to create your DataFrame in chunks. You seem to be loading the whole DataFrame into memory before doing the MongoDB insert. For example, `pd.read_csv` has a `chunksize` option (a second sketch below shows this). The `pd.DataFrame.memory_usage` method is also useful. – Gustavo Bezerra May 21 '17 at 08:14
  • @GustavoBezerra - Both of those comments are very helpful. I will test them out. – Jeff Saltfist May 21 '17 at 08:40
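
Following the first comment, here is a minimal sketch of a chunked insert, assuming `data` is the DataFrame already in memory; the batch size of 10000 is an arbitrary value to tune against your memory budget:

import pymongo

client = pymongo.MongoClient()
collection = client.test_db.collection

chunk_size = 10000  # rows per batch; tune to your memory budget
for start_row in range(0, len(data), chunk_size):
    # Convert only this slice of the DataFrame to dicts, so only one
    # batch of records is materialized in memory at a time
    chunk = data.iloc[start_row:start_row + chunk_size]
    collection.insert_many(chunk.to_dict('records'))

This still keeps the full DataFrame in memory, but it avoids building a second full copy of the data as one giant list of dicts, which is likely where the spike in the original code comes from.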
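
And a sketch of the second comment's approach, assuming the data originally comes from a CSV file (the filename `data.csv` is a placeholder); `pd.read_csv` with `chunksize` returns an iterator of DataFrames, so the full frame never has to be built in memory at all:

import pandas as pd
import pymongo

client = pymongo.MongoClient()
collection = client.test_db.collection

# chunksize makes read_csv yield DataFrames of 10000 rows each,
# so only one chunk is held in memory at any point
for chunk in pd.read_csv('data.csv', chunksize=10000):
    collection.insert_many(chunk.to_dict('records'))

The `memory_usage(deep=True)` method mentioned in the comment is useful for checking how much memory each chunk actually occupies.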

0 Answers