I am using the command below to fetch data from a MongoDB collection and convert it to a pandas DataFrame. Fetching 30,000 records takes more than 15 seconds. How can I convert it to a DataFrame without it taking so long?

import pandas as pd

db = conn.db  # conn is an existing pymongo MongoClient
collection = db.collection_name
df = pd.DataFrame(list(collection.find()))

1 Answer

This question may be a duplicate of this earlier question.

I ran a short performance test of the options presented in the linked question above, using example data from my own MongoDB database:

In [1]: %timeit df = pd.DataFrame(list(collection.find()))
1.11 s ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit df = pd.DataFrame.from_records(collection.find())
1.11 s ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit df = pd.json_normalize(collection.find())
4.89 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Disclaimer

  • These numbers are only representative of the ratio between the execution times; they cannot be applied directly to your case, as I don't have access to your data.
  • json_normalize() returns different (unnested) columns than the other two options; see the sketch below.
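
To illustrate the last point, here is a minimal sketch with a made-up nested document (the field names are invented for illustration): json_normalize() flattens nested fields into dotted column names, while the other two options keep the nested dict in a single column.

docs = [{"_id": 1, "name": "a", "meta": {"views": 10, "likes": 2}}]

pd.DataFrame(docs).columns
# Index(['_id', 'name', 'meta'], dtype='object')
pd.json_normalize(docs).columns
# Index(['_id', 'name', 'meta.views', 'meta.likes'], dtype='object')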

If you don't need all fields from the MongoDB documents, I recommend querying only the fields you actually need by passing a projection to find(). See the MongoDB documentation on projections.
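
For example, a minimal sketch of such a projection (field_a and field_b are placeholders for the fields you actually need):

# The second argument to find() is the projection: only fetch field_a
# and field_b, and exclude the _id field
cursor = collection.find({}, {'field_a': 1, 'field_b': 1, '_id': 0})
df = pd.DataFrame(list(cursor))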
