I am using the command below to fetch data from a MongoDB collection and convert it to a pandas DataFrame. Fetching 30,000 records takes more than 15 seconds. How can I convert it to a DataFrame without it taking so long?

import pandas as pd

db = conn.db  # conn is an existing pymongo MongoClient
collection = db.collection_name
df = pd.DataFrame(list(collection.find()))

1 Answer

This question may be a duplicate of this earlier question.

I ran a short performance test of the options presented in the linked question above, using example data from my own MongoDB database:

In [1]: %timeit df = pd.DataFrame(list(collection.find()))
1.11 s ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit df = pd.DataFrame.from_records(collection.find())
1.11 s ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit df = pd.json_normalize(collection.find())
4.89 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Disclaimer

  • These numbers are only representative of the ratio between the execution times; they cannot be applied directly to your case, as I don't have access to your data.
  • json_normalize() returns different (unnested) columns than the other two options; see the sketch below.
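
To illustrate the last point, here is a minimal sketch with a made-up nested document (the field names are invented for illustration): json_normalize() flattens nested fields into dotted column names, while the other two options keep the nested dict in a single column.

docs = [{"_id": 1, "name": "a", "meta": {"views": 10, "likes": 2}}]

pd.DataFrame(docs).columns
# Index(['_id', 'name', 'meta'], dtype='object')
pd.json_normalize(docs).columns
# Index(['_id', 'name', 'meta.views', 'meta.likes'], dtype='object')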

If you don't need all fields from the MongoDB documents, I recommend querying only the fields you actually need by passing a projection to find(). See the MongoDB documentation on projections.
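
For example, a minimal sketch of such a projection (field_a and field_b are placeholders for the fields you actually need):

# The second argument to find() is the projection: only fetch field_a
# and field_b, and exclude the _id field
cursor = collection.find({}, {'field_a': 1, 'field_b': 1, '_id': 0})
df = pd.DataFrame(list(cursor))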
