This question may be a duplicate of the question linked above.
I ran a short performance test of the options presented in the linked question (using example data from my MongoDB database):
In [1]: %timeit df = pd.DataFrame(list(collection.find()))
1.11 s ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit df = pd.DataFrame.from_records(collection.find())
1.11 s ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit df = pd.json_normalize(collection.find())
4.89 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Disclaimer
- These numbers are only representative of the ratio between the execution times; they cannot be applied directly to your exact question, as I don't have access to your data.
json_normalize()
returns a DataFrame with different columns than the other two options, because it flattens (unnests) nested documents into separate columns.
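To illustrate the column difference, here is a minimal sketch using hypothetical sample documents (not your data) that mimic nested MongoDB records:

```python
import pandas as pd

# Illustrative sample documents with one nested sub-document each.
docs = [
    {"_id": 1, "name": "a", "meta": {"views": 10, "likes": 2}},
    {"_id": 2, "name": "b", "meta": {"views": 5, "likes": 1}},
]

# pd.DataFrame keeps the nested dict as a single object column...
df_plain = pd.DataFrame(docs)
print(list(df_plain.columns))  # ['_id', 'name', 'meta']

# ...while pd.json_normalize flattens it into dotted columns.
df_flat = pd.json_normalize(docs)
print(list(df_flat.columns))   # ['_id', 'name', 'meta.views', 'meta.likes']
```

So if your documents are nested and you want flat columns, the slower json_normalize() may still be the right tool.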
If you don't need all fields from the MongoDB documents, I recommend querying only the fields you actually need (a projection). See the MongoDB documentation on projections.
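A hedged sketch of such a projection (`collection` is assumed to be a pymongo Collection; the field names are illustrative, not from your schema):

```python
import pandas as pd

# Fetch only the needed fields; 1 includes a field, 0 excludes _id.
projection = {"name": 1, "meta.views": 1, "_id": 0}

# With a live MongoDB connection this would be:
#   df = pd.DataFrame(list(collection.find({}, projection)))

# For illustration only, the same projection applied by hand to
# local sample documents:
docs = [
    {"_id": 1, "name": "a", "meta": {"views": 10, "likes": 2}},
    {"_id": 2, "name": "b", "meta": {"views": 5, "likes": 1}},
]
projected = [
    {"name": d["name"], "meta": {"views": d["meta"]["views"]}} for d in docs
]
df = pd.DataFrame(projected)
print(list(df.columns))  # ['name', 'meta']
```

Reducing the fields server-side shrinks the cursor payload, so all three construction methods above should get faster.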