In another question, some people are trying to insert a Pandas DataFrame into MongoDB using Python built-in structures (dict, list):
Insert a Pandas Dataframe into mongodb using PyMongo
I wonder whether we could instead insert a NumPy rec.array (numpy.recarray) into MongoDB using PyMongo.
That should probably be more efficient, because pandas.DataFrame.to_dict uses for loops, which take a very long time on huge volumes of data.
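For reference, the dict-based approach from that question looks roughly like the sketch below. It assumes only that pandas is installed; the insert_many call is commented out because it needs a running MongoDB server and a PyMongo collection object (named `collection` here as a placeholder).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

# orient='records' builds one dict per row, e.g. {'a': 1, 'b': 2, 'c': 3}
docs = df.to_dict(orient='records')

# Assumption: `collection` is a PyMongo collection on a reachable server.
# collection.insert_many(docs)
```

This is the conversion step I would like to avoid for large DataFrames.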
In [1]: import pandas as pd
In [2]: import pymongo
In [3]: client = pymongo.MongoClient()
In [4]: collection = client['db_name']['collection_name']
In [5]: df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a', 'b', 'c'])
In [6]: df
Out[6]:
   a  b  c
0  1  2  3
1  4  5  6
In [7]: rec = df.to_records()
In [8]: rec
Out[8]:
rec.array([(0, 1, 2, 3), (1, 4, 5, 6)],
          dtype=[('index', '<i8'), ('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
In [9]: type(rec)
Out[9]: numpy.recarray
But I ran into errors when inserting. This call:
In [10]: collection.insert(rec)
raised
ValueError: no field of name _id
This one:
In [11]: collection.insert_many(rec)
raised
TypeError: documents must be a non-empty list
And this one:
In [12]: collection.insert_one(rec)
raised
TypeError: document must be an instance of dict, bson.son.SON, or other type that inherits from collections.MutableMapping
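Judging by these messages, PyMongo seems to want dict-like documents (and a list of them for insert_many). The only workaround I can sketch converts each record to a dict first, which still loops in Python and so probably loses the benefit I was hoping for. It assumes NumPy only and rebuilds the same records as above; the insert_many call is commented out because it needs a running MongoDB server and a PyMongo collection object.

```python
import numpy as np

# Same records as produced by df.to_records() above
rec = np.rec.array(
    [(0, 1, 2, 3), (1, 4, 5, 6)],
    dtype=[('index', '<i8'), ('a', '<i8'), ('b', '<i8'), ('c', '<i8')],
)

# tolist() yields tuples of plain Python ints instead of NumPy scalars
names = rec.dtype.names
docs = [dict(zip(names, row)) for row in rec.tolist()]

# Assumption: `collection` is a PyMongo collection on a reachable server.
# collection.insert_many(docs)
```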
Any idea?