I have a list of DataFrames. I want to store them in MongoDB and retrieve them.
I tried storing them like this:
import json

for every_df in dfs:
    records = json.loads(every_df.to_json()).values()
    db_connection.insert(records)
but I'm not sure how to retrieve them.
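A quick check of what the loop above actually sends to MongoDB: by default DataFrame.to_json uses orient='columns', so .values() yields one dict per column (keyed by the index labels as strings), not one document per row. The small frame below is made up purely for illustration.

```python
import json

import pandas as pd

# Illustrative frame, not taken from the question
df = pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')})

# Default orient for a DataFrame is 'columns': {column -> {index -> value}}
column_oriented = json.loads(df.to_json())
print(list(column_oriented.values()))
# → [{'0': 1, '1': 2, '2': 3}, {'0': 'a', '1': 'b', '2': 'c'}]

# orient='records' gives one dict per row, which maps naturally to documents
row_oriented = json.loads(df.to_json(orient='records'))
print(row_oriented)
# → [{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}, {'a': 3, 'b': 'c'}]
```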
Check out odo. You can do each of these operations (append and retrieve) in a single line, even when you have multiple DataFrames. Here's an example:
In [1]: import pandas as pd
   ...: from odo import odo, chunks, resource
In [2]: dfs = (pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')}),
   ...:        pd.DataFrame({'a': [2, 3, 4], 'b': list('def')}))
In [3]: dfs
Out[3]:
(   a  b
 0  1  a
 1  2  b
 2  3  c,    a  b
 0  2  d
 1  3  e
 2  4  f)
In [4]: db = resource('mongodb://localhost/mydb')
In [5]: coll = odo(chunks(pd.DataFrame)(dfs), db.mycollection)
In [6]: list(coll.find())
Out[6]:
[{u'_id': ObjectId('55520638362e690439f13dfb'), u'a': 1, u'b': u'a'},
{u'_id': ObjectId('55520638362e690439f13dfc'), u'a': 2, u'b': u'b'},
{u'_id': ObjectId('55520638362e690439f13dfd'), u'a': 3, u'b': u'c'},
{u'_id': ObjectId('55520638362e690439f13dfe'), u'a': 2, u'b': u'd'},
{u'_id': ObjectId('55520638362e690439f13dff'), u'a': 3, u'b': u'e'},
{u'_id': ObjectId('55520638362e690439f13e00'), u'a': 4, u'b': u'f'}]
In [7]: whole_df = odo(coll, pd.DataFrame)
In [8]: whole_df
Out[8]:
   a  b
0  1  a
1  2  b
2  3  c
3  2  d
4  3  e
5  4  f
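Note that the retrieval step hands back one combined frame rather than the original list: whole_df has the same rows as the input frames stacked in order. In plain pandas (no odo needed) that is pd.concat with ignore_index=True, which can serve as a reference when checking a round trip. This assumes the collection returns documents in insertion order, which MongoDB does not guarantee unless you sort.

```python
import pandas as pd

# The same two example frames used in the odo session above
dfs = (pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')}),
       pd.DataFrame({'a': [2, 3, 4], 'b': list('def')}))

# Stack the frames in order and renumber the index 0..n-1,
# matching the layout of whole_df shown in Out[8]
whole_df = pd.concat(dfs, ignore_index=True)
print(whole_df)
```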
You can use MongoClient from pymongo and to_dict from pandas.
I'll show the simple case.
Import the necessary modules:
import pandas as pd
from pymongo import MongoClient
Create a dummy DataFrame:
df = pd.DataFrame({'A': ['r', 'a', 'n'],
                   'Z': ['d', 'o', 'm']})
Convert the DataFrame into a Python list of dicts using to_dict. Why not to_json? I've found that to_dict handles datetime objects better and more consistently.
data = df.to_dict(orient='records')
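To illustrate the datetime point: to_dict(orient='records') keeps datetime values as pd.Timestamp objects (a subclass of datetime.datetime, which pymongo should be able to store as a BSON date), whereas going through to_json and json.loads turns them into plain strings or numbers. The frame below, with its hypothetical 'when' column, is made up for this sketch; date_format='iso' is passed explicitly so the JSON side is predictable.

```python
import json
from datetime import datetime

import pandas as pd

# Hypothetical frame with a datetime column, for illustration only
df = pd.DataFrame({'name': ['x', 'y'],
                   'when': pd.to_datetime(['2020-01-01', '2020-06-15'])})

# to_dict keeps a datetime-like object per cell
via_dict = df.to_dict(orient='records')

# to_json serialises the dates, so json.loads gives back strings, not datetimes
via_json = json.loads(df.to_json(orient='records', date_format='iso'))

print(type(via_dict[0]['when']))
print(type(via_json[0]['when']))
```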
Create the MongoDB connection to the target collection:
cur = MongoClient('mongodb://localhost:27017/')['yourDATABASE']['yourCOLLECTION'] # assume local instance
Next, we use insert_many with the list of dicts:
cur.insert_many(data)
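The steps above store a single DataFrame. For a list of DataFrames, one option is to tag every row with the position of the frame it came from, insert everything with one insert_many call, and split the documents back apart after find. The field name 'df_index' and both helper functions are hypothetical choices for this sketch; it also assumes all frames share the same columns and that none of them is empty (an empty frame produces no documents and would not come back).

```python
import pandas as pd

# Hypothetical tag field; any name not already used as a column works
TAG = 'df_index'


def dfs_to_documents(dfs):
    """Flatten a list of DataFrames into one list of dicts, tagging each
    row with the position of the DataFrame it came from."""
    docs = []
    for i, df in enumerate(dfs):
        for record in df.to_dict(orient='records'):
            record[TAG] = i
            docs.append(record)
    return docs


def documents_to_dfs(docs):
    """Rebuild the list of DataFrames from tagged documents (e.g. the
    output of cur.find({})), dropping MongoDB's _id if present."""
    combined = pd.DataFrame(list(docs)).drop(columns=['_id'], errors='ignore')
    return [group.drop(columns=[TAG]).reset_index(drop=True)
            for _, group in combined.groupby(TAG, sort=True)]


dfs = [pd.DataFrame({'a': [1, 2, 3], 'b': list('abc')}),
       pd.DataFrame({'a': [2, 3, 4], 'b': list('def')})]
docs = dfs_to_documents(dfs)
# With a live collection: cur.insert_many(docs), then
# restored = documents_to_dfs(cur.find({}))
restored = documents_to_dfs(docs)
```

Using a single collection with a tag field keeps the whole list in one place; storing each frame in its own collection would be an equally valid alternative.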
Finally, we use find to retrieve the data from 'yourCOLLECTION' as a cursor object:
result = cur.find({})
We then loop through the cursor with a list comprehension to extract the data as a list of dicts:
result = [r for r in result]
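To get back to a DataFrame, the list of dicts can be passed to pd.DataFrame; the only extra step is discarding the _id field MongoDB adds to every document. The helper name below is hypothetical, and the sample documents use placeholder strings where real results would contain bson ObjectId values. Alternatively, a projection such as cur.find({}, {'_id': 0}) excludes _id on the server side.

```python
import pandas as pd


def records_to_dataframe(records):
    """Turn documents from find() (a list or a cursor) back into a
    DataFrame, dropping MongoDB's auto-generated _id if present."""
    df = pd.DataFrame(list(records))
    return df.drop(columns=['_id'], errors='ignore')


# Stand-in for [r for r in cur.find({})]; real _id values are ObjectIds
result = [{'_id': 'id0', 'A': 'r', 'Z': 'd'},
          {'_id': 'id1', 'A': 'a', 'Z': 'o'},
          {'_id': 'id2', 'A': 'n', 'Z': 'm'}]
df = records_to_dataframe(result)
print(df)
```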
Note: most of the MongoClient collection operators take queries in MongoDB's simple query format, i.e. a dict with all keys as str.