
I have a list of dataframes. I want to store them in MongoDB and retrieve them later.

I tried storing:

for every_df in dfs:
   records = json.loads(every_df.to_json()).values()
   db_connection.insert(records)

but I'm not sure how to retrieve them.

Harshini
  • have you looked at this: http://stackoverflow.com/questions/16249736/how-to-import-data-from-mongodb-to-pandas?rq=1 – EdChum May 12 '15 at 12:31
  • ya i have but not sure how to use it in my case! what does the query mean in the function?? – Harshini May 12 '15 at 12:34
  • Well I'm not a mongo expert but you have several options include direct querying, exporting to json or csv and loading – EdChum May 12 '15 at 12:37

2 Answers


Check out odo. You can do each of these operations (append and retrieve) in a single line, even when you have multiple DataFrames. Here's an example:

In [1]: import pandas as pd
   ...: from odo import odo, chunks, resource

In [2]: dfs = (pd.DataFrame({'a': [1, 2, 3], 'b':list('abc')}),
   ...:        pd.DataFrame({'a': [2, 3, 4], 'b':list('def')}))

In [3]: dfs
Out[3]:
(   a  b
 0  1  a
 1  2  b
 2  3  c,    a  b
 0  2  d
 1  3  e
 2  4  f)

In [4]: db = resource('mongodb://localhost/mydb')

In [5]: coll = odo(chunks(pd.DataFrame)(dfs), db.mycollection)

In [6]: list(coll.find())
Out[6]:
[{u'_id': ObjectId('55520638362e690439f13dfb'), u'a': 1, u'b': u'a'},
 {u'_id': ObjectId('55520638362e690439f13dfc'), u'a': 2, u'b': u'b'},
 {u'_id': ObjectId('55520638362e690439f13dfd'), u'a': 3, u'b': u'c'},
 {u'_id': ObjectId('55520638362e690439f13dfe'), u'a': 2, u'b': u'd'},
 {u'_id': ObjectId('55520638362e690439f13dff'), u'a': 3, u'b': u'e'},
 {u'_id': ObjectId('55520638362e690439f13e00'), u'a': 4, u'b': u'f'}]

In [7]: whole_df = odo(coll, pd.DataFrame)

In [8]: whole_df
Out[8]:
   a  b
0  1  a
1  2  b
2  3  c
3  2  d
4  3  e
5  4  f
Phillip Cloud
  • odo also mentioned [here](https://stackoverflow.com/a/34483020/4549682), but they said it has problems with non alpha username, password. Is this true or already fixed? – wordsforthewise Oct 11 '17 at 05:28

You can use MongoClient from pymongo and to_dict from pandas.
I'll show the simple case.

Necessary modules

import pandas as pd
from pymongo import MongoClient

Create a dummy dataframe

df = pd.DataFrame({'A': ['r', 'a', 'n'],
                   'Z': ['d', 'o', 'm']})

Convert the dataframe into a Python list of dicts using to_dict.
Why not to_json? I've found that to_dict handles datetime objects more consistently.

data = df.to_dict(orient='records')
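To make the shape of that list concrete, here is a minimal sketch using the same dummy dataframe. Each row becomes one dict keyed by column name, which is the form insert_many expects. The variable name `records` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['r', 'a', 'n'],
                   'Z': ['d', 'o', 'm']})

# One dict per row, keyed by column name; the row index is not included.
records = df.to_dict(orient='records')
print(records)
# [{'A': 'r', 'Z': 'd'}, {'A': 'a', 'Z': 'o'}, {'A': 'n', 'Z': 'm'}]
```

Note that the row index is dropped. If it matters, calling df.reset_index() first turns it into an ordinary column.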

Create a handle to the MongoDB collection.

cur = MongoClient('mongodb://localhost:27017/')['yourDATABASE']['yourCOLLECTION'] # assume local instance

Next, insert the list of dicts with insert_many.

cur.insert_many(data)
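The question starts from a list of dataframes rather than a single one. One possible way to keep them distinguishable inside a single collection is to tag every row with the position of the dataframe it came from. This is only a sketch: the helper `frames_to_records` and the field name `frame_id` are hypothetical and not part of pandas or pymongo.

```python
import pandas as pd

def frames_to_records(dfs, key='frame_id'):
    """Flatten several DataFrames into one list of dicts.

    Each row dict gets an extra field (hypothetical name 'frame_id')
    holding the position of its source DataFrame in `dfs`.
    """
    records = []
    for i, df in enumerate(dfs):
        for row in df.to_dict(orient='records'):
            row[key] = i
            records.append(row)
    return records

dfs = [pd.DataFrame({'a': [1, 2], 'b': list('ab')}),
       pd.DataFrame({'a': [3], 'b': ['c']})]
records = frames_to_records(dfs)
# The flat list can then be stored in one call, e.g. cur.insert_many(records)
```

On retrieval, filtering or grouping on the tag field separates the rows back into their original dataframes.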

Finally, use find to retrieve the data from 'yourCOLLECTION' as a cursor object.

result = cur.find({})

Loop through the cursor with a list comprehension to extract the data as a list of dicts.

result = [r for r in result]
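To get from that list of dicts back to a dataframe, it can be passed straight to the DataFrame constructor. The sketch below uses a hand-written stand-in for the retrieved documents so that it runs without a MongoDB server. The string `_id` values are placeholders, since real documents carry an ObjectId there.

```python
import pandas as pd

# Stand-in for the list built from cur.find({}); in real results the
# '_id' values are ObjectId instances added by MongoDB on insert.
docs = [
    {'_id': 'id-0', 'A': 'r', 'Z': 'd'},
    {'_id': 'id-1', 'A': 'a', 'Z': 'o'},
    {'_id': 'id-2', 'A': 'n', 'Z': 'm'},
]

# Build the frame, then drop the MongoDB-generated '_id' column.
df_back = pd.DataFrame(docs).drop(columns='_id')
print(df_back)
```

Dropping `_id` is optional. It is done here only so the result has the same columns as the dataframe that was stored.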

Note: Most pymongo collection methods take MongoDB query documents, which are plain dicts whose keys are all strings.
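As a rough illustration of that query format, the sketch below builds two query documents and wraps find in a small helper that returns a dataframe. The helper name `find_as_frame` is hypothetical. It assumes `collection` is a pymongo collection such as `cur` above, and it passes `{'_id': 0}` as the projection argument of find so that the `_id` field is left out of the results.

```python
import pandas as pd

def find_as_frame(collection, query=None):
    """Run find() with a query document and return the matches as a DataFrame.

    `collection` is assumed to be a pymongo collection (or any object
    with a compatible find(query, projection) method).
    """
    # Second argument is the projection; {'_id': 0} excludes the _id field.
    cursor = collection.find(query or {}, {'_id': 0})
    return pd.DataFrame(list(cursor))

# Query documents are plain dicts with string keys.
equals_r = {'A': 'r'}                 # documents where A equals 'r'
in_set = {'Z': {'$in': ['d', 'o']}}   # documents where Z is 'd' or 'o'
```

With the collection from the answer, a call would look like find_as_frame(cur, equals_r). An empty query, {}, matches every document.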

fielc92