2

I have been attempting to insert the rows of a pandas dataframe into a mongodb collection as individual documents. I am pulling the data from MongoDB using pymongo, performing some transformations, running a scoring algorithm, and adding the score as an additional column to the dataframe. The last step will be to insert the rows into a special collection on the mongoDB database as individual documents but I am completely stuck. My example dataframe df looks like this.

    memberID                                       dxCodes  dxCount  score
0  856589080          [4280, 4293, 4241, 4240, 4242, 4243]        6    1.8 
1  906903383                                       [V7612]        1    2.6
2  837210554                           [4550, 4553, V1582]        3    3.1
3  935634391       [78791, 28860, V1582, 496, 25000, 4019]        6    1.1
4  929185103  [30500, 42731, 4280, 496, 59972, 4019, 3051]        7    2.8

MemberID is a string, dx codes would be an array (in MongoDB terminology), dxCount is an int, and score is a float. I have been toying around with a piece of code I found posted in response to a vaguely similar question.

import json
import datetime
df = pandas.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})
records = json.loads(df.T.to_json()).values()     
db.temp.insert_many(records) 

This is what I was able to get in my collection:

{
    "_id" : ObjectId("565a8f206d8bc51a08745de0"),
    "A" : NumberLong(1448753856695)
}

It's not much, but its as close as I have gotten. I have spent a lot of time googling and taking shots in the dark but haven't cracked it yet. Any guidance is greatly appreciated, thanks in advance for your assistance!

  • What is `df = pandas.DataFrame.from_dict({'A': {1: datetime.datetime.now()}})` doing? You are creating new `DataFrame` here. Also it gives different result not the one you show. – styvane Nov 29 '15 at 08:12
  • I got that code snippet from the accepted answer to this question: http://stackoverflow.com/questions/20167194/insert-a-pandas-dataframe-into-mongodb-using-pymongo the result I showed is the document that was inserted into the database after running that code snippet. did you get a different result? If so, I am curious what you got. After looking at it again, I see that it is creating a new dataframe which doesn't make sense. However, this was the accepted answer to the question I linked to above, although I don't see how it could have accomplished the OP's goal. I appreciate your feedback. – BigUglyDataScientist Nov 29 '15 at 13:19

1 Answers1

7

You need to convert your DataFrame to list of dictionary using the .to_dict() method.

>>> from pprint import pprint # to pretty print the cursor result.
>>> import pandas as pd
>>> import pymongo
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> memberID = ['856589080', '906903383', '837210554', '935634391', '929185103']
>>> dxCodes = [[4280, 4293, 4241, 4240, 4242, 4243], [7612], [4550, 4553, 1582],[78791, 28860, 1582, 496, 25000, 4019], [30500, 42731, 4280, 496, 59972, 4019, 3051]]
>>> dxCount = [6, 1, 3, 6, 7]
>>> score = [1.8, 2.6, 3.1, 1.1, 2.8]
>>> df = pd.DataFrame({'memberID': memberID, 'dxCodes': dxCodes, 'score': score})
>>> df
                                        dxCodes   memberID  score
0          [4280, 4293, 4241, 4240, 4242, 4243]  856589080    1.8
1                                        [7612]  906903383    2.6
2                            [4550, 4553, 1582]  837210554    3.1
3        [78791, 28860, 1582, 496, 25000, 4019]  935634391    1.1
4  [30500, 42731, 4280, 496, 59972, 4019, 3051]  929185103    2.8
>>> collection.insert_many(df.to_dict('records')) # you need to pass the 'records' as argument in order to get a list of dict.
<pymongo.results.InsertManyResult object at 0x7fcd7035d990>
>>> pprint(list(collection.find()))
[{'_id': ObjectId('565b189f0acf45181c69d464'),
  'dxCodes': [4280, 4293, 4241, 4240, 4242, 4243],
  'memberID': '856589080',
  'score': 1.8},
 {'_id': ObjectId('565b189f0acf45181c69d465'),
  'dxCodes': [7612],
  'memberID': '906903383',
  'score': 2.6},
 {'_id': ObjectId('565b189f0acf45181c69d466'),
  'dxCodes': [4550, 4553, 1582],
  'memberID': '837210554',
  'score': 3.1},
 {'_id': ObjectId('565b189f0acf45181c69d467'),
  'dxCodes': [78791, 28860, 1582, 496, 25000, 4019],
  'memberID': '935634391',
  'score': 1.1},
 {'_id': ObjectId('565b189f0acf45181c69d468'),
  'dxCodes': [30500, 42731, 4280, 496, 59972, 4019, 3051],
  'memberID': '929185103',
  'score': 2.8}]
>>> 
styvane
  • 59,869
  • 19
  • 150
  • 156