
I have been able to insert data from a CSV file into MongoDB using PyMongo with the code below.

from pymongo import MongoClient
import urllib
import pandas as pd
import time
import json

client = MongoClient()
db = client.MainDB
col = db.Test


def csv_to_json(filename, header=0):
    # read the pipe-delimited CSV, skipping malformed rows
    data = pd.read_csv(filename, header=header, error_bad_lines=False, warn_bad_lines=False, sep='|', low_memory=True)
    # convert the rows to a list of dicts so they can be inserted as documents
    return json.loads(data.to_json(orient='records'))

try: 
    col.insert_many(csv_to_json('main.csv'))
except Exception as e:
    print(e)

Now I will have to update this collection daily with the same CSV, but with different values for certain fields. This is what I came up with, and it didn't work. How do I go about this, please?

from pymongo import MongoClient
import urllib
import pandas as pd
import json
import time


starttime = time.time()
client = MongoClient()
db = client.MainDB
col = db.Test


def csv_to_json(filename, header=0):
    data = pd.read_csv(filename, header=header, error_bad_lines=False, warn_bad_lines=False, sep='|', low_memory=True)
    return data.to_dict(orient='dict')

try:
    col.update({}, csv_to_json('main.csv'),upsert=True)
except Exception as e:
    print(e)
Manuel
  • What about using the [mongoimport](https://docs.mongodb.com/database-tools/mongoimport/) tool? Should be much more efficient. – Wernfried Domscheit Mar 26 '21 at 12:39
  • @WernfriedDomscheit Can I use the mongoimport tool in a Python script? – Manuel Mar 26 '21 at 12:48
  • I would assume you can call an arbitrary executable in Python: https://www.newbedev.com/python/howto/6-ways-to-call-external-command-in-python/ or https://stackoverflow.com/questions/89228/how-to-execute-a-program-or-call-a-system-command-from-python – Wernfried Domscheit Mar 26 '21 at 13:59
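
Following up on the mongoimport suggestion in the comments above, here is a minimal sketch of calling it from Python via subprocess. The database, collection, key field, and file path mirror the question but are assumptions, and mongoimport only reads JSON, comma-separated CSV, or TSV, so the pipe-delimited file would need converting first.

import subprocess

# Sketch: shell out to mongoimport to upsert the daily CSV.
# Assumes mongoimport is on the PATH and main.csv has been converted to a
# comma-separated file with a header row and a unique "id" column.
result = subprocess.run(
    [
        "mongoimport",
        "--db", "MainDB",
        "--collection", "Test",
        "--type", "csv",
        "--headerline",           # first row supplies the field names
        "--file", "main.csv",
        "--mode", "upsert",       # update matching documents, insert the rest
        "--upsertFields", "id",   # assumed unique key; change to your own
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)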

1 Answer


You can use bulk_write() in conjunction with update operations (if you want to perform multiple actions in one call; otherwise update_many() alone would do) to update the values.

Here's a sketch of how you can use bulk_write; the database, collection, and field names below are just placeholders:
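
from pymongo import MongoClient, UpdateOne, UpdateMany

client = MongoClient()
db = client["mydatabase"]
collect = db["customers"]

# bulk_write sends a list of write operations to the server in a single call
requests = [
    UpdateMany({ "address": { "$regex": "ABC[0-9]" } }, { "$set": { "name": "DEF" } }),
    UpdateOne({ "customer_id": 1 }, { "$set": { "status": "active" } }, upsert=True),
]

result = collect.bulk_write(requests)
print(result.matched_count, result.modified_count, result.upserted_count)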

For update_many(), you could have something like this:

import pymongo

client = pymongo.MongoClient("mongodb://localhost:1001/")
db = client["mydatabase"]
collect = db["customers"]

# match every document whose address contains "ABC" followed by a digit
my_query = { "address": { "$regex": "ABC[0-9]" } }
# set the name field on all matched documents
new_values = { "$set": { "name": "DEF" } }

x = collect.update_many(my_query, new_values)

The documentation for update_many() can be found in the PyMongo API reference.
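
To apply this to the daily CSV refresh from the question, here is a sketch under the assumption that every row carries a unique key column (called "id" here) that identifies the document to update. Building one UpdateOne per row and sending them with bulk_write in batches keeps each command small, instead of passing the whole CSV as a single update document.

import pandas as pd
from pymongo import MongoClient, UpdateOne

client = MongoClient()
col = client.MainDB.Test

# read the pipe-delimited CSV and turn each row into a dict
df = pd.read_csv('main.csv', sep='|', low_memory=True)
records = df.to_dict(orient='records')

# one upsert per row, matched on the assumed unique "id" column
ops = [UpdateOne({'id': rec['id']}, {'$set': rec}, upsert=True) for rec in records]

# send the operations in batches so no single bulk_write call grows too large
batch_size = 1000
for i in range(0, len(ops), batch_size):
    col.bulk_write(ops[i:i + batch_size])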

MetaInformation
  • Thanks for your reply. I made this update: col.update_many({}, csv_to_json('main.csv'), upsert=True) and I had this error: 'update' command document too large. My CSV is about 100 MB. – Manuel Mar 26 '21 at 13:20