I need to write CSV data into MongoDB (version 4.4.4). Right now I'm using MongoEngine as a data layer for my application.
Each CSV has at least 4 million records and 8 columns.
What is the fastest way to bulk insert (if the data doesn't exist yet) or update (if the data is already in the collection)?
Right now I'm doing the following:
```python
for inf in reports:  # reports: the DailyReport objects built from the CSV
    # .first() returns None when there is no match; .get() would raise DoesNotExist
    daily_report = DailyReport.objects(
        company_code=inf.company_code, date=inf.date
    ).first()
    if daily_report is not None:
        inf.id = daily_report.id  # reuse the existing _id so save() updates
    inf.save()
```
- `reports` is a list of DailyReport objects built from the CSV data.
- The _id is auto-generated. For business purposes, however, the logical primary key is the pair company_code (StringField) and date (DateTimeField).
- The DailyReport class has a unique compound index on company_code and date.
- The code above traverses the list and, for each DailyReport, looks for an existing DailyReport in the database with the same company_code and date. If one exists, its id is assigned to the DailyReport built from the CSV data, and the object is then saved with MongoEngine's save() method.
- Each object is looked up and saved individually, so every record costs two round trips to the database, which is incredibly slow.
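The direction I'm currently considering is batching everything into a single `bulk_write` call with upserts, going through the raw PyMongo collection underneath MongoEngine. This is only a sketch, assuming `DailyReport._get_collection()` and `to_mongo()` behave as documented; `reports` is the same list as above:

```python
from pymongo import UpdateOne

# Sketch only: batch upserts through the raw PyMongo collection.
# Assumes DailyReport._get_collection() exposes the underlying collection
# and inf.to_mongo() serializes a document to a dict-like SON object.
collection = DailyReport._get_collection()

operations = []
for inf in reports:
    doc = inf.to_mongo().to_dict()
    doc.pop("_id", None)  # never try to $set the immutable _id
    operations.append(
        UpdateOne(
            {"company_code": inf.company_code, "date": inf.date},
            {"$set": doc},
            upsert=True,  # insert when no document matches the filter
        )
    )

# One round trip per batch instead of two per document;
# ordered=False lets the batch continue past individual errors.
result = collection.bulk_write(operations, ordered=False)
```

With 4 million rows I would probably have to chunk the operations list to keep memory bounded, but I haven't benchmarked any of this yet.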
Any ideas on how to make this process faster?