
I have a lot of documents to update: I want to write a timestamp on the first insert and an update timestamp when a duplicate comes in later. I found this answer and am attempting it with MongoDB 6.0: https://stackoverflow.com/a/17533368/3300927

I also store in my model, as `searchable`, the field to use when looking for duplicates.

If a query has no `searchable`, I insert the results without checking for duplicates and then add a timestamp to each inserted document:

data_inserted = collection.insert_many(results)
for doc_id in data_inserted.inserted_ids:
    collection.update_many(
        filter={'_id': doc_id},
        update={'$set': {'insert_date': now, }, },
        upsert=True)

No issues there:

{
  "_id": {
    "$oid": "321654987654"
  },
  "IR NUMBER": "ABC784",
  "Plate": " ",
  "Plate State": " ",
  "Make": "TOYOTA",
  "Model": "TACOMA",
  "Style": "  ",
  "Color": "SIL /    ",
  "Year": "2008",
  "insert_date": {
    "$date": {
      "$numberLong": "1660000808176"
    }
  }
}
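
(I realize the per-`_id` loop could probably be collapsed into a single call; an untested sketch that should be equivalent, using the same `collection`, `results`, and `now`:)

# one-shot variant: stamp every newly inserted document in a single update_many
data_inserted = collection.insert_many(results)
collection.update_many(
    filter={'_id': {'$in': data_inserted.inserted_ids}},
    update={'$set': {'insert_date': now}})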

If there is a `searchable`, I attempt to match on it. What I get in MongoDB is a document containing only the searchable field and the timestamp:

# q_statement.searchable == 'IR NUMBER'

for document in results:
    collection.update_one(
        filter={q_statement.searchable: document[q_statement.searchable], },
        update={'$setOnInsert': {'insert_date': now, }, '$set': {'update_date': now, }},
        upsert=True)

result:

{
  "_id": {
    "$oid": "62f19d981aa321654987"
  },
  "IR NUMBER": "ABC784",
  "insert_date": {
    "$date": {
      "$numberLong": "1660001688126"
    }
  }
}

EDIT

Looking at the `pymongo.results.UpdateResult` by changing the for loop contents to `updates = collection.update_one( ...` and adding `print(updates.raw_result)` shows ~10k results like:

  {
    "n": 1,
    "upserted": ObjectId("62f27ae21aa62fbfa734f01d"),
    "nModified": 0,
    "ok": 1.0,
    "updatedExisting": False
  },
  {
    "n": 1,
    "nModified": 0,
    "ok": 1.0,
    "updatedExisting": True
  },
  {
    "n": 1,
    "nModified": 0,
    "ok": 1.0,
    "updatedExisting": True
  }
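
(For reference, the same information is available via the `UpdateResult` properties; a small sketch of how to read them, assuming pymongo 4.x:)

updates = collection.update_one(
    filter={q_statement.searchable: document[q_statement.searchable]},
    update={'$setOnInsert': {'insert_date': now}, '$set': {'update_date': now}},
    upsert=True)
# upserted_id is only set when the upsert created a new document
if updates.upserted_id is not None:
    print('inserted new document:', updates.upserted_id)
else:
    print('matched:', updates.matched_count, 'modified:', updates.modified_count)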

(python==3.10.3, Django==4.0.4, pymongo==4.2.0)

  • `collection.update_one` returns a [`pymongo.results.UpdateResult`](https://pymongo.readthedocs.io/en/stable/api/pymongo/results.html#pymongo.results.UpdateResult). Perhaps if you printed its properties for each `update_one`, the behavior would be more clear. – rickhg12hs Aug 09 '22 at 13:46
  • Changed the for loop to --- updates = collection.update_one( ... print(updates.raw_result) --- and get 10k lines like this: --- {'n': 1, 'upserted': ObjectId('62f27ae21aa62fbfa734f01d'), 'nModified': 0, 'ok': 1.0, 'updatedExisting': False} {'n': 1, 'nModified': 0, 'ok': 1.0, 'updatedExisting': True} {'n': 1, 'nModified': 0, 'ok': 1.0, 'updatedExisting': True} – Bammer Aug 09 '22 at 15:21
  • I think I'm missing something. If `results` was initially inserted into `collection`, based on your `filter` search criteria, how is it possible for an `upsert` to happen later? Wouldn't there always be a match? ... or is another user/process also updating the collection that would delete/modify the initial `results` insertion? – rickhg12hs Aug 09 '22 at 16:40
  • I'm pulling joining data from MSSQL and putting it into MongoDB. The MSSQL statements are in the Django Query Model ( as q_statement). If there is a q_statement.searchable it uses the update_one for loop and filter code in question every time to insert and update. If there is no q_statement.searchable, then it uses the insert_many which has no issues (but will make duplicates next time it runs). Do you think I shouldn't enter a searchable till after doing the insert_many the first time and the update thereafter? I was hoping to write it to handle an initial write and subsequent updates. – Bammer Aug 09 '22 at 18:23
  • I think I understand a bit better now. Thanks for the explanation. I do have some more questions though. Do you expect `searchable` to be unique in the collection? If not, won't `update_one` randomly select a document to update? I believe the reason for the `searchable` documents being so sparse on insertion is because the `update_one` doesn't reference anything other than `searchable` in `document` (from `results`). One of the options that I would consider is to make all modifications to `results` in `python` first, and then do the insertion/update. – rickhg12hs Aug 09 '22 at 19:58
  • The `update_one` creates a new document for each for loop (as expected) but only entered the IR and date fields, not the other vehicle info (not expected). What I want is: if the document exists, change `update_date` to now; else create a new document with ALL the fields. And I have no idea why all the fields aren't coming through. – Bammer Aug 09 '22 at 22:04
  • I don't completely understand your update strategy, but you might try `update={'$setOnInsert': document | {'insert_date': now}, '$set': {'update_date': now, }}`. – rickhg12hs Aug 10 '22 at 03:32
  • That includes the document and the date fields... Thank you! – Bammer Aug 10 '22 at 18:49

1 Answer


To "upsert" a full document and additional fields using python, you can use MongoDB's "$setOnInsert" with a python merged dictionary.

From the Python library docs, here's how you merge dictionaries. (It's similar to MongoDB's `"$mergeObjects"`.)

d | other

Create a new dictionary with the merged keys and values of d and other,
which must both be dictionaries. The values of other take priority
when d and other share keys.
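
For example (the `|` dict merge operator requires Python 3.9+):

>>> {'Make': 'TOYOTA', 'Model': 'TACOMA'} | {'insert_date': 'now'}
{'Make': 'TOYOTA', 'Model': 'TACOMA', 'insert_date': 'now'}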

So, to insert the full document, your Python code just needs a minor addition: merge `document` with the dictionary containing the insert timestamp.

...
update={'$setOnInsert': document | {'insert_date': now}, '$set': {'update_date': now, }}
...
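
Put together with the loop from your question, it could look something like this (a sketch using the same `results`, `q_statement`, `collection`, and `now`):

for document in results:
    collection.update_one(
        filter={q_statement.searchable: document[q_statement.searchable]},
        # on insert: write the whole source document plus insert_date;
        # on match: only bump update_date
        update={'$setOnInsert': document | {'insert_date': now},
                '$set': {'update_date': now}},
        upsert=True)
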
rickhg12hs