1

I need each document in a collection to be updated only if its content is different, regardless of the order of the elements in nested lists.

Fundamentally, two versions should be the same if the elements are identical regardless of their order. MongoDB does not do that, by default.

def upsert(query, update):
    # collection is a pymongo.collection.Collection object
    result = collection.update_one(query, update, upsert=True)
    print("\tFound match: ", result.matched_count > 0)
    print("\tCreated: ", result.upserted_id is not None)
    print("\tModified existing: ", result.modified_count > 0)

query = {"name": "Some name"}

update = {"$set": {
    "products": [
        {"product_name": "a"},
        {"product_name": "b"},
        {"product_name": "c"}]
}}
print("First update")
upsert(query, update)

print("Same update")
upsert(query, update)

update = {"$set": {
    "products": [
        {"product_name": "c"},
        {"product_name": "b"},
        {"product_name": "a"}]
}}
print("Update with different order of products")
upsert(query, update)

Output:

First update
Found match:  False
Created:  True
Modified existing:  False

Same update 
Found match:  True
Created:  False
Modified existing:  False

Update with different order of products
Found match:  True
Created:  False
Modified existing:  True

The last update does modify the document because the order of products are indeed different.

I did find a working solution which is to compare a sorting of the queried document's content and a sorting of the new one.

Thanks to Zero Piraeus's response for the short and convenient way to sort for comparison.

def ordered(obj):
    if isinstance(obj, dict):
        return sorted((k, ordered(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return sorted(ordered(x) for x in obj)
    else:
        return obj

I apply it to compare the current and the new versions of the document. If their sorting are different, I apply the update.

new_update = {
    "products": [
        {"product_name": "b"},
        {"product_name": "c"},
        {"product_name": "a"}]
}

returned_doc = collection.find_one(query)
# Merging remote document with local dictionary
merged_doc = {**returned_doc, **new_update}
if ordered(returned_doc) != ordered(merged_doc):
    upsert(query, {"$set": new_update})
    print("Updated")
else:
    print("Not Updated")

Output:

Not Updated

That works, but that relies on python to do the comparison, introducing a delay between the read and the write.

Is there a way to do it atomically ? Or, even better, a way to set a MongoDB Collection to adopt some kind of "order inside arrays doesn't matter" mode ?

This is part of a generic implementation. Documents can have any kind of nesting in their structure.

Whole Brain
  • 2,097
  • 2
  • 8
  • 18
  • Just curious ... _"... two versions should be the same if the elements are identical regardless of their order."_ Is there any functional or performance reason to not let the document be updated with a different ordered array? – rickhg12hs Jan 13 '23 at 13:47
  • 1
    Yes. The source of the data is unordered and I'm using a versioning pattern for the updates. I want to minimize unecessary updates as the revision collection will grow much faster and potentially trigger alerts. – Whole Brain Jan 13 '23 at 13:50
  • Will your `query` identify a unique document in the collection if it exists? – rickhg12hs Jan 13 '23 at 13:52
  • 1
    Yes. There is a unique field known prior to the whole process. – Whole Brain Jan 13 '23 at 13:53
  • 1
    @rickhg12hs You made me think differently about this constraint. I can delegate the comparison to the versioning part, indeed. The insert of the versioned document will be conditioned on the comparison between the document prior to the update (using `find_one_and_update` and `ReturnDocument.BEFORE`) and the new version in memory. This way the update will be unconditioned, quick and atomic, while python has all the time to update the other lower-priority collection. – Whole Brain Jan 13 '23 at 14:14

2 Answers2

2

EDIT: According to your comment (and the resemblance to other question) I suggest:

db.collection.updateMany(
  {"name": "Some name"},
  [{
    $set: {products: {
        $cond: [
          {$setEquals: [["a", "c", "b"], "$products.product_name"]},
          "$products",
          [{"product_name": "a"}, {"product_name": "c"}, {"product_name": "b"}]
        ]
    }}
  }]
)

See how it works on the playground example

Original answer:

One option is to use the query part of the update to work only on documents that are matching your condition:

db.collection.update(
  {"name": "Some name",
   "products.product_name": {
    $not: {$all: ["a", "b", "c"]}}
  },
  {"$set": {
    "products": [
      {"product_name": "b"},
      {"product_name": "c"},
      {"product_name": "a"}
    ]
  }}
)

See how it works on the playground example

nimrod serok
  • 14,151
  • 2
  • 11
  • 33
  • Thank you. I understand that setting the condition in the query will make the operation atomic. The query seems to be very complicated to build though. In your solution, removing a product from the list won't trigger the update. Your overall proposal seems to be the way to go but I can't tell for sure if it's really doable. – Whole Brain Jan 13 '23 at 13:47
  • Looks like your first mongoplayground.net link is to a different example. – rickhg12hs Jan 13 '23 at 17:53
  • 1
    If you use `{'$ifNull': ['$products.product_name', []]}`, it should handle `upsert` too. – rickhg12hs Jan 14 '23 at 06:36
2

@nimrodserok correctly pointed out a flaw in my first answer. Here's my updated answer that's a slight variation of his answer.

This should also preserve the upsert option.

new_new_update = [
  {
    "$set": {
      "products": {
        "$let": {
          "vars": {
            "new_products": [
              {"product_name": "b"},
              {"product_name": "c"},
              {"product_name": "a"}
            ],
            "current_products": {
              # need this for upsert
              "$ifNull": ["$products", []]
            }
          },
          "in": {
            "$cond": [
              {"$setEquals": ["$$current_products", "$$new_products"]},
              "$$current_products",
              "$$new_products"
            ]
          }
        }
      }
    }
  }
]

Here's a mongoplayground.net example to demonstrate the concept. You can change the "name" value to verify the upsert option.

I'm curious what the result values are for this update_one.

rickhg12hs
  • 10,638
  • 6
  • 24
  • 42
  • your solution, as my original solution, does not cover the [case where the new array is smaller than the array on the document](https://mongoplayground.net/p/SpT4d2gMsL1)... – nimrod serok Jan 13 '23 at 16:50
  • @nimrodserok You're right! – rickhg12hs Jan 13 '23 at 17:51
  • 1
    @nimrodserok I updated my answer. _“Imitation is not just the sincerest form of flattery - it's the sincerest form of learning.”_ ― George Bernard Shaw – rickhg12hs Jan 13 '23 at 22:55