
When POSTing (from Postman), I end up with duplicate documents with the same "_id". Looking at the examples here and here, I'm wondering if my issue is that something is not set up correctly in my Cosmos DB instance. Example:

If I update my shape's color to orange, I get another document with the key of 1, but what I expect is a single document with the key of 1 whose shape has the color orange.

Function:

```csharp
public static void Run(
    ILogger logger,
    [EventGridTrigger] EventGridEvent e,
    [CosmosDB(
        databaseName: "myDatabase",
        collectionName: "myCollection",
        ConnectionStringSetting = "COSMOS_CONNECTION_STRING")] out MyObject myObjectDocument)
{
    logger.LogInformation("Event received {type} {subject}", e.EventType, e.Subject);

    // The Cosmos DB output binding upserts whatever is assigned to this out parameter.
    myObjectDocument = JsonConvert.DeserializeObject<MyObject>(e.Data.ToString());

    logger.LogInformation(myObjectDocument.some.thing);
}
```

Payload:

```json
[
  {
    "topic": "Topic",
    "id": "1",
    "eventType": "EventType",
    "subject": "Subject",
    "eventTime": "2012-08-10T21:04:07+00:00",
    "data": {
      "id": 1,
      "effectiveDate": "2020-10-18 15:00:00",
      "shape": {
        "_id": "1000",
        "color": "green",
        "name": "square"
      }
    },
    "dataVersion": "2.0",
    "metadataVersion": "1"
  }
]
```

Edit: The partition key is `id`. Matias Quaranta's answer and comments did the trick. Note also that the partition key value (here the `id` property) must be a string and not an int.

user94614
  • Please edit to provide more details. For example: Your Azure Function doesn't really do anything - not quite sure why it's included. And you haven't included any details on your upsert operation. Also, what is your collection's partition key? – David Makogon Dec 31 '21 at 13:58
  • I don't understand, I thought the Cosmos DB Output Binding handled the Upsert operation? Or is that only with Azure Functions 1.x? See [this](https://stackoverflow.com/questions/54887882/azure-function-inserting-but-not-updating-cosmosdb) and [this](https://stackoverflow.com/questions/52437913/azure-function-c-create-or-replace-document-in-cosmos-db-on-http-request) and [this](https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-cosmosdb?tabs=csharp) – user94614 Jan 03 '22 at 13:28
  • `_id` is the document identifier for Mongo. Cosmos DB's SQL API uses `id`. You are getting "duplicates" because you are relying on `_id` matching, which will never happen. – Matias Quaranta Jan 03 '22 at 16:02

2 Answers

`_id` is the document identifier for Mongo.

Cosmos DB SQL API uses `id`. You are getting "duplicates" because you are relying on matching the value of `_id`, and you can have millions of documents with the same `_id` but a different `id`.

The document identity in SQL API is the value of `id` plus the Partition Key. If your container has the Partition Key Definition `/myPK` (for example; yours can be different), then the identity of a document is the pair of values of its `id` and `myPK` properties (or whichever property your Partition Key Definition points to). When calling Upsert, if a document with the same `id` and `myPK` values exists, it is replaced with the new body; if not, a new document is created with that body.
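To make the identity rule above concrete, here is a minimal C# sketch of a hypothetical in-memory container. `InMemoryContainer` is invented for illustration and is not the Cosmos DB SDK; it only models the rule that a document is identified by the pair (`id`, partition key value), so `Upsert` replaces a stored body only when both match, and `_id` inside the body never takes part.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory model of a Cosmos DB SQL API container (NOT the real SDK).
// Identity is the pair (id, partition key value); "_id" in the body is ignored.
public sealed class InMemoryContainer
{
    private readonly Dictionary<(string Id, string PartitionKey), string> _items = new();

    // Number of distinct (id, partition key) identities currently stored.
    public int Count => _items.Count;

    // Upsert semantics: replace the whole body if (id, pk) exists, otherwise add a new item.
    public void Upsert(string id, string partitionKey, string body) =>
        _items[(id, partitionKey)] = body;

    // Point read by the full identity; returns null when no such item exists.
    public string? Read(string id, string partitionKey) =>
        _items.TryGetValue((id, partitionKey), out var body) ? body : null;
}
```

For example, two writes with freshly generated GUID `id`s and the same partition key value leave two items, even if both bodies carry `"_id": "1000"`; two writes with `id` `"1"` and partition key value `"1"` leave a single item holding the latest body.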

Matias Quaranta
  • Thanks for your response. So my partition key is `_id` and what I'm reading is that I need to change it? If that is correct, would I be able to change it to `id` or will that lead to conflicts elsewhere? – user94614 Jan 03 '22 at 16:46
  • If your partition key is `_id`, then the identity of your documents is the pair of `id` and `_id` values. Based on your post, it sounds like you really want to use `_id` as the identifier of the document. Maybe the best course of action is to stop using `_id` in your documents and use `id` instead (because it's what you are trying to do conceptually), and give the container `/id` as its partition key. – Matias Quaranta Jan 03 '22 at 16:56
  • Changing the Partition Key Definition to `/id` while still sending the same payload, which uses `_id` as your document identifier, won't solve your problem. The problem is conceptual in your design: you are trying to use `_id` to perform updates or creates, which works well in a Mongo database, but on the Cosmos SQL API you need to use `id` to achieve the same goal. Your documents should contain `id` instead of `_id`: either the Event Grid data should have `id`, or you could generate an `id` property in the Function that matches whatever is coming in as `_id`. – Matias Quaranta Jan 03 '22 at 17:00
  • I think I'm about there... I've changed my partition key (to `id`) and edited my payload. I'm now seeing Cosmos use the auto-generated `id` (a GUID) for the partition key instead of a value I specify in the payload. Is there something else I need to change or update? – user94614 Jan 03 '22 at 17:58
  • The Cosmos DB SDK will autogenerate the `id` property if your document does not have it. This most likely means your document does not have an `id` property and is still passing `_id`. – Matias Quaranta Jan 03 '22 at 18:08
  • It looks like I had to pass my partition key as a `string` instead of an `int`. Thanks again for your responses! – user94614 Jan 03 '22 at 20:18
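The comments suggest generating a string `id` from whatever key arrives in the payload. A minimal sketch of that step, assuming .NET 6+ and `System.Text.Json.Nodes` (the question itself uses Newtonsoft.Json); `CosmosIdHelper.NormalizeId` is a hypothetical helper, and it assumes the key sits at the top level of `data` as either `id` or `_id`. How the resulting document is handed to the output binding is not shown.

```csharp
using System;
using System.Text.Json.Nodes;

public static class CosmosIdHelper
{
    // Hypothetical helper: returns the payload as a JSON object whose "id" is a string.
    // Assumption: the logical key is a top-level "id" (possibly numeric) or "_id".
    public static JsonObject NormalizeId(string dataJson)
    {
        JsonObject doc = JsonNode.Parse(dataJson) as JsonObject
            ?? throw new ArgumentException("Payload is not a JSON object.", nameof(dataJson));

        // Prefer an existing "id"; fall back to a Mongo-style "_id".
        JsonNode? key = doc["id"] ?? doc["_id"];
        if (key is null)
            throw new ArgumentException("Payload has neither \"id\" nor \"_id\".", nameof(dataJson));

        // Cosmos DB SQL API requires "id" to be a string: 1 and "1" are different values.
        // Reuse the text of a string value; otherwise use the raw JSON text (e.g. 1 -> "1").
        string idText = key is JsonValue v && v.TryGetValue(out string? s) ? s! : key.ToJsonString();
        doc["id"] = idText;
        return doc;
    }
}
```

With the question's `data` payload, `NormalizeId` turns `"id": 1` into `"id": "1"`; a body carrying only `"_id": "1000"` gets `"id": "1000"`. The `_id` nested under `shape` in the question's payload is not picked up, so the lookup would need adapting to wherever the logical key actually lives.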

Items in Cosmos DB are unique only by the combination of `id` and partition key. If the partition key is different but the `id` is the same, they are two separate items, so the `id` alone is not unique.

Aram Yako