
I've been struggling to understand the Azure Cosmos DB official docs.

This link shows that change feed can work with "MongoDB": Change feed in Azure Cosmos DB

But the following paragraph is what is confusing me:

[Screenshot of the paragraph from the Azure Cosmos DB documentation]

So my question is the following: can Azure Cosmos DB change feed work with the MongoDB API (not using change streams)?

Documentation on this subject is scarce online, and I can't find useful information about it.

To me it would make sense that change feed is supported, since the Mongo API is just a layer over Cosmos DB, but nothing would surprise me.

If change streams are supported but change feed is not, is it possible to define the oplog retention time or size? If not, what are the defaults? (Again, this information is almost impossible to find online.)

solujic

1 Answer


can Azure Cosmos DB Change Feed work with MongoDB API (Not using change stream!)

Basically no.

It used to be possible, at least unofficially, by connecting to the SQL API endpoint, but this is now blocked or unavailable on newly created Mongo accounts.

The documents surfaced via that route, though, were the SQL API representations of BSON documents, and I don't think deserializing them was ever documented or supported.

It is unfortunate that this route has been closed off, however, because there is no replacement functionality provided out of the box for using the Cosmos DB "change streams" as a trigger source for an Azure Function and parallelising across physical partitions (even though technically it could be implemented very similarly after calling "GetChangeStreamTokens").
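To make the missing "processor" concrete, here is a minimal sketch of the lease pattern such a thing would need: one saved continuation per partition, read from, handled, then checkpointed. Everything here is hypothetical: `LeaseStore`, `process_once` and the `read` callback are illustrative names, and the step that would actually open a per-partition stream from the tokens returned by `GetChangeStreamTokens` is left as a callback, since its exact invocation is not covered here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical per-partition reader: given a partition id and its saved
# continuation token (None = from the beginning), return the changed documents
# and the continuation to save next. A real one would wrap a change stream
# resumed from the tokens GetChangeStreamTokens hands out (not shown here).
ReadPartition = Callable[[str, Optional[str]], Tuple[List[dict], Optional[str]]]


@dataclass
class LeaseStore:
    """In-memory stand-in for a lease collection: one continuation per partition."""

    leases: Dict[str, Optional[str]] = field(default_factory=dict)

    def get(self, partition: str) -> Optional[str]:
        return self.leases.get(partition)

    def checkpoint(self, partition: str, token: Optional[str]) -> None:
        self.leases[partition] = token


def process_once(
    partitions: Iterable[str],
    read: ReadPartition,
    leases: LeaseStore,
    handle: Callable[[dict], None],
) -> None:
    """One polling pass over the given partitions."""
    for partition in partitions:
        docs, next_token = read(partition, leases.get(partition))
        for doc in docs:
            handle(doc)
        # Checkpoint only after the whole batch is handled, so a crash means
        # re-reading the batch (at-least-once) rather than skipping it.
        leases.checkpoint(partition, next_token)
```

Parallelising would then mostly be a matter of giving each worker its own subset of partitions; a real lease collection would also need an owner field so that two workers never claim the same partition.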

My understanding, though, is that change streams are just the Mongo API interface for exposing the change feed, so they are implemented essentially the same way and have the same limitations: there is no separate oplog, because there is no record of change history. It is just traversing the documents within a partition in order of LSN (logical sequence number).
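A toy model of what "traversing the documents in order of LSN" implies. This illustrates the semantics described above and is not Cosmos DB's actual implementation: `ToyPartition` is a hypothetical name, each write overwrites the document and stamps it with a new LSN, and a reader keeps nothing but a bookmark.

```python
from typing import Dict, List, Tuple


class ToyPartition:
    """One partition: documents keyed by id, each stamped with the LSN of its
    most recent write. There is no log of past versions."""

    def __init__(self) -> None:
        self.lsn = 0
        self.docs: Dict[str, Tuple[int, dict]] = {}  # id -> (lsn, current body)

    def upsert(self, doc_id: str, body: dict) -> None:
        self.lsn += 1
        # Overwrites the previous version; older versions are gone for good.
        self.docs[doc_id] = (self.lsn, body)

    def read_changes(self, after_lsn: int) -> Tuple[List[dict], int]:
        """Current state of every doc written after `after_lsn`, in LSN order,
        plus the new bookmark (highest LSN returned). after_lsn=0 reads
        "from the beginning of time"."""
        changed = sorted(
            (entry for entry in self.docs.values() if entry[0] > after_lsn),
            key=lambda entry: entry[0],
        )
        bookmark = changed[-1][0] if changed else after_lsn
        return [body for _, body in changed], bookmark


if __name__ == "__main__":
    p = ToyPartition()
    p.upsert("a", {"id": "a", "v": 0})
    docs, bookmark = p.read_changes(0)  # reader sees v0, bookmark is 1
    # The reader goes offline while "a" is written five more times.
    for v in range(1, 6):
        p.upsert("a", {"id": "a", "v": v})
    docs, bookmark = p.read_changes(bookmark)
    print(docs)  # [{'id': 'a', 'v': 5}]: versions 1 to 4 were never observable
```

The bookmark plays the role of the per-partition continuation, which is why resuming works without any retained history: the reader just asks for everything with a later LSN and gets whatever state those documents are in now.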

And because the change feed does not currently expose deletes or guarantee every document version (when a document is written multiple times), neither can change streams, which is why you are required to specify `$in: ["insert", "update", "replace"]`.
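For reference, opening such a change stream from Python with pymongo might look like the sketch below. The `$match` stage is the required filter mentioned above; the `$project` stage and `full_document="updateLookup"` are assumptions modelled on Microsoft's Cosmos DB for MongoDB change stream samples, and `watch_changes`, the connection string and the database/collection names are placeholders.

```python
from typing import Any, Dict, List, Optional

# Cosmos DB for MongoDB change streams only surface these operation types
# (no deletes), so the $match filter below is required.
CHANGE_STREAM_PIPELINE: List[Dict[str, Any]] = [
    {"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}},
    # Assumption: projection modelled on Microsoft's Cosmos DB for MongoDB samples.
    {"$project": {"_id": 1, "fullDocument": 1, "ns": 1, "documentKey": 1}},
]


def watch_changes(collection: Any, resume_token: Optional[Dict[str, Any]] = None) -> Any:
    """Open a change stream on a pymongo Collection.

    `resume_token` is a value previously saved from `stream.resume_token`;
    None opens a new stream without resuming.
    """
    return collection.watch(
        pipeline=CHANGE_STREAM_PIPELINE,
        full_document="updateLookup",  # assumption: as in the Cosmos DB samples
        resume_after=resume_token,
    )


# Example usage (requires pymongo and a Cosmos DB for MongoDB account;
# the connection string and names are placeholders):
#
#   from pymongo import MongoClient
#   coll = MongoClient("<connection-string>")["mydb"]["mycoll"]
#   with watch_changes(coll) as stream:
#       for change in stream:
#           print(change["fullDocument"])
#           saved_token = stream.resume_token  # persist to resume later
```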

Martin Smith
  • Hmm, interesting. So you're saying that the change stream implementation for Cosmos DB doesn't actually use an oplog? How does the resumeToken work in that case, if it does at all? And what would the retention time be for events that happened? I understand it doesn't save history, but if, for example, the change stream process dies for a day or two, can it pick up the next day using the resumeToken? – solujic May 17 '23 at 14:11
  • The resumeToken is just a bookmark of how far it has read per partition (maybe an LSN?). Hopefully, when [this more enhanced change feed](https://stackoverflow.com/a/66361132/73226) finally becomes available, we will also see it surfaced in change streams – Martin Smith May 17 '23 at 14:18
  • And yeah, because it doesn't actually store change **events** at all, you can set it to read from the beginning of time. But you need to understand that all it is doing is reading the documents in timestamp order and returning their current state at the time each document is read. So if your reading process dies for a day and a document is updated 5 times in that day, when it resumes you will still pick up that document as changed, since its last write time is later, but you will only get the 5th version of it and miss the other 4 – Martin Smith May 17 '23 at 14:23
  • But this is also true for the change feed implementation too (not Mongo API), right? – solujic May 17 '23 at 14:28
  • Yes, as documented here: https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed. And because change streams just piggyback off that, they have no additional functionality – Martin Smith May 17 '23 at 14:29
  • @solujic - From some experimentation, you can paste the base64 `_data` from the ResumeToken into https://codebeautify.org/gzip-decompress-online to see the plain text; it is just a gzipped JSON array with a continuation token (LSN?) per partition – Martin Smith May 17 '23 at 15:04
  • Would the lease collection used by change feed then be any different from the resumeToken, all things considered? – solujic May 17 '23 at 15:18
  • The lease collection isn't part of the change feed itself. It is used by the change feed processor (the client application consuming the change feed), which stores the continuation tokens per partition in leases. It would certainly be possible to write something similar for the Cosmos implementation of **change streams**, but unfortunately the product team have not done so for us (unlike with change feed); they have just exposed `GetChangeStreamTokens`, which could technically be used for a similar process, but there is no change streams processor that leverages it – Martin Smith May 17 '23 at 15:23
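The resume token experiment described in these comments can also be done locally. This is a sketch based only on that observation: the token format is undocumented and may change, `decode_cosmos_resume_token` is a hypothetical name, and because pymongo may hand back `_data` as raw bytes (a BSON binary) rather than base64 text, the helper accepts either.

```python
import base64
import gzip
import json
from typing import Any, Dict


def decode_cosmos_resume_token(token: Dict[str, Any]) -> Any:
    """Gunzip and JSON-parse the `_data` field of a resume token.

    Accepts `_data` either as raw bytes (bytes subclasses such as a BSON
    binary included) or as base64 text, as it appears when displayed.
    """
    data = token["_data"]
    if isinstance(data, (bytes, bytearray)):
        raw = bytes(data)
    else:
        raw = base64.b64decode(data)
    return json.loads(gzip.decompress(raw).decode("utf-8"))
```

Printing the result for a saved token should show the per-partition continuations, i.e. the same kind of per-partition bookmark a lease collection stores for change feed.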