Description
I am working on an ASP.NET Core 3.1 web application that needs to track and respond to changes made to a MongoDB database hosted by Azure Cosmos DB (version 3.6). For this purpose I am using the change feed support.
The changes are pretty frequent: ~10 updates per second to a single entry in a collection.
To trace the changes made to the collection, I am dumping the affected entries to a file (for testing purposes only) with the following code:
private async Task HandleChangeStreamAsync<T>(IMongoCollection<T> coll, StreamWriter file, CancellationToken cancellationToken = default)
{
    // Only watch inserts, updates and replaces, and project the fields of interest.
    var pipeline = new EmptyPipelineDefinition<ChangeStreamDocument<T>>()
        .Match(change => change.OperationType == ChangeStreamOperationType.Insert ||
                         change.OperationType == ChangeStreamOperationType.Update ||
                         change.OperationType == ChangeStreamOperationType.Replace)
        .AppendStage<ChangeStreamDocument<T>, ChangeStreamDocument<T>, ChangeStreamOutputWrapper<T>>(
            "{ $project: { '_id': 1, 'fullDocument': 1, 'ns': 1, 'documentKey': 1 }}");

    // Ask for the full document on updates rather than just the delta.
    var options = new ChangeStreamOptions
    {
        FullDocument = ChangeStreamFullDocumentOption.UpdateLookup
    };

    using (var cursor = await coll.WatchAsync(pipeline, options, cancellationToken))
    {
        // Dump every received change to the file as indented JSON.
        await cursor.ForEachAsync(async change =>
        {
            var json = change.fullDocument.ToJson(new JsonWriterSettings { Indent = true });
            await file.WriteLineAsync(json);
        }, cancellationToken);
    }
}
Issue
While observing the output, I noticed that the change feed is not triggered for every update made to the collection. I can confirm this by comparing it with the output generated against a database hosted by MongoDB Cloud.
Questions
How reliable is change stream support in Azure Cosmos DB’s API for MongoDB?
Can the API guarantee that the most recent update will always be available?
I was not able to process the 'oplog.rs' collection of the 'local' database on my own. Does the API support this in any way? Is this even encouraged?
Is the collection throughput (RU/s) in some way related to the change event frequency?
Final thoughts
My understanding is that frequent updates throttle the system and that the change feed simply does not handle every event from the log (rather, it scans the log periodically). However, I am wondering how safe it is to rely on such a mechanism and be sure not to miss any critical updates made to the database. If change feed support cannot make any guarantees regarding event handling frequency and there is no way to process 'oplog.rs', the only option seems to be periodic polling of the database.
Correct me if I am wrong, but switching to polling would greatly affect the performance and would lead to a solution which is not scalable.