
With version 4, MongoDB change streams can use two distinct parameters for specifying where to recover the change stream: resumeAfter (some internal token) and startAtOperationTime, a timestamp type.

Is it possible to completely replace resumeAfter with startAtOperationTime for a safe recovery of change streams by using the clusterTime found in every change event?

What I am particularly concerned about, and could not find exact information on in the documentation, is whether the same rules and guarantees apply to startAtOperationTime regarding what can be resumed and for how long. Is the operation time used here persisted correctly, and can it always be used as a replacement for the document token usually used for resumeAfter?

languitar
  • Is the [documentation not explicit enough?](https://docs.mongodb.com/manual/reference/method/db.collection.watch/) *"resumeAfter is mutually exclusive with startAtOperationTime."* This means you **either use one or the other** but **never both**. Also, *"The starting point for the change stream. If the specified starting point is in the past, it must be in the time range of the oplog. To check the time range of the oplog, see `rs.printReplicationInfo()`."* So essentially **anything** works as long as it's within the current oplog range. Not sure what else there is to answer. – Neil Lunn Mar 16 '19 at 06:35
  • @NeilLunn thanks for the explanation. I know that they are mutually exclusive, but I don't know whether the same guarantees apply regarding what can be resumed. I will update the question to make this more clear. – languitar Mar 17 '19 at 15:43
  • It might not have stood out, but what I gave you there in the comment in between the `""` was actually a quotation from the manual page, and linked. There is no **guarantee** with either form. You are either asking for something that is presently in the oplog or you are not, hence the recommendation to "check the time range" as stated directly in the documentation. – Neil Lunn Mar 17 '19 at 20:51
  • Ok, stated differently: are the chances higher to recover correctly with one of the two alternatives (for instance thinking about clock shifts or effects like that). – languitar Mar 18 '19 at 08:31

1 Answer


Is the operation time used here persisted correctly and can it always be used as a replacement for the document token usually used for resumeAfter?

Which of the two to use depends on your use case.

The two options, resumeAfter and startAtOperationTime, are quite similar with subtle differences:

  • startAtOperationTime takes a timestamp, while resumeAfter takes a resume token, which is the entire _id of a change stream event document.
  • startAtOperationTime can resume notifications after an invalidate event by creating a new change stream, while resumeAfter cannot resume a change stream after an invalidate event has closed it.
  • startAtOperationTime resumes with changes that occurred at or after the specified timestamp, while resumeAfter resumes with the changes immediately after the event identified by the provided token.
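
The difference between the two options can be sketched in Python. This is a minimal sketch, not a definitive implementation: `resume_kwargs` is a hypothetical helper, and it assumes PyMongo, whose `watch()` method accepts the keyword arguments `resume_after` and `start_at_operation_time`. The last processed event is assumed to be stored by the application.

```python
def resume_kwargs(last_event, by_timestamp=False):
    """Build keyword arguments for watch() from the last processed event.

    Hypothetical helper: `last_event` is a change event document that the
    application stored after processing it.
    """
    if by_timestamp:
        # Resumes at or after this cluster time, so the last processed
        # event itself can be delivered again.
        return {"start_at_operation_time": last_event["clusterTime"]}
    # Resumes immediately after the event identified by the token.
    return {"resume_after": last_event["_id"]}
```

A stream would then be reopened with something like `collection.watch(**resume_kwargs(last_event))`. Because the timestamp variant can replay the last event, the consumer should tolerate seeing it twice.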

Whichever one you choose, the token or timestamp must fall within the replica set's oplog window. Change streams rely on MongoDB's global logical clock (cluster time), which is synchronized with the distributed oplog, so both options use the same underlying mechanism.
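
As a rough illustration of the oplog window check, the sketch below reads the oldest and newest oplog entries. It assumes a PyMongo `MongoClient` connected directly to a replica set member (the `local.oplog.rs` collection is not reachable through mongos); `oplog_window` and `can_resume_from` are hypothetical helper names.

```python
def oplog_window(client):
    """Return (oldest_ts, newest_ts) taken from the replica set oplog."""
    oplog = client.local["oplog.rs"]
    # $natural order follows insertion order in the capped oplog collection.
    oldest = oplog.find_one(sort=[("$natural", 1)])
    newest = oplog.find_one(sort=[("$natural", -1)])
    return oldest["ts"], newest["ts"]


def can_resume_from(client, operation_time):
    """True if `operation_time` is not older than the oldest oplog entry."""
    oldest, _newest = oplog_window(client)
    return oldest <= operation_time
```

The check is only a snapshot: the oplog keeps rolling over, so a timestamp that is inside the window now may fall out of it before the stream is opened.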

It is also worth noting that if you would like to start watching a collection and process earlier changes to it (as long as they are still in the oplog), you can specify startAtOperationTime with a constructed timestamp. This is harder to do with resumeAfter, as it requires a token that originates from the _id of an event.
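
A constructed timestamp could look like the following sketch. It assumes PyMongo and its bundled `bson` package; `seconds_since_epoch` and `watch_from` are hypothetical helpers, and the increment value of 1 is an assumption chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def seconds_since_epoch(moment):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    return int(moment.timestamp())


def watch_from(collection, moment):
    """Open a change stream starting at `moment` (must be within the oplog)."""
    from bson.timestamp import Timestamp  # ships with PyMongo

    start = Timestamp(seconds_since_epoch(moment), 1)
    return collection.watch(start_at_operation_time=start)


# Example starting point: ten minutes ago, as an aware UTC datetime.
ten_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=10)
```

Calling `watch_from(collection, ten_minutes_ago)` would then deliver the changes recorded since that point, provided the oplog still covers it.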

MongoDB v4.2 also adds a new option, startAfter, which takes the _id of an event and resumes a change stream after the operation specified in that resume token. In addition, it allows notifications to resume after an invalidate event, much like startAtOperationTime.
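
One way to choose between the token-based options is sketched below. It assumes MongoDB 4.2 or later and a PyMongo release whose `watch()` accepts `start_after`; `restart_kwargs` is a hypothetical helper.

```python
def restart_kwargs(last_event):
    """Pick the token-based option for reopening a change stream.

    Hypothetical helper: uses start_after when the last event seen was an
    invalidate event, since resume_after cannot resume past it.
    """
    token = last_event["_id"]
    if last_event.get("operationType") == "invalidate":
        return {"start_after": token}
    return {"resume_after": token}
```

The result can be passed on as `collection.watch(**restart_kwargs(last_event))`.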

You may also find the compatibility table of resume tokens across MongoDB versions useful.

Wan B.
  • Thanks for this detailed explanation. That finally clarifies my question. Would be nice to see such a detailed explanation also in the official docs. – languitar Aug 24 '19 at 16:59