
I'm trying to figure out a way to use Mongo as a circular buffer. We currently use SQLite, but its performance does not fit our case. The requirements are: the collection must empty itself every x seconds, and the collection must empty itself when a limit of y documents is reached.

Going through the Mongo documentation, capped collections along with change events seem like the way to go.

https://docs.mongodb.com/manual/core/capped-collections/

https://docs.mongodb.com/manual/reference/change-events/

The documentation states: "Capped collections work in a way similar to circular buffers"

However, I am not sure how to:

  1. Empty the collection every x seconds. Mongo's TTL feature is not feasible, since TTL isn't supported on capped collections. Are there other alternatives?
  2. Retrieve any "removed documents". The replace operation type of change events seems like one approach. Are there other alternatives?

Has anyone tried using Mongo as a circular buffer? Is the above (capped collections plus change events) the way to achieve it?

Thanks for any response.

fdhsdrdark
  • What is the size of your data? What is the range of "x seconds"? A capped collection **limits** the total number of documents, but you are asking to remove **all** documents. A plain `db.collection.drop()` might be faster. In MongoDB a new collection is created automatically if it does not exist. – Wernfried Domscheit Jun 16 '21 at 09:19
  • Why do you want to retrieve the documents when you are actually asking to delete them? – Wernfried Domscheit Jun 16 '21 at 09:21
  • Can you elaborate on what you expect from the "circular buffer"? It's just a way to store data. It's how capped collections are implemented, and it's used for the oplog in replica sets. So to answer your question of whether anybody has tried to use it: yes, some people have. – Alex Blex Jun 16 '21 at 10:00
  • @WernfriedDomscheit I want to retrieve the documents because further processing needs to take place after their removal. The current value of "x seconds" is 10 seconds, but it's configurable. The data size is really big, close to half a million docs per month; it's production data and I do not know the exact values. – fdhsdrdark Jun 16 '21 at 11:19
  • @AlexBlex What I expect from a circular buffer is to store a specific number of documents (the limit), with the ability to retrieve all documents that are automatically removed when this limit is hit. As I realize now, though, Mongo will actually remove documents one by one, not all of them together in order to empty the collection. – fdhsdrdark Jun 16 '21 at 11:27
  • This does not make much sense. Half a million documents per month is approximately 12 documents per minute. A circular buffer of 10 seconds basically means processing the documents one by one. And 12 documents per minute certainly does not cause any performance issues in SQLite! – Wernfried Domscheit Jun 16 '21 at 11:39
  • @WernfriedDomscheit As far as SQLite goes, I've been told that at high-traffic times inserts into SQLite start to perform poorly. And I don't know the exact distribution of requests. Maybe there are half a million requests during high-traffic times, and on a monthly basis something more than half a million. – fdhsdrdark Jun 16 '21 at 12:14
  • Although SQLite looks a bit tiny, it is quite powerful, see https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite – Wernfried Domscheit Jun 16 '21 at 12:43

1 Answer


From https://en.wikipedia.org/wiki/Circular_buffer:

a circular buffer [...] is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end.

I'm afraid the "Capped collections work in a way similar to circular buffers" you quoted uses precisely this definition of the circular buffer.

A capped collection is capped by size and, optionally, by number of documents. Old documents are not removed by a timer but by new documents. Think of it as the new documents overwriting the old ones.
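To make the overwrite behaviour concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code, and the `RingBuffer` class and its method names are made up for illustration. Unlike a capped collection, this version hands back the item each write pushes out, which is the piece you would have to build yourself.

```javascript
// Illustrative fixed-size circular buffer. RingBuffer is a hypothetical name,
// not part of MongoDB or any driver.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0; // index of the slot the next write goes to
    this.size = 0; // number of slots currently filled
  }

  // Store an item. Once the buffer is full, the oldest item is overwritten
  // and returned; before that, undefined is returned.
  push(item) {
    const evicted =
      this.size === this.capacity ? this.slots[this.next] : undefined;
    this.slots[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    if (this.size < this.capacity) this.size += 1;
    return evicted;
  }

  // Items in insertion order, oldest first.
  toArray() {
    const start = this.size === this.capacity ? this.next : 0;
    const out = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.slots[(start + i) % this.capacity]);
    }
    return out;
  }
}
```

With a capacity of 3, pushing a fourth item overwrites the first one, so the buffer never grows and never "empties" on its own.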

Unfortunately, this feature makes it impossible to delete documents from the collection (https://docs.mongodb.com/manual/core/capped-collections/#document-deletion), neither by TTL nor explicitly. And since there is no formal deletion, there is no delete event in the change stream.

To put it simply, if you need to retrieve documents evicted from the buffer, you need to implement it yourself.

A TTL index may work for you, but it is time-bound, not size-bound. It will issue a delete event to the change stream, but there are a few things to consider:

  • You will need to keep a change stream client running to ensure you catch all events.
  • The TTL index process comes at a cost. Every minute MongoDB runs the TTL monitor thread to delete expired documents, which consumes resources. Not as much as SQLite, but system performance may still degrade, and documents may not be deleted exactly after the specified amount of time if the server is busy with other operations.
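For reference, the TTL route could be set up roughly as follows in mongosh-style JavaScript. The function name `watchTtlDeletes` and the `createdAt` field are assumptions made for this sketch; the field has to hold a date for the TTL monitor to act on it, and change streams require a replica set or sharded cluster.

```javascript
// Hypothetical helper: create a TTL index and open a change stream that
// only reports delete events. `coll` is a collection handle such as db.buffer.
function watchTtlDeletes(coll, ttlSeconds) {
  // Documents become eligible for removal ttlSeconds after their createdAt.
  coll.createIndex({ createdAt: 1 }, { expireAfterSeconds: ttlSeconds });
  // TTL removals surface in the change stream as ordinary delete events.
  return coll.watch([{ $match: { operationType: "delete" } }]);
}
```

Keep in mind that, by default, a delete event carries only the `documentKey` (the `_id`) and not the deleted document, so the payload would have to be kept somewhere else if it is needed after removal.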

It would be advisable to take control and select/delete the documents yourself. I understand you already have an implementation that uses SQLite, so it's just a matter of adjusting it to use MongoDB instead.

db.collection.find({}).sort({_id:1}).limit(1)

This returns the oldest document, assuming the default ObjectId `_id`, whose values increase with insertion time (sorting by `{_id:-1}` would return the newest one instead). It uses the default `_id` index and should perform well.
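A minimal sketch of that manual approach in mongosh-style JavaScript follows. It assumes `_id` is the default ObjectId (so ascending `_id` puts the oldest documents first) and a single consumer doing the draining. The names `shouldFlush`, `drainOldest`, `maxDocs` and `maxAgeMs` are hypothetical, and `coll` stands for a collection handle such as `db.buffer`.

```javascript
// Decide whether the buffer is due for emptying: either it holds at least
// maxDocs documents, or maxAgeMs has passed since the last flush.
function shouldFlush(docCount, lastFlushMs, nowMs, maxDocs, maxAgeMs) {
  return docCount >= maxDocs || nowMs - lastFlushMs >= maxAgeMs;
}

// Read up to `limit` of the oldest documents, delete exactly those, and
// return them for further processing.
// Assumption: mongosh semantics, where these calls return results directly;
// with the Node.js driver each call would need to be awaited.
function drainOldest(coll, limit) {
  const docs = coll.find({}).sort({ _id: 1 }).limit(limit).toArray();
  if (docs.length > 0) {
    const ids = docs.map((doc) => doc._id);
    coll.deleteMany({ _id: { $in: ids } });
  }
  return docs;
}
```

A timer in the application could call `shouldFlush(coll.countDocuments({}), lastFlushMs, Date.now(), maxDocs, maxAgeMs)` and, when it returns true, hand the result of `drainOldest(coll, maxDocs)` to the post-processing step. Deleting by the ids that were actually read, rather than with an empty filter, avoids removing documents inserted between the read and the delete.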

Alex Blex
  • Thanks very much. Indeed, the scenario I am in is not exactly within the scope of a circular buffer. I spent some time trying to figure out whether triggers in Mongo are an option, but they are not available directly, so not an option after all. I tend to believe that "manually" handling the select/delete as you propose is the way to go. – fdhsdrdark Jun 16 '21 at 14:21