how to choose best mechanism for delete logs saved to mongodb

Question

I'm implementing a logger using MongoDB and I'm quite new to the concept. The logger is supposed to log each request and Its response. I'm facing the question of using the TTL Index of mongo or just using the query overnight approach.

I think that the first method might bring some overhead by using a background thread and probably rebuilding the index after each deletion but, it frees space as soon as the documents expire and this might be beneficial. The second approach, on the other hand, does not have this kind of overhead but it frees up space just at the end of each day.

It seems to me that the second approach will suit my case better as it would not be the case that my server just goes on the edge of not having enough disk space, but it will always be the case that we need to reduce the overhead on the server.

I'm wondering if there are some aspects to the subject that I'm missing and also I'm not sure about the applications of the MongoDB TTL.

R2D2 · Accepted Answer · 2022-02-08T18:57:38.063

Just my opinion:

It seems to be best to store logs in monthly , daily or hourly collection depends on your applications write load , and at the end of the day to just drop() the oldest collections with custom script. From experience TTL indices not working well when there is heavy write load to your collection since they add additional write load based on expiration time.

For example imagine you insert at 06:00h log events with 100k/sec and your TTL index life time is set to 3h , this mean after 3h at 09:00h you will have those 100k/sec deletes applied to your collection that are also stored in the oplog ... , solution in such cases is to add more shards , but it become kind of expensive... , far easier is to just drop the exprired collection ...

Moreover depending on your project size for bigger collections to speed up searches you can additionally shard and pre-split the collections based on compound index hashed datetime field(every log contain timestamp) with another field which you will search often and this will allow you scalable search across multiple distributed shards.

Also note mongoDB is a general purpose document database and fulltext search is kind of limited to expensinve regex expressions , so in case you need to do fast raw fulltext search in your logs some inverse index search engine like elasticsearch on top of your mongoDB backand maybe a good solution to cover this functionality.

how to choose best mechanism for delete logs saved to mongodb

1 Answers1

Linked