
We have a distributed, shared MongoDB deployment (5 to 6 replicas). I want to back up huge collections' data that is older than one year (for example), and delete it after the backup. (I don't want any data loss, so I don't want to drop a collection completely.) The source collection is, for example, customer; the target collection is customer_archive.

Note: the archived data should not be lost and should not contain duplicates of the same data. I assumed we need a merge/upsert mechanism, but I don't know the best way to do it.
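For illustration, a minimal sketch of the upsert idea using the `$merge` aggregation stage (MongoDB 4.2+), which writes into a target collection keyed on `_id`, so re-running the job does not create duplicates. The `createdAt` field and the `archive` database name are assumptions, not part of the real schema:

```javascript
// Sketch: copy documents older than one year into the archive, upserting
// on _id so a re-run does not duplicate already-archived documents.
// "createdAt" is an assumed date field; adjust to the real schema.
const cutoff = new Date();
cutoff.setFullYear(cutoff.getFullYear() - 1);

db.customer.aggregate([
  { $match: { createdAt: { $lt: cutoff } } },
  { $merge: {
      into: { db: "archive", coll: "customer_archive" }, // target may be another database
      on: "_id",                // match on _id to avoid duplicates
      whenMatched: "replace",   // overwrite an existing archived copy
      whenNotMatched: "insert"  // insert documents not yet archived
  } }
]);
```

This copies without deleting; the delete step can run separately once the copy is verified, which fits the no-data-loss requirement.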

  1. What is the most efficient way to archive 200-300 GB of data (for example)? Millions of documents arrive every day, i.e. the data grows fast, so querying is getting difficult and performance suffers.
  2. The archive collection might be in a different database; how do I write the script for that? (The $merge sketch above can write into another database.)
  3. How can I schedule that script monthly/daily? (A cron sketch follows this list.)
  4. How would you do it if the backup goes to a different database? (My first thought was to export the data and move it to the other database.)
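For question 3, a minimal scheduling sketch for a Linux host, assuming the archive logic is saved as a mongosh script; the file name, path, and connection string here are hypothetical. On Windows, Task Scheduler can run the same mongosh command:

```
# Hypothetical crontab entry: run the archive script at 02:00 on the 1st of every month.
# The script path and connection string are placeholders, not real values.
0 2 1 * * mongosh "mongodb://host1,host2/?replicaSet=rs0" --file /opt/jobs/archive.js >> /var/log/archive.log 2>&1
```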

My plan was: first take a dump of the collection under a different name, then copy and delete the year-old data with bulk inserts/bulk deletes (read 10,000 documents at a time and commit each batch). Some do not recommend iterating with foreach for efficiency, but I assume it is the only way to delete partially. A sketch of what I mean follows below. I don't know whether I can make this a batch file so I can share it with our BT team for task scheduling.
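A hedged sketch of that batched copy-then-delete idea for mongosh, again assuming a `createdAt` field and an `archive` database; upserting by `_id` keeps a re-run after a crash from duplicating data, and documents are deleted only after their batch has been archived:

```javascript
// Sketch: archive up to 10,000 old documents at a time, then delete
// only what was just copied. "createdAt", the database names, and the
// batch size are assumptions, not confirmed values.
const archive = db.getSiblingDB("archive").customer_archive;
const cutoff = new Date();
cutoff.setFullYear(cutoff.getFullYear() - 1);

let batch;
do {
  batch = db.customer.find({ createdAt: { $lt: cutoff } }).limit(10000).toArray();
  if (batch.length === 0) break;

  // Upsert by _id so re-running the job does not create duplicates.
  archive.bulkWrite(
    batch.map(doc => ({
      replaceOne: { filter: { _id: doc._id }, replacement: doc, upsert: true }
    })),
    { ordered: false }
  );

  // Delete only the documents that were just archived.
  db.customer.deleteMany({ _id: { $in: batch.map(doc => doc._id) } });
} while (batch.length === 10000);
```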

  • Some general information about [MongoDB Backup Methods](https://docs.mongodb.com/manual/core/backups/). – prasad_ Jun 17 '21 at 09:09
  • See stackoverflow.com/a/67077465/3027266 Scheduling can be done by simple cron job or Windows Scheduler. Deleting old data can be the bottleneck, however 20-30 GB is not "huge" nowadays. Is it possible to drop the entire collection? – Wernfried Domscheit Jun 17 '21 at 09:19
  • I don't want to delete the current collection; it is in use. We also want to separate the old data out as an archive. Assume you have 10 years of data, huge data: a select query is very costly. So I want to archive the old data and keep only the last year, but I do not want to lose any data. I gave the numbers as an example, by the way; assume we have 300 GB to a TB of data. What would you do :) – Tolga Celik Jun 17 '21 at 09:48
  • A little at a time; a month's worth of data is probably a lot easier to handle, and deleting it will be less impactful on the database. That is, to whittle down those 10 years to 1 year, use 108 month-sized operations instead of a single 9-year-sized operation (sketched below). – Joe Jun 17 '21 at 11:24
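A sketch of that month-at-a-time idea, under the same assumptions as above; `archiveWindow()` is a hypothetical helper standing in for the batched copy-then-delete step, restricted to one date range:

```javascript
// Sketch: process one month-sized window at a time instead of nine years
// at once. "createdAt" is an assumed field; archiveWindow() is hypothetical.
const oldestDoc = db.customer.find({}, { createdAt: 1 })
  .sort({ createdAt: 1 }).limit(1).toArray()[0];
const cutoff = new Date();
cutoff.setFullYear(cutoff.getFullYear() - 1);

let start = new Date(oldestDoc.createdAt.getFullYear(), oldestDoc.createdAt.getMonth(), 1);
while (start < cutoff) {
  const end = new Date(start.getFullYear(), start.getMonth() + 1, 1);
  archiveWindow(start, end < cutoff ? end : cutoff); // one month-sized operation
  start = end;
}
```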

0 Answers