Background:
I have created a job that reads data from MongoDB and loads it into MS SQL.
Current Behaviour:
Every time I run the job, it fetches all the data from MongoDB.
Expected Behaviour:
When the job runs, it should fetch only the data that has not been loaded yet. Each MongoDB document has a timestamp field.
Example:
Timestamp: 2022-07-29T08:14:14.657+00:00
Solution 1:
I tried adding a condition to the MongoDB query so that it loads only the last 15 minutes of data.
The problem is what happens if, for example, my job component stays down for 1 hour.
When it comes back up, the next job run loads only the last 15 minutes of data, and the other 45 minutes of data are lost.
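To make the gap concrete, here is a small sketch of the arithmetic. The function name `missed_interval` and the run times are hypothetical, chosen only to illustrate the case described above.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)  # fixed look-back used by the query


def missed_interval(last_successful_run, current_run, window=WINDOW):
    """Return the (start, end) span a fixed look-back window never reads, or None."""
    window_start = current_run - window
    if window_start > last_successful_run:
        return last_successful_run, window_start
    return None


# Hypothetical times: the job was down for one hour between runs.
last_run = datetime(2022, 7, 29, 8, 0, tzinfo=timezone.utc)
next_run = last_run + timedelta(hours=1)

gap = missed_interval(last_run, next_run)
gap_minutes = (gap[1] - gap[0]).total_seconds() / 60
print(gap_minutes)  # 45.0
```

Any fixed window has this weakness: as soon as the time between two successful runs exceeds the window, the difference is skipped.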
Required Solution:
When the job runs for the first time, it should extract all existing data and load it into SQL.
When the job runs again (say, after 15 minutes), it should automatically detect which documents were created since the previous run and load only those new rows.
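One common way to get this behaviour is a "high-watermark": store the newest timestamp that was loaded and, on the next run, ask MongoDB only for documents newer than it. The following is a minimal, tool-agnostic sketch in Python, not the job's actual implementation. The helper names, the state file `watermark.json`, and the field name `timestamp` are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def read_watermark(path):
    """Return the last loaded timestamp, or None if the job has never run."""
    if not path.exists():
        return None
    return datetime.fromisoformat(json.loads(path.read_text())["last_loaded"])


def build_filter(watermark, field="timestamp"):
    """First run: empty filter (all documents). Later runs: only newer ones."""
    return {} if watermark is None else {field: {"$gt": watermark}}


def save_watermark(path, docs, field="timestamp"):
    """Advance the watermark to the newest timestamp that was actually loaded."""
    if docs:
        newest = max(d[field] for d in docs)
        path.write_text(json.dumps({"last_loaded": newest.isoformat()}))


# Hypothetical state file; a real job would keep it somewhere durable.
state = Path(tempfile.mkdtemp()) / "watermark.json"

# First run: no watermark yet, so the filter matches everything.
first_filter = build_filter(read_watermark(state))

# Stand-in for the documents returned by MongoDB and loaded into SQL.
docs = [{"timestamp": datetime(2022, 7, 29, 8, 14, 14, 657000, tzinfo=timezone.utc)}]
save_watermark(state, docs)  # call this only after the SQL load succeeds

# Next run: only documents newer than the stored timestamp are requested.
next_filter = build_filter(read_watermark(state))
print(first_filter, next_filter)
```

With a driver such as PyMongo, the resulting dictionary could be passed as the filter to `collection.find(...)`. Because the watermark comes from the loaded data rather than the wall clock, a job that was down for an hour simply picks up from the last document it loaded, however long the outage was.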
Update:
I have now written a complete article on this solution: https://medium.com/@raowaqasakram/fetch-latest-data-from-mongodb-talend-1f21ba7b98b5