We have an application that reads a very large JSON structure and converts it to objects. The JSON is data fetched from a third-party application. The fetch runs once a day, and most of the time the data hasn't changed. What we have:
- The fetched data is stored once a day.
- Each fetch is a JSON file of roughly 300 MB.
- The changes between fetches are very small; often the data doesn't change at all, and the overall size stays about the same.
We were thinking of an approach similar to git's: store only the changes, and give every version a unique identifier. We also need versioning so that different applications can retrieve either the latest version or a specific one.
Our app is written in Python and uses MongoDB to store the data. The application runs locally (no cloud).
The problem isn't comparing JSONs, but managing the diffs of large JSON files and versioning those diffs.
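For what it's worth, here is a rough sketch of the kind of scheme we had in mind: a git-like content hash as the version identifier, a full snapshot for the first version, and only diffs stored afterwards. Everything here is hypothetical for illustration: `VersionStore`, `shallow_diff`, and the in-memory list standing in for a MongoDB collection are our own names, and the diff is deliberately top-level-only.

```python
import hashlib
import json


def content_id(doc):
    """Git-like unique identifier: SHA-1 of the canonical JSON serialization."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def shallow_diff(old, new):
    """Top-level key diff between two dicts (a real version would recurse)."""
    return {
        "added": {k: new[k] for k in new if k not in old},
        "removed": [k for k in old if k not in new],
        "changed": {k: new[k] for k in new if k in old and old[k] != new[k]},
    }


class VersionStore:
    """In-memory stand-in for a MongoDB 'versions' collection."""

    def __init__(self):
        self.versions = []  # each: {"id", "parent", "snapshot" or "diff"}

    def commit(self, doc):
        vid = content_id(doc)
        if self.versions and self.versions[-1]["id"] == vid:
            return vid  # unchanged fetch: store nothing new
        if not self.versions:
            entry = {"id": vid, "parent": None, "snapshot": doc}
        else:
            parent = self.versions[-1]
            entry = {"id": vid, "parent": parent["id"],
                     "diff": shallow_diff(self.materialize(parent["id"]), doc)}
        self.versions.append(entry)
        return vid

    def materialize(self, vid):
        """Rebuild a specific version by replaying diffs from the base snapshot."""
        doc = {}
        for entry in self.versions:
            if "snapshot" in entry:
                doc = dict(entry["snapshot"])
            else:
                d = entry["diff"]
                doc.update(d["added"])
                doc.update(d["changed"])
                for k in d["removed"]:
                    doc.pop(k, None)
            if entry["id"] == vid:
                return doc
        raise KeyError(vid)

    def latest(self):
        return self.materialize(self.versions[-1]["id"])
```

This is only meant to show the shape of the problem: for real 300 MB documents the diff would need to be recursive (or use something like JSON Patch), and the snapshot would exceed MongoDB's 16 MB document limit, so the storage layer itself is an open question for us.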
Any suggestions for how to implement these capabilities in our solution?