
We have an application that reads a very large JSON structure and converts it into objects. The JSON is data we fetch from a third-party application. We fetch it once a day, and most of the time the data hasn't changed. What we have:

  • The fetched data is stored once a day.
  • The data is a JSON file of about 300 MB.
  • The changes between fetches are very small; often the data doesn't change at all, and its size doesn't change dramatically.
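
Since most daily fetches are identical, a cheap first step could be to hash each downloaded file and skip creating a new version when the hash matches the previous one. This is a minimal standard-library sketch; `file_digest` and `is_unchanged` are hypothetical helper names, and where the previous digest is kept (for example next to the version metadata in MongoDB) is left open. It compares raw bytes, so if the third party reorders keys or changes whitespace, an otherwise identical payload would be treated as changed.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so a ~300 MB fetch
    never has to be held in memory just to be hashed."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unchanged(new_file: Path, last_digest: Optional[str]) -> bool:
    """True if today's fetch is byte-identical to the previously stored one.
    `last_digest` is the digest saved with the last version (None if there is none)."""
    return last_digest is not None and file_digest(new_file) == last_digest
```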

We were thinking of an approach similar to Git: store only the changes and give every version a unique identifier. We also need versioning so that different applications can get either the latest version or a specific version of the fetched data.
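
One way to get Git-style identifiers is content addressing: hash a canonical serialization of each snapshot, store the compressed snapshot under that hash, and keep a chain of commit records that each point to a snapshot and to their parent. The sketch below is in memory only; `VersionStore` is a hypothetical name, and its `blobs` and `commits` dicts stand in for what would likely be MongoDB collections. It stores full compressed snapshots rather than deltas, so it covers unique IDs, "latest" versus "specific version" lookups, and skipping unchanged fetches, but not diff-only storage. Note that a single MongoDB document is limited to 16 MB, so a ~300 MB snapshot would need GridFS or some form of splitting.

```python
import hashlib
import json
import zlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


def canonical_bytes(data: Any) -> bytes:
    """Deterministic JSON (sorted keys, no extra whitespace): logically equal
    documents always serialize to the same bytes, hence the same hash."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


@dataclass
class VersionStore:
    """Hypothetical Git-like store: snapshots are addressed by the SHA-256 of their
    canonical JSON, and each commit points at a snapshot and its parent commit."""
    blobs: Dict[str, bytes] = field(default_factory=dict)   # blob id -> compressed JSON
    commits: Dict[str, dict] = field(default_factory=dict)  # commit id -> metadata
    head: Optional[str] = None                              # id of the latest commit

    def commit(self, data: Any, timestamp: str) -> str:
        raw = canonical_bytes(data)
        blob_id = hashlib.sha256(raw).hexdigest()
        if self.head is not None and self.commits[self.head]["blob"] == blob_id:
            return self.head  # same content as the latest version: no new commit
        # Identical content is stored once, however many commits reference it.
        self.blobs.setdefault(blob_id, zlib.compress(raw))
        meta = {"blob": blob_id, "parent": self.head, "timestamp": timestamp}
        commit_id = hashlib.sha256(canonical_bytes(meta)).hexdigest()
        self.commits[commit_id] = meta
        self.head = commit_id
        return commit_id

    def get(self, commit_id: Optional[str] = None) -> Any:
        """Data for a specific commit, or for the latest one if no id is given."""
        cid = commit_id or self.head
        if cid is None:
            raise KeyError("store is empty")
        return json.loads(zlib.decompress(self.blobs[self.commits[cid]["blob"]]))

    def history(self) -> List[str]:
        """Commit ids from newest to oldest, following parent pointers."""
        out, cid = [], self.head
        while cid is not None:
            out.append(cid)
            cid = self.commits[cid]["parent"]
        return out
```

If the data ever reverts to an earlier state, the new commit reuses the existing blob, so only the small commit record is added.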

Our app is written in Python and uses MongoDB to store the data. The application runs locally (no cloud).

The problem isn't comparing JSON documents; it's managing diffs of large JSON files and versioning those diffs.
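
For storing only the changes, diffing at the level of individual records is often simpler than a generic JSON diff, and each delta contains only the changed records, so it stays small when daily changes are tiny. This sketch assumes the payload is a list of records that each carry a unique, stable `id` field; that field name, the helper names, and the `{"upsert": [...], "delete": [...]}` delta shape are all hypothetical, not something the third-party format is known to provide.

```python
from typing import Any, Dict, Iterable


def index_by_id(records: Iterable[dict], key: str = "id") -> Dict[Any, dict]:
    """Map each record's identifier to the record.
    ASSUMPTION: every record has a unique, stable identifier under `key`."""
    return {rec[key]: rec for rec in records}


def compute_delta(old: Dict[Any, dict], new: Dict[Any, dict]) -> dict:
    """Record-level diff between two fetches.

    `upsert` holds records that are new or whose content changed,
    `delete` holds ids present in `old` but missing from `new`.
    Unchanged records do not appear in the delta at all.
    """
    upsert = [rec for rid, rec in new.items() if old.get(rid) != rec]
    delete = [rid for rid in old if rid not in new]
    return {"upsert": upsert, "delete": delete}
```

Both fetches are fully loaded into memory here, which for ~300 MB of JSON may take considerably more RAM than the file size; a streaming parser would be one way around that. An empty delta means the fetch changed nothing and no new version needs to be recorded.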

Any suggestions on how to implement these capabilities in our solution?

Mickey Hovel
  • 1. Are the changes/differences between versions computed by your application code? 2. If you need access to a specific version of the JSON data, it seems you will need to store the whole JSON data for that version anyway. Would it suffice in your case to simply store the JSON data whenever your application detects changes? – ray Jul 29 '21 at 05:56
  • 1. Yes, I have an app that fetches and computes that data. 2. I want to be able to traverse the history of changes to the JSON so I can reproduce that data. – Mickey Hovel Jul 29 '21 at 05:58
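
To reproduce any past version from stored deltas, one option is to keep a full base snapshot and replay the deltas in order. This is a sketch under stated assumptions: the delta shape `{"upsert": [records], "delete": [ids]}`, the unique `id` field per record, and the names `apply_delta` and `reconstruct` are all hypothetical. Versions are numbered 0 (the base) through `len(deltas)`.

```python
from typing import Any, Dict, List


def apply_delta(state: Dict[Any, dict], delta: dict, key: str = "id") -> Dict[Any, dict]:
    """Return a new state with one delta applied; `state` itself is left untouched.
    Records are replaced wholesale, never mutated, so a shallow copy is enough."""
    new_state = dict(state)
    for rid in delta["delete"]:
        new_state.pop(rid, None)
    for rec in delta["upsert"]:
        new_state[rec[key]] = rec
    return new_state


def reconstruct(base: Dict[Any, dict], deltas: List[dict], version: int) -> Dict[Any, dict]:
    """Rebuild `version`: version 0 is `base`, version n is `base`
    with deltas[0] .. deltas[n-1] replayed in order."""
    if not 0 <= version <= len(deltas):
        raise IndexError(f"unknown version {version}")
    state = base
    for delta in deltas[:version]:
        state = apply_delta(state, delta)
    return state
```

Replay cost grows with the number of deltas, so storing a full snapshot every N versions and starting from the nearest one would bound it; N is a trade-off between storage and lookup time.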
