
I'm writing a Python program that visualizes Covid vaccination data on a world map, using Our World in Data's vaccination dataset in .json format. I'd like to add a feature where the program downloads the latest .json file from OWID's GitHub and replaces the old one, provided there is at least a 24-hour difference between the 'last modified' dates of the two files.
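Here's a minimal sketch of the check I have in mind, assuming the server sends a `Last-Modified` header (the local path, URL, and function name are just placeholders):

```python
import datetime
import os
from email.utils import parsedate_to_datetime

import requests  # third-party: pip install requests

# Placeholder path and URL for illustration.
LOCAL_PATH = "vaccinations.json"
REMOTE_URL = ("https://raw.githubusercontent.com/owid/covid-19-data/"
              "master/public/data/vaccinations/vaccinations.json")

def refresh_if_stale(min_age=datetime.timedelta(hours=24)):
    """Re-download the file when the remote copy is at least 24 h newer."""
    head = requests.head(REMOTE_URL, allow_redirects=True)
    head.raise_for_status()
    remote_mtime = parsedate_to_datetime(head.headers["Last-Modified"])
    local_mtime = datetime.datetime.fromtimestamp(
        os.path.getmtime(LOCAL_PATH), tz=datetime.timezone.utc)
    if remote_mtime - local_mtime >= min_age:
        response = requests.get(REMOTE_URL)
        response.raise_for_status()
        with open(LOCAL_PATH, "wb") as fh:
            fh.write(response.content)
```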

My question is whether I can instead harness Git/GitHub's ability to quickly compare the contents of the local and remote files and download only what differs between the two. The end goal is to spend as little bandwidth and time as possible on downloading a fresh version of the file.

Konrad Gr

1 Answer


You could do this quite cleanly by using Git directly: keep a local clone of the repository, then run `git pull` to get the latest updates. Running `git fetch; git status` beforehand tells you whether you're out of sync. Git is optimized to pull down updates to a file as efficiently as possible.
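For example, if the program should trigger the update itself, the whole flow can be driven from Python with `subprocess`. A minimal sketch, assuming a local clone already exists and its branch has an upstream configured (the directory and helper names are hypothetical):

```python
import subprocess

# Hypothetical path to a local clone of the OWID repository.
REPO_DIR = "covid-19-data"

def git(*args):
    """Run a git command inside the clone and return its stdout."""
    result = subprocess.run(["git", "-C", REPO_DIR, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

def update():
    git("fetch")
    # After the fetch, 'git status' reports whether the local branch is
    # behind its upstream; the pull is then cheap since the objects are
    # already local.
    if "behind" in git("status", "--short", "--branch"):
        git("pull")
```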

Alex Hurst
    `git fetch` will download all updates - it's not any faster than `git pull` assuming you're primarily bandwidth-limited. See https://stackoverflow.com/questions/41741890/fetch-refs-without-downloading-objects for an example of how to fetch a ref without fetching objects. – Nick ODell Jul 12 '21 at 21:09
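As a cheap pre-check along the lines of that comment, `git ls-remote` asks the server only for its ref hashes and downloads no objects, so you can tell whether anything changed before fetching at all. A sketch, again with a placeholder clone directory, assuming the checked-out branch tracks the remote's default branch:

```python
import subprocess

REPO_DIR = "covid-19-data"  # hypothetical local clone

def remote_changed():
    """Return True if the remote HEAD differs from the local HEAD.
    'git ls-remote' transfers only ref names and hashes, so this costs
    almost no bandwidth."""
    remote = subprocess.run(
        ["git", "-C", REPO_DIR, "ls-remote", "origin", "HEAD"],
        capture_output=True, text=True, check=True).stdout.split()[0]
    local = subprocess.run(
        ["git", "-C", REPO_DIR, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True).stdout.strip()
    return remote != local
```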