I am looking to kick off a modelling/forecasting task with the aid of git. I want to set up a git branching structure to support this, but am running into some issues.
Goal: At the end of the modelling task in each region/subregion branch (human revisions are needed; one human revision = one commit), merge down to master so that all of the forecasts are available for review, along with the version of the code and dataset each was run on. If revisions need to be made later, a modeller should be able to branch out from the exact commit where the forecast was completed and work on it with the correct (possibly older) version of the code.
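In git terms, that revision step could be sketched like this (the tag and branch names here are invented for illustration, not an established convention):

```shell
set -e
# Throwaway repo to demonstrate: tag a forecast-completion commit,
# then later branch from exactly that point to make a revision.
# All names (forecast-region1-v1, region1-revision) are hypothetical.
cd "$(mktemp -d)"
git init -q
git config user.email modeller@example.com
git config user.name  modeller
echo "forecast output" > forecast_region1.csv
git add forecast_region1.csv
git commit -qm "Region 1 forecast complete"
git tag forecast-region1-v1    # mark the exact completion commit
# Later: branch from the tag, so the code/data are as they were then
git checkout -qb region1-revision forecast-region1-v1
```

Because the new branch starts at the tagged commit, the working tree contains exactly the files that produced the original forecast.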
Issue: The data and code versions change over time. Older model runs will likely not be compatible with newer code/data (for example, region 1 may use code version 1 and data version 2, while region 2 uses code version 4 and data version 6), and at the end of the project every forecast must be reproducible.
My solution: It seems to go against the philosophy of git, but every time there is a dataset or code update, commit it to master and append a version number to the file name. Work in region/subregion branches and tag every forecast-completion commit. When a forecast is completed, merge down to master and add another file stating which code and data versions it was run on. If a revision needs to be made, find the completion tag, remodel with the proper version of the code, merge back into the region branch, and then down to master. If a model needs to be reproduced, run it with the correct code/data as recorded in that additional file.
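A minimal end-to-end sketch of this proposed flow, assuming made-up file, tag, and branch names (versioned files on master, a completion tag, a merge down, and a provenance file):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email modeller@example.com
git config user.name  modeller
# Versioned code/data live on master, version number in the file name
echo "model code" > model_v1.py
echo "input data" > data_v2.csv
git add . && git commit -qm "Add code v1 and data v2"
# Model in a region branch; each human revision is a commit
git checkout -qb region1
echo "forecast output" > forecast_region1.csv
git add . && git commit -qm "Region 1 forecast"
git tag region1-complete    # tag the forecast-completion commit
# Merge down to master and record the provenance in an extra file
git checkout -q -
git merge -q --no-edit region1
echo "region1: model_v1.py + data_v2.csv" > VERSIONS.txt
git add VERSIONS.txt && git commit -qm "Record region 1 code/data versions"
```

The `region1-complete` tag is what a modeller would later branch from, and `VERSIONS.txt` is the file that states which code/data each forecast was run with.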
Is this the best way to use git to track this process, or is there a better/simpler approach? Will this process work, or are there unintended issues that may arise from it?