Here is the situation. We have an ad hoc analytics repository with one directory per analysis. Each directory contains one or more scripts, together with one or more data files that come in different formats and sizes (sometimes considerable). Scripts without data are generally useless, so we would like to store the data files too. On the other hand, it is sometimes useful to look at a script (for example, to see how an analysis was conducted) without being forced to download the associated data files.
We definitely don't want to store the data in a separate repository (runtime issues, associating scripts with data files, etc.).
What we have considered so far:
- git submodules - a separate repo, so the data would be kept away from the scripts (not in the same directories, so it would get messy over time)
- git hooks - intended rather for applying constraints or additional actions on push and, as noted above, everyone should be able to upload any file (besides, we don't have the access needed to install server-side hooks)
The idea that comes to mind is that it would be convenient to exclude some locations or certain files (e.g. anything well over 50 MB) from being pulled or cloned from the repository, simply to avoid transferring unwanted data. Is that possible?
If some files are not touched in subsequent commits, they should not be needed for future pushes. I am probably (or even certainly) missing some knowledge about git's underlying mechanisms, so I would be grateful for clarification.
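To make the size criterion above concrete, here is a minimal sketch of the kind of selection I have in mind. It is not a git feature, only an illustration: the function name `find_large_files` and the exact 50 MB cut-off are my own assumptions, and the script merely lists the files in a working copy that would fall on the "do not transfer" side of the line.

```python
import os

# Assumed threshold: the 50 MB figure from the question, taken as 50 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def find_large_files(repo_root, limit_bytes=SIZE_LIMIT_BYTES):
    """Return sorted (relative_path, size_in_bytes) pairs for files above limit_bytes.

    Hypothetical helper for illustration only; it walks the working copy
    and does not talk to git at all.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symlinks, they carry no data of their own
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((os.path.relpath(path, repo_root), size))
    return sorted(large)


if __name__ == "__main__":
    # List every file in the current working copy that exceeds the threshold.
    for rel_path, size in find_large_files("."):
        print(f"{size / 2**20:8.1f} MiB  {rel_path}")
```

Run from the repository root, this prints the data files that I would like a clone or pull to be able to skip, while still fetching the scripts that sit next to them.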