I need to set up a job on Linux that downloads a few files from two git repositories and then performs some operations on them. Let's say I want to download these files:
https://gitlab.com/repository_1/dir_1/file_11
https://gitlab.com/repository_1/dir_2/file_12
https://gitlab.com/repository_2/dir_1/file_21
I don't want to clone the whole repositories (they are big). I know I could use wget, but maybe it is possible to do this with git (which might help with the next step, described below). I tried
git checkout origin/master -- https://gitlab.com/repository_2/dir_1/file_21
but I get this error:
fatal: Not a git repository (or any parent up to mount point /var/fpwork)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
The downloaded files won't be deleted afterwards, so on the next run I also want to be sure they are still the latest versions (after some time the files on disk may be outdated). If they are not, I need to fetch the updated files (delete and download again?). I could delete the local files and re-download them every time, but I suspect there is a better way to keep the files current.
I tried this command to get a file's md5sum:
curl -sL https://gitlab.com/repository/dir/file | md5sum | cut -d ' ' -f 1
and to compare it with the already-downloaded file, but each time I run this command I get a different md5sum.
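(For what it's worth, the changing checksum is expected: a plain gitlab.com project URL serves an HTML page, likely containing per-request tokens, rather than the raw file bytes. Once the actual file has been fetched, the md5sum comparison itself is straightforward. Below is a minimal sketch of the compare-and-replace step, with the download stubbed out by a local file; in a real job the fresh copy would come from the repository's raw-file URL.)

```shell
#!/bin/sh
# Sketch: decide whether the local copy is stale by comparing md5sums.
# "fetched_file_11" stands in for a freshly downloaded copy; in the real
# job it would be produced by curl/wget against the raw-file URL.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'archive-bytes-v1' > file_11          # the previously downloaded copy
printf 'archive-bytes-v2' > fetched_file_11  # the fresh copy (newer content here)

local_sum=$(md5sum file_11 | cut -d ' ' -f 1)
fresh_sum=$(md5sum fetched_file_11 | cut -d ' ' -f 1)

if [ "$local_sum" = "$fresh_sum" ]; then
    echo "file_11 is up to date"
    rm fetched_file_11
else
    echo "file_11 is outdated; replacing it"
    mv fetched_file_11 file_11
fi
```

With the two stub files differing, the script replaces file_11; when the checksums match, it leaves the local copy alone.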
Some of those files are archives, so git diff won't work.
Summarising:
- How to download single files from a git repository?
- How to check whether a downloaded file is the latest version?
============================
Ad1. Downloading could be done with:
git archive --remote=ssh://gitlab.com/repository_1.git HEAD dir_1/file_11 | tar -x
Be aware that using https instead of ssh causes a fatal: Operation not supported by protocol error.
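As a smoke test of this pattern, the same command works with a local repository path as the remote, which makes it easy to try without network access. The sketch below mirrors the question's layout with a throwaway repository; in the real job, point --remote at ssh://gitlab.com/repository_1.git instead:

```shell
#!/bin/sh
# Demonstrate `git archive --remote=... <ref> <path> | tar -x` against a
# throwaway local repository standing in for the GitLab remote.
set -eu
tmp=$(mktemp -d)

# Build a repo containing dir_1/file_11, like repository_1 in the question.
git init -q "$tmp/repo"
mkdir -p "$tmp/repo/dir_1"
echo "contents of file_11" > "$tmp/repo/dir_1/file_11"
git -C "$tmp/repo" add dir_1/file_11
git -C "$tmp/repo" -c user.name=job -c user.email=job@example.com commit -qm initial

# Extract just the one path: git archive streams a tar containing only
# the requested files, and tar unpacks it into the current directory.
mkdir "$tmp/work"
cd "$tmp/work"
git archive --remote="$tmp/repo" HEAD dir_1/file_11 | tar -xf -

cat dir_1/file_11
```

Running it prints contents of file_11, fetched without cloning the rest of the repository.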
Ad2. This wget command works well for me:
wget -r -N -c https://gitlab.com/repository_1/dir_1/file_11
Here -r recreates the host/directory structure locally, -N enables timestamping so the file is re-downloaded only when the remote copy is newer, and -c resumes partial downloads.