
I know it is possible to run git fetch and then git checkout with path/to/file to get that specific file.

My issue is that I have a 1 MB data cap per day, and git fetch downloads all the data anyway, even though it does not write it to disk until I run git checkout. Either way, I have still used up my data.

Is my understanding of how git fetch and git checkout work correct? Is there a way to download only a specific file, so I can see whether there is a new version before proceeding with the download?

  • Possible duplicate of [How to sparsely checkout only one single file from a git repository?](https://stackoverflow.com/questions/2466735/how-to-sparsely-checkout-only-one-single-file-from-a-git-repository) – Thomas Weller Jul 08 '19 at 22:44
  • @ThomasWeller it's a different issue. –  Jul 08 '19 at 23:05

5 Answers


GitLab has a REST API for that.

You can GET a file from a repository with curl:

curl https://gitlab.com/api/v4/projects/:id/repository/files/:filename\?ref\=:ref

For example:

curl https://gitlab.com/api/v4/projects/12949323/repository/files/.gitignore\?ref\=master

If your repository isn't public, you also need to provide an access token by adding --header 'Private-Token: <your_access_token>'.
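If you would rather script this than call curl, here is a minimal standard-library Python sketch that builds the same metadata request as the commands above, optionally with the `Private-Token` header. The helper name `build_file_request` is hypothetical, and percent-encoding the whole path with `urllib.parse.quote(..., safe="")` is an assumption based on the API taking the file path as a single URL-encoded segment. It only builds the request; nothing is downloaded until you pass it to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_file_request(host: str, project_id: int, file_path: str, ref: str,
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for a file's JSON metadata.

    Hypothetical helper mirroring
    GET /api/v4/projects/:id/repository/files/:file_path?ref=:ref
    """
    # Assumption: the file path is one URL segment, so "/" becomes %2F.
    encoded_path = urllib.parse.quote(file_path, safe="")
    query = urllib.parse.urlencode({"ref": ref})
    url = (f"{host.rstrip('/')}/api/v4/projects/{project_id}"
           f"/repository/files/{encoded_path}?{query}")
    headers = {}
    if token is not None:
        # Only needed for repositories that aren't public.
        headers["Private-Token"] = token
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_file_request("https://gitlab.com", 12949323, ".gitignore", "master")
print(req.full_url)
# → https://gitlab.com/api/v4/projects/12949323/repository/files/.gitignore?ref=master
```

Passing `req` to `urllib.request.urlopen` returns the JSON response; its `content` field still has to be base64-decoded.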


Links:

You can check how to find the repository's API ID here.

API documentation

More on tokens

There is also a Python library that uses this API.

Note that this is a GitLab-specific solution and won't work with other hosting providers.

Community
ja2142
  • That's very helpful, but it returned JSON with the file's metadata. The "Get raw file from repository" endpoint, GET /projects/:id/repository/files/:file_path/raw, returns the file's content, which is good, but it doesn't download the actual file. Is there a way to download actual folders/files through the API? –  Jul 09 '19 at 14:53
  • The JSON 'content' field is the base64-encoded file, so you have to decode it to get the original file. If you use the Python library, this will probably be done for you. I'm not aware of a way to download directories, but you can get the project tree manually, like this: `https://gitlab.com/api/v4/projects/12949323/repository/tree?path=fonts/Lato`, and then download the files you need individually. Again, this would be easier with the Python library. You can find all of this in the API documentation. – ja2142 Jul 09 '19 at 15:36
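The decoding step and the tree-walk workaround described in these comments can be sketched in Python with only the standard library. This is a sketch under assumptions: the sample JSON is made up, the helper names `decode_file_content` and `blob_paths` are hypothetical, and I assume the tree endpoint returns a list of entries with `type` (`"blob"` for files, `"tree"` for directories) and `path` keys. Nothing here touches the network.

```python
import base64
import json


def decode_file_content(response_body: str) -> bytes:
    """Return the original file bytes from a files-API JSON response."""
    data = json.loads(response_body)
    # The response reports how "content" is encoded; only base64 is handled.
    encoding = data.get("encoding", "base64")
    if encoding != "base64":
        raise ValueError(f"unexpected encoding: {encoding}")
    return base64.b64decode(data["content"])


def blob_paths(tree_body: str) -> list:
    """List file (blob) paths from a repository/tree JSON response,
    skipping sub-directories, so each file can be fetched individually."""
    entries = json.loads(tree_body)
    return [e["path"] for e in entries if e.get("type") == "blob"]


# Made-up sample responses, shaped like the API's JSON (assumption).
sample_file = json.dumps({
    "file_name": "hello.txt",
    "encoding": "base64",
    "content": base64.b64encode(b"hello\n").decode("ascii"),
})
sample_tree = json.dumps([
    {"name": "Lato-Bold.ttf", "type": "blob", "path": "fonts/Lato/Lato-Bold.ttf"},
    {"name": "extra", "type": "tree", "path": "fonts/Lato/extra"},
])

print(decode_file_content(sample_file))  # b'hello\n'
print(blob_paths(sample_tree))           # ['fonts/Lato/Lato-Bold.ttf']
```

Each path returned by `blob_paths` can then be requested on its own, which is the "download needed files individually" approach from the comment.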

To expand on the other answer, it is now possible (I don't know when this feature was added) to get the raw file directly instead of JSON containing the base64-encoded file.

From the documentation:

Endpoint: GET /projects/:id/repository/files/:file_path/raw

Example:

 curl --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/13083/repository/files/path%2Fto%2Ffile%2Efoo/raw?ref=master"

Note that in the example the full path to the file is URL-encoded: path%2Fto%2Ffile%2Efoo decodes to path/to/file.foo.
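A minimal standard-library Python sketch of the raw endpoint. It encodes the path the same way as the documented example (every `/` and `.` percent-encoded, giving `path%2Fto%2Ffile%2Efoo`) and streams the response body to a local file. The helper names `raw_file_url` and `download_raw` are hypothetical; `download_raw` is not called here, since it needs network access and, for private projects, a real token.

```python
import shutil
import urllib.parse
import urllib.request
from typing import Optional


def raw_file_url(host: str, project_id: int, file_path: str, ref: str) -> str:
    """Build GET /api/v4/projects/:id/repository/files/:file_path/raw?ref=:ref."""
    # quote(..., safe="") turns "/" into %2F but leaves "." alone; the
    # documented example also encodes "." as %2E, so mirror that here.
    encoded = urllib.parse.quote(file_path, safe="").replace(".", "%2E")
    query = urllib.parse.urlencode({"ref": ref})
    return (f"{host.rstrip('/')}/api/v4/projects/{project_id}"
            f"/repository/files/{encoded}/raw?{query}")


def download_raw(url: str, output: str, token: Optional[str] = None) -> None:
    """Stream the raw file to `output` without loading it all into memory."""
    headers = {"PRIVATE-TOKEN": token} if token else {}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp, open(output, "wb") as f:
        shutil.copyfileobj(resp, f)


print(raw_file_url("https://gitlab.example.com", 13083, "path/to/file.foo", "master"))
# → https://gitlab.example.com/api/v4/projects/13083/repository/files/path%2Fto%2Ffile%2Efoo/raw?ref=master
```

Streaming with `shutil.copyfileobj` keeps memory use flat, and only the one file's bytes cross the network.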

Christian W

Using python-gitlab:

#!/usr/bin/python3
import sys

import gitlab


def download_file(host, token, project_name, branch_name, file_path, output):
    try:
        gl = gitlab.Gitlab(host, private_token=token)
        # search= can return several similarly named projects, so keep the exact match.
        project = None
        for p in gl.projects.list(search=project_name):
            if p.name == project_name:
                project = p
                break
        if project is None:
            raise ValueError(f"project {project_name!r} not found")
        # Stream the raw file straight to disk instead of holding it in memory.
        with open(output, 'wb') as f:
            project.files.raw(file_path=file_path, ref=branch_name,
                              streamed=True, action=f.write)
    except Exception as e:
        print("Error:", e, file=sys.stderr)
        sys.exit(1)


# sys.argv[0] is the script name, so six arguments means len(sys.argv) == 7.
if len(sys.argv) != 7:
    print('Usage: ./download-gitlab-file.py host token project_name branch_name file_path output')
else:
    download_file(*sys.argv[1:7])

GitLab example

Rubén Pozo

This works for me on a local GitLab instance:

curl http://mylocalgitlab/MYGROUP/-/raw/master/PATH/TO/FILE.EXT -o FILE.EXT
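The same download from Python, as a sketch. It assumes the URL layout shown above, `<host>/<namespace path>/-/raw/<ref>/<file path>`, where the namespace path is whatever precedes `/-/raw` in your instance's project URL (`MYGROUP` in the example). `web_raw_url` and `fetch` are hypothetical names, and for private projects the request would also need authentication, which this sketch does not handle.

```python
import shutil
import urllib.parse
import urllib.request


def web_raw_url(base: str, namespace_path: str, ref: str, file_path: str) -> str:
    """Build <base>/<namespace_path>/-/raw/<ref>/<file_path>."""
    # "/" stays literal here, as in the example URL: this web route takes the
    # file path as ordinary path segments, unlike the API's %2F-encoded form.
    parts = [base.rstrip("/"), namespace_path.strip("/"), "-", "raw",
             urllib.parse.quote(ref, safe="/"),
             urllib.parse.quote(file_path.lstrip("/"), safe="/")]
    return "/".join(parts)


def fetch(url: str, output: str) -> None:
    """Rough equivalent of `curl URL -o output` for a publicly readable file."""
    with urllib.request.urlopen(url) as resp, open(output, "wb") as f:
        shutil.copyfileobj(resp, f)


print(web_raw_url("http://mylocalgitlab", "MYGROUP", "master", "PATH/TO/FILE.EXT"))
# → http://mylocalgitlab/MYGROUP/-/raw/master/PATH/TO/FILE.EXT
```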

Yariv

With versions of Git that support partial clone, you can clone a repo without downloading the file contents. I used the commands below to pull a few files from a submodule in my GitLab CI/CD pipeline without running "git submodule update", which saved time and space when building Docker images.

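git clone --depth 1 --no-checkout --filter=blob:none https://gitlab-ci-token:$GITLAB_PROJECT_TOKEN@gitlab.example.com/your/repo.git path/to/submodule
git -C path/to/submodule checkout COMMIT_HASH/BRANCH_NAME -- FILE_NAME

Because of --depth 1, only the tip commit of the cloned branch is fetched, so the commit or branch you check out must be that tip; add --branch BRANCH_NAME to the clone to start from a non-default branch.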