I am trying to use a cache in my .gitlab-ci.yml file, but the pipeline time only increases (I test by adding blank lines and re-running). I want to cache the Python packages I install with pip. Here is the stage where I install and use these packages (the other stages use Docker):

image: python:3.8-slim-buster

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip

stages:
  - lint
  - test
  - build
  - deploy

test-job:
  stage: test
  before_script:
    - apt-get update
    - apt-get install -y --no-install-recommends gcc
    - apt install -y default-libmysqlclient-dev
    - pip3 install -r requirements.txt
  script:
    - pytest tests/test.py

With each pipeline run, the pipeline time just increases. I was following the steps in the GitLab documentation - https://docs.gitlab.com/ee/ci/caching/#cache-python-dependencies - although I am not using venv, since it works without it. I am still not sure why the PIP_CACHE_DIR variable is needed if it is not used, but I followed the documentation.

What is the correct way to cache Python dependencies? I would prefer not to use venv.

Dave
  • Are you using the cache only for this job, or are there other jobs that will use the cache? – Origin Apr 05 '22 at 07:17
  • @Origin Hi, only this job will use the pip cache; the build and deploy jobs use Docker. I want to learn the best practice for caching Python dependencies without using venv. – Dave Apr 05 '22 at 07:49

2 Answers

PIP_CACHE_DIR is an environment variable that pip reads to decide where to store its cache.

The second answer to this question explains it.
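To check which directory pip is really using, a job can print it. The following is a minimal sketch, assuming pip 20.1 or newer (where the `pip cache` subcommand is available); the job name `show-pip-cache` is hypothetical and not part of the original pipeline. Pointing the variable inside `$CI_PROJECT_DIR` matters because GitLab only archives cache paths that live inside the project directory.

```yaml
variables:
  # pip reads PIP_CACHE_DIR from the environment, so no --cache-dir flag is needed
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

# Hypothetical job name, for illustration only
show-pip-cache:
  image: python:3.8-slim-buster
  stage: test
  script:
    # Should print the path configured in PIP_CACHE_DIR
    - pip cache dir
    - pip install -r requirements.txt
    # Rough size of the directory that would be archived as the cache
    - du -sh "$PIP_CACHE_DIR"
```

If the printed path is not under the project directory, the `.cache/pip` entry in `cache:paths` would have nothing to pick up.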

There may be some disagreement on this, but I think that for something like pip packages or node modules, it is quicker to download them fresh for each pipeline.

When the packages are cached by GitLab using

cache:
  paths:
    - .cache/pip

The cache that GitLab creates gets zipped and stored somewhere (where it is stored depends on the runner config). This means the cache has to be zipped and uploaded at the end of the job. Then, when another pipeline is created, the cache has to be downloaded and unpacked. If using a cache is slowing down job execution, it might make sense to just remove the cache.
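For comparison, the "download fresh every time" variant could look roughly like the sketch below. It reuses the job from the question; the two additions, `cache: []` and pip's `--no-cache-dir` flag, are my assumptions about how to switch caching off and are not taken from the question.

```yaml
image: python:3.8-slim-buster

test-job:
  stage: test
  # An empty list opts this job out of any cache defined elsewhere in the file
  cache: []
  before_script:
    - apt-get update
    - apt-get install -y --no-install-recommends gcc
    - apt install -y default-libmysqlclient-dev
    # Skip pip's local cache entirely; packages are downloaded on every run
    - pip3 install --no-cache-dir -r requirements.txt
  script:
    - pytest tests/test.py
```

Running a few pipelines with each variant and comparing the job durations is probably the simplest way to see which one is faster on a given runner.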

Benjamin
  • Hi, thank you for your answer! Before accepting it, may I ask whether the example I provided is a correct way to store the cache (even though it does not make the job faster)? I am wondering whether it will actually cache dependencies between pipelines. – Dave Apr 05 '22 at 15:55
  • Yes, your example is correct, but to verify for yourself, open the job log view in the GitLab UI. Somewhere in the first 10-20 lines there should be output that says something like `downloading cache xyz...`. There should also be output near the end of the log that says something like `.cache/pip found, adding to cache`. The message at the end of the job log will tell you if it can't find the directory you specified as a cache path. Note: those messages aren't exact, but they should be in green text (unless the path is not found) and appear at the very beginning and end of the log. – Benjamin Apr 05 '22 at 17:45

Also: the GitLab documentation describes that cache should be set on the job; it cannot be set globally for the pipeline. This may be why your configuration does not work.
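A job-level version of the configuration from the question might look like the sketch below. Keying the cache on `requirements.txt` through `cache:key:files` is my own assumption about what fits here, not something from the question. Since only `test-job` uses pip (per the comments), scoping the cache to that job also keeps the Docker jobs from downloading it.

```yaml
test-job:
  stage: test
  variables:
    # Keep pip's cache inside the project directory so GitLab can archive it
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  cache:
    key:
      files:
        # Assumption: the cache key changes only when this file changes
        - requirements.txt
    paths:
      - .cache/pip
  before_script:
    - apt-get update
    - apt-get install -y --no-install-recommends gcc
    - apt install -y default-libmysqlclient-dev
    - pip3 install -r requirements.txt
  script:
    - pytest tests/test.py
```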

dhr_p