
I am relatively new to GitLab and have figured out how to create and run a CI/CD pipeline, which works quite well. However, it runs rather slowly, because I currently recreate the required environment in several jobs using `before_script`.

So what I want is to install a bunch of packages once and then re-use them in different jobs. I know that one would normally create a Docker image for this and re-use it to run the CI. But here, I am interested in the possibility of re-using the state of one job in subsequent jobs.

Here's a minimal example that explains my problem and what I want to achieve:

```yaml
stages:
  - prepare
  - stage1
  - stage2

image: python

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip

.prepare_step1:
  before_script:
    - pip install requests

prepare-env:
  image: python:3.7
  stage: prepare
  tags:
    - docker
  extends:
    - .prepare_step1
  script:
    - pip list

run-first-job:
  stage: stage1
  tags:
    - docker
  script:
    # I want to re-use the complete last state of the "prepare-env" job here,
    # i.e. use the installed requests package
    - pip list

run-second-job:
  stage: stage2
  tags:
    - docker
  script:
    # I want to re-use the complete last state of the "prepare-env" job here,
    # i.e. use the installed requests package
    - pip list
```

I know about artifacts and caching, but I am not sure whether they are meant to transfer the entire state of the Docker container from the `prepare-env` job into the subsequent jobs `run-first-job` and `run-second-job`, i.e. to install the packages only once and then use them in the other jobs.
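
For illustration, a cache-based attempt could look like the following sketch (the `.venv` path and cache key are my own choices here; caches are stored per runner and may be evicted, so this does not reliably transfer the full container state):

```yaml
# Sketch: share installed packages between jobs via a cached virtualenv
cache:
  key: venv-$CI_COMMIT_REF_SLUG
  paths:
    - .venv/

prepare-env:
  stage: prepare
  script:
    - python -m venv .venv
    - .venv/bin/pip install requests

run-first-job:
  stage: stage1
  script:
    - .venv/bin/pip list  # shows requests only if the cache was restored
```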

Any hints are welcome, thanks in advance!

Cord Kaldemeyer
  • Could you explain further what your understanding of "environment" and "state" is? Are you referring to e.g. a stateful web application? If all you're doing in the `prepare` stage is installing some packages, then I don't see why you wouldn't bake this into a custom Docker image. – slauth Aug 20 '21 at 08:20
  • Thanks for your comment. I have specified my question concerning the state. As I said, I am looking for a solution WITHOUT creating a single docker image for this even if you are right that this would be a good solution here. But I am trying to do it without extra images. – Cord Kaldemeyer Aug 20 '21 at 08:45
  • So if I understand you correctly, with "state" you are referring to the state of the Docker container of the `prepare-env` job? – slauth Aug 20 '21 at 09:16
  • Yes, I will specify my question further ;-) – Cord Kaldemeyer Aug 20 '21 at 10:09
  • The challenge here IMHO is that jobs in your pipeline can execute on different runners, making it hard to reuse the container of a previous stage. The only possibility I see is to push the image from the `prepare` stage (e.g. to GitLab's Container Registry) which can then be pulled by the subsequent stages. – slauth Aug 20 '21 at 10:26
  • This is actually an interesting idea. So basically `if EVENT: update image; else: use existing image` ? – Cord Kaldemeyer Aug 20 '21 at 10:52
  • Actually my thinking was that in your `prepare` stage, instead of running commands inside the *container* you'd create a Docker *image*. You could do so with a [docker-in-docker approach](https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-the-docker-executor-with-the-docker-image-docker-in-docker). – slauth Aug 20 '21 at 11:43
  • I'll have a look into that! Nonetheless, there might be other options. – Cord Kaldemeyer Aug 20 '21 at 13:33
  • Take a look at an answer I posted on a similar question [here](https://stackoverflow.com/questions/65763137/sudo-command-not-found-gitlab-ci/65853032#comment120263520_65853032). The gist is that when you need to run updates on an image, or install additional software, the best practice for CI is generally to create your own image based on the original image, and install your required packages there, and use your custom image in your CI Pipeline. Examples and documentation links are in the linked answer above. – Adam Marshall Aug 20 '21 at 17:05
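
For reference, a docker-in-docker build job along the lines slauth suggests could look like the sketch below (the `docker:24` image/service tags and the TLS variable follow the GitLab documentation; the job and image names are placeholders):

```yaml
# Sketch: build and push a CI image using docker-in-docker
build-image:
  stage: prepare
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_JOB_TOKEN $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:my-ci-image .
    - docker push $CI_REGISTRY_IMAGE:my-ci-image
```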

1 Answer


I actually found a solution: build a Docker image only if specific files (here `Dockerfile` and `setup.cfg`) change:

```yaml
stages:
  - prepare
  - run

image: python

variables:
  GIT_SUBMODULE_STRATEGY: recursive
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip

.build_docker_image_ci:
  before_script:
    - IMAGE_NAME=$CI_REGISTRY_IMAGE:"my-ci-image"
    - echo $IMAGE_NAME
    - git submodule update --recursive --remote
  script:
    # log in to the GitLab Container Registry ($CI_JOB_TOKEN replaces the deprecated $CI_BUILD_TOKEN)
    - docker login -u $CI_REGISTRY_USER -p $CI_JOB_TOKEN $CI_REGISTRY
    - DOCKER_BUILDKIT=1 docker build
      --build-arg PYTHON_REGISTRY_CONSTRING="$PYTHON_REGISTRY_CONSTRING"
      --pull -t $IMAGE_NAME .
    - docker push $IMAGE_NAME

build-docker-image-ci:
  image: python:3.7
  tags:
    - dockerbuilder
  stage: prepare
  extends:
    - .build_docker_image_ci
  rules:
    - changes:
        - Dockerfile
        - setup.cfg

run-job:
  stage: run
  image: $CI_REGISTRY_IMAGE:my-ci-image
  tags:
    - docker
  script:
    - pip list
```
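
The `Dockerfile` itself is not shown here; a minimal version matching this pipeline might look like the following sketch (a sketch only, since the real file is project-specific):

```dockerfile
# Sketch of a minimal CI image for this setup
FROM python:3.7

# connection string for the private Python registry, passed in via --build-arg
ARG PYTHON_REGISTRY_CONSTRING

COPY . /app
WORKDIR /app

# install the project and its dependencies once, at image build time
RUN pip install .
```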

Thanks for your help, Adam Marshall and slauth!

Cord Kaldemeyer