You're right.
Storing files that many of your projects have in common in one central, separate location absolutely makes sense. Even if it's just one file. As you said, that way, when you make changes to it, you only have to do it once.
In your case, you could extract your Dockerfiles (the Jenkins Master Server Dockerfile, the Jenkins Agents Dockerfiles and the Database Server Dockerfile) into a single Git repository - or split them between several ones, that's up to you.
If you version-control these files separately like this, their change history stays clearly and cleanly separated from everything else going on in those other repositories. That's good!
Yes, it's completely normal and reasonable to have a repository for just a single file, especially when that file holds a crucial configuration or definition that is shared across multiple projects. It is indeed considered best practice to do this instead of keeping duplicate content around.
And no, this won't cause you any further problems down the road. Quite the opposite - you can now manage everything related to those extracted files much better (version history, access control, actions like building and publishing container images, ...).
However, this might change how you deal with your Dockerfiles.
Potential changes to your workflow
If your original process involved publishing the container images to a Container Registry...
... nothing will change for you. As before, you
- build the Docker images from your Dockerfile(s) and
- publish them (privately) to the Container Registry (e.g. Docker Hub, AWS ECR or the GitHub/GitLab Container Registry)
It's just that now you do that in the CI/CD pipeline of this newly created repository (or repositories).
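For example, a minimal .gitlab-ci.yml in the new repository could look like the sketch below (the Dockerfile path jenkins-master/Dockerfile and the image name are placeholders; CI_REGISTRY_USER, CI_REGISTRY_PASSWORD, CI_REGISTRY and CI_REGISTRY_IMAGE are GitLab's predefined variables for its Container Registry):

# .gitlab-ci.yml in the newly created Dockerfile repository (sketch)
build-and-push:
  image: docker:24
  services:
    - docker:24-dind            # Docker-in-Docker, so the job can run docker commands
  script:
    # Log in to the GitLab Container Registry with the predefined CI variables
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # Build one of the Dockerfiles and publish the resulting image
    - docker build -t "$CI_REGISTRY_IMAGE/jenkins-master:latest" -f jenkins-master/Dockerfile jenkins-master
    - docker push "$CI_REGISTRY_IMAGE/jenkins-master:latest"

The other repositories then only reference the published image (e.g. in a FROM line or a docker pull) and never need the Dockerfile itself.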
If your original process was building the container images on demand...
... and you want to keep that process (instead of switching to the approach explained above), you'll have to put in a little bit of effort.
Idea: You still need those Dockerfiles in the same places as before, but now they live in a different Git repository. How do you get them there?
Option a: Git
Git has no convenient built-in way of picking out specific files from a remote repository.
But you can include a whole Git repository in another - which is absolutely sufficient, especially if you distribute the Dockerfiles across repos so that only files that are "mostly needed together" end up in the same repo. Both variants below are sketched as CI jobs at the end of this option.
Declaratively: by using Git submodules
git submodule add https://github.com/user-name/repo-name
- you only have to execute this command once; it creates a .gitmodules file in which the dependency is declared, and the submodule's files are then pulled in wherever the repository is checked out (after a clone with --recurse-submodules or a git submodule update --init)
Programmatically: by using git clone
git clone https://github.com/user-name/repo-name.git
- you'll have to execute this command every time you want to include those other files, most likely when building the project in the CI/CD pipeline
This assumes your project is public (within your network). If you use something like GitHub and the project is private, you might have to use a different URL that provides an access token, e.g. https://oauth2:<your-access-token>@your-domain.com/your-path/your-project.git
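A sketch of how both variants could look in a consuming repository's GitLab CI pipeline (the directory name shared-dockerfiles, the image name and the repository URL are placeholders; both jobs assume a Docker-capable runner):

# Variant 1: the Dockerfile repo was added as a Git submodule (here under shared-dockerfiles/)
build-with-submodule:
  variables:
    GIT_SUBMODULE_STRATEGY: recursive   # tells GitLab to fetch submodules before the job runs
  script:
    - docker build -t my-image -f shared-dockerfiles/folder-1/Dockerfile .

# Variant 2: clone the Dockerfile repo on the fly (use the token URL from above if it is private)
build-with-clone:
  script:
    - git clone https://oauth2:<your-access-token>@your-domain.com/your-path/your-project.git shared-dockerfiles
    - docker build -t my-image -f shared-dockerfiles/folder-1/Dockerfile .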
Option b: (Code Repository Platform's) CI/CD Tools
Another approach would be to use features of your code hosting platform (if you use one, like GitHub/GitLab/BitBucket/...).
Unfortunately, as of today, those platforms do not provide a built-in "shared files" functionality. You can, however, achieve the same effect with their integrated CI/CD solutions.
E.g. using job artifacts (part of CI/CD):
- in the CI/CD pipeline of that new repository, upload the Dockerfiles as artifacts (in GitHub | in GitLab)
- and then fetch them in the other repositories' CI/CD pipelines wherever you need those Dockerfiles (in GitHub | in GitLab)
GitLab:
CI/CD pipeline of that new repository, .gitlab-ci.yml file:
# This job uploads 2 Dockerfiles
upload:
  script:
    - echo "Uploading Dockerfiles as Job Artifacts..."
  artifacts:
    paths:
      - folder-1/Dockerfile
      - folder-2/Dockerfile
CI/CD pipelines of other repositories, .gitlab-ci.yml file:
# Alternative A: This job downloads the whole artifacts archive
download-a:
  script:
    - echo "Downloading all Job Artifacts..."
    - 'curl --location --output artifacts.zip --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/<project-id>/jobs/artifacts/<branch-name>/download?job=upload"'
    - unzip artifacts.zip
    - ...
# Alternative B: This job downloads a single file from the artifacts archive
download-b:
  script:
    - echo "Downloading only the folder-1/Dockerfile Job Artifact..."
    - 'curl --location --output Dockerfile --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/<project-id>/jobs/artifacts/<branch-name>/raw/folder-1/Dockerfile?job=upload"'
    - ...
Note: the full artifacts archive (Alternative A) is always delivered as a ZIP file that you have to unzip, whereas the raw endpoint (Alternative B) returns the single file directly.
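GitHub: the same idea, sketched as GitHub Actions workflows (the repository user-name/repo-name, the artifact name dockerfiles and the GH_PAT secret are placeholders; the download side goes through GitHub's REST API for artifacts, which also works across repositories):

# .github/workflows/upload.yml in the Dockerfile repository
name: upload-dockerfiles
on: push
jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/upload-artifact@v4
        with:
          name: dockerfiles
          path: |
            folder-1/Dockerfile
            folder-2/Dockerfile

# Workflow in a consuming repository (e.g. .github/workflows/use-dockerfiles.yml)
name: use-dockerfiles
on: push
jobs:
  fetch:
    runs-on: ubuntu-latest
    steps:
      - name: Download the "dockerfiles" artifact via the REST API
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}   # token with read access to the Dockerfile repository
        run: |
          # assumes the most recently uploaded artifact is listed first
          ARTIFACT_ID=$(curl -s -H "Authorization: Bearer $GH_TOKEN" \
            "https://api.github.com/repos/user-name/repo-name/actions/artifacts" \
            | jq -r '[.artifacts[] | select(.name == "dockerfiles")][0].id')
          curl -sL -H "Authorization: Bearer $GH_TOKEN" -o artifacts.zip \
            "https://api.github.com/repos/user-name/repo-name/actions/artifacts/$ARTIFACT_ID/zip"
          unzip artifacts.zip

Note: like GitLab, GitHub delivers the artifact archive as a ZIP file.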
Option c: (Exclusive) CI/CD Tools
If your projects use a CI/CD tool, like Jenkins or Travis CI, you can use that to make files or repositories globally accessible to all of your projects during build time.
For Jenkins that would be Shared Libraries.
For Travis CI such a feature does not yet exist, so you'd have to fall back to the process shown in Option b: upload the files you want to share as artifacts and download them where needed.
Option d: Server
If you do not use such a conventional code hosting platform or CI/CD tool and you also don't want to use Git for this case, you could take care of making those files accessible to your projects yourself.
That means serving the files yourself, from your own server, as static assets. Maybe you have an Apache server and your Dockerfiles lying somewhere it can serve them from (with or without authentication).
Then you could simply fetch the desired files via HTTP/curl requests to their URLs: https://your-domain.com/your-path/your-file
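Sketched as a CI job (URL, credentials and image name are placeholders; --fail makes curl abort on HTTP errors instead of silently saving an error page as your Dockerfile):

fetch-from-server:
  script:
    # Fetch the Dockerfile served as a static asset (basic auth only if the location is protected)
    - curl --fail --user "username:password" --output Dockerfile "https://your-domain.com/your-path/Dockerfile"
    - docker build -t my-image .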
Option e: Cloud / File-Sharing Service
Or, if you're using a Cloud Service Provider / a file-sharing service, you could use that to centrally store those files and download them where needed.
E.g. Amazon S3, Google Drive, Dropbox,...
A CI/CD pipeline could do the job of getting the files there.
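With Amazon S3, for example, the download side of such a pipeline job could look like this (bucket name and object path are placeholders; the job assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set as CI/CD variables):

fetch-from-s3:
  image:
    name: amazon/aws-cli
    entrypoint: [""]   # clear the image's aws entrypoint so the runner can execute the script
  script:
    - aws s3 cp s3://your-bucket/folder-1/Dockerfile ./Dockerfile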