How can we determine that two Docker images have exactly the same file system structure, and that the content of corresponding files is the same, irrespective of file timestamps?

I tried comparing image IDs, but they differ when building from the same Dockerfile with a clean local repository. I tested this by building one image, cleaning the local repository, touching one of the files to change its modification date, and then building the second image: the two image IDs do not match. I used Docker 17.06 (the latest version, I believe).

Gino Mempin
mljrg

5 Answers

If you want to compare the content of images, you can use the docker inspect <imageName> command and look at the RootFS section:

docker inspect redis

    "RootFS": {
        "Type": "layers",
        "Layers": [
            "sha256:eda7136a91b7b4ba57aee64509b42bda59e630afcb2b63482d1b3341bf6e2bbb",
            "sha256:c4c228cb4e20c84a0e268dda4ba36eea3c3b1e34c239126b6ee63de430720635",
            "sha256:e7ec07c2297f9507eeaccc02b0148dae0a3a473adec4ab8ec1cbaacde62928d9",
            "sha256:38e87cc81b6bed0c57f650d88ed8939aa71140b289a183ae158f1fa8e0de3ca8",
            "sha256:d0f537e75fa6bdad0df5f844c7854dc8f6631ff292eb53dc41e897bc453c3f11",
            "sha256:28caa9731d5da4265bad76fc67e6be12dfb2f5598c95a0c0d284a9a2443932bc"
        ]
    }

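If all layers are identical, then the images contain identical content. The converse does not hold, though: the layer digests also cover file metadata such as modification times, so the same content can still produce different digests.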
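The comparison can also be scripted. The following is a minimal Python sketch, assuming the docker CLI is on the PATH and both images exist locally; the function names are made up for illustration. Keep in mind that the test only works in one direction: equal layer lists imply equal content, but different lists do not prove the content differs.

```python
import json
import subprocess


def rootfs_layers(inspect_output):
    """Return RootFS.Layers from the JSON printed by `docker inspect`."""
    # `docker inspect` prints a JSON array with one object per argument.
    (info,) = json.loads(inspect_output)
    return info["RootFS"]["Layers"]


def docker_inspect(image):
    """Run `docker inspect --type image IMAGE` and return its stdout."""
    result = subprocess.run(
        ["docker", "inspect", "--type", "image", image],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def same_layers(image_a, image_b):
    """True if both images list exactly the same layer digests, in order."""
    return rootfs_layers(docker_inspect(image_a)) == rootfs_layers(docker_inspect(image_b))
```

For example, same_layers("redis", "my-redis") returns True only when both images list the same layer digests in the same order.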

Bukharov Sergey
  • This does not work. I just touched one file between the builds of two images with the same name, and got different sha256. – mljrg Sep 20 '17 at 14:22
  • If you touched a file, this leads to a modification, which leads to a different hash, @mljrg – Marged Sep 21 '17 at 15:50
  • To get only RootFS as one string: `docker inspect --format='{{.RootFS}}' <image>` – Mugen Aug 14 '19 at 08:14
  • Or for a nicer output, `docker inspect <image> | jq -r ".[].RootFS.Layers"`. For this you need to install [jq](https://stedolan.github.io/jq/). – Zheng Qu Dec 08 '22 at 08:02

After some research I came up with a solution which is fast and clean per my tests.

The overall solution is this:

  1. Create a container for your image via docker create ...
  2. Export its entire file system to a tar archive via docker export ...
  3. Pipe the archive's directory names, symlink names, symlink targets, file names, and file contents to a hash function (e.g., MD5)
  4. Compare the hashes of different images to verify if their contents are equal or not

And that's it.

Technically, this can be done as follows:

1) Create the file md5docker and make it executable, e.g., chmod +x md5docker:

#!/bin/sh
# Usage: ./md5docker <image>  (prints an MD5 of the image's file system)
dir=$(dirname "$0")
docker create "$1" | { read cid; docker export "$cid" | "$dir/tarcat" | md5; docker rm "$cid" > /dev/null; }

2) Create the file tarcat and make it executable, e.g., chmod +x tarcat:

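#!/usr/bin/env python3
# coding=utf-8
# tarcat: read a tar stream on stdin and write each entry's name, link target
# and regular-file content to stdout (timestamps, owners and modes are omitted).
import shutil
import sys
import tarfile

if __name__ == '__main__':
    # "r|*" reads the archive sequentially, so it works on a pipe.
    with tarfile.open(fileobj=sys.stdin.buffer, mode="r|*") as tar:
        for tarinfo in tar:
            if tarinfo.isfile():
                print(tarinfo.name, flush=True)
                # Stream the bytes instead of reading whole files into memory.
                with tar.extractfile(tarinfo) as file:
                    shutil.copyfileobj(file, sys.stdout.buffer)
            elif tarinfo.isdir():
                print(tarinfo.name, flush=True)
            elif tarinfo.issym() or tarinfo.islnk():
                print(tarinfo.name, flush=True)
                print(tarinfo.linkname, flush=True)
            else:
                # Devices, FIFOs, etc. are reported on stderr and skipped.
                print("\33[0;31mIGNORING:\33[0m ", tarinfo.name, file=sys.stderr)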

3) Now invoke ./md5docker <image>, where <image> is your image name or ID, to compute an MD5 hash of the entire file system of your image.

To check whether two images have the same contents, compare their hashes as computed in step 3.

Note that this solution only treats directory structure, regular-file contents, and links (symbolic and hard) as content; permissions, ownership, and special files such as devices are ignored. If you need more, extend the tarcat script with further elif clauses for the content you want to include (see Python's tarfile module and the TarInfo.isXXX() methods corresponding to that content).
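For example, to also cover device nodes and FIFOs, tarcat could call a helper like the hypothetical extra_fields below from its final else branch instead of ignoring the entry. A sketch using tarfile's ischr(), isblk() and isfifo() together with the devmajor/devminor attributes, keeping tarcat's "name, then details" output style:

```python
import tarfile


def extra_fields(tarinfo: tarfile.TarInfo):
    """Text to print for device nodes and FIFOs, or None for other types.

    Hypothetical helper for tarcat; the output format is an assumption.
    """
    if tarinfo.ischr() or tarinfo.isblk():
        kind = "chr" if tarinfo.ischr() else "blk"
        return f"{tarinfo.name}\n{kind} {tarinfo.devmajor}:{tarinfo.devminor}\n"
    if tarinfo.isfifo():
        return f"{tarinfo.name}\nfifo\n"
    return None
```

In tarcat, the else branch would write the returned text with sys.stdout.write() followed by sys.stdout.flush() (so it stays ordered with the binary writes), and only print the IGNORING message when the helper returns None. Permissions could be added the same way, e.g. by printing f"{tarinfo.mode:o}" after each name.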

The only limitation I see in this solution is its dependency on Python (I am using Python 3, but it should be easy to adapt to Python 2). A better solution without any dependency, and probably faster (though this one is already very fast), would be to write tarcat in a language that supports static linking, so that a standalone executable is enough (i.e., one that needs nothing but the OS). I leave this as a future exercise in C, Rust, OCaml, or Haskell; you choose.

Note: if MD5 does not suit your needs, just replace md5 in the first script with your hash utility. Also, md5 is the macOS/BSD command; on Linux the equivalent is md5sum.
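Since the hash command differs between systems, one portable option is a tiny Python filter built on hashlib. A sketch; the script name hashstdin.py is hypothetical:

```python
#!/usr/bin/env python3
"""hashstdin.py ALGORITHM: print the hex digest of stdin (hypothetical helper)."""
import hashlib
import sys


def stream_digest(stream, algorithm="md5", chunk_size=1 << 20):
    """Hash a binary stream in chunks with any hashlib algorithm name."""
    h = hashlib.new(algorithm)
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()


# Only act as a filter when an algorithm name is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(stream_digest(sys.stdin.buffer, sys.argv[1]))
```

In md5docker, md5 would then become python3 hashstdin.py sha256 (any name listed in hashlib.algorithms_available should work). This needs Python 3.8+ for the := operator.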

Hope this helps anyone reading.
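The four steps can also be folded into a single Python script, so no md5docker/tarcat pair is needed. A minimal sketch, assuming Python 3.8+ and the docker CLI on the PATH; the function names are hypothetical and SHA-256 stands in for MD5. Like tarcat, it ignores timestamps, owners and permissions, but it hashes each path separately and combines the results in sorted path order, so the digest does not depend on the order in which docker export emits entries, and names cannot run together with file contents.

```python
import hashlib
import os
import subprocess
import tarfile


def manifest(fileobj):
    """Map each tar member name to a digest of its type, link target and bytes.

    Timestamps, owners and permissions are deliberately left out.
    """
    entries = {}
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            h = hashlib.sha256()
            if member.isfile():
                h.update(b"file\0")
                f = tar.extractfile(member)
                while chunk := f.read(1 << 20):
                    h.update(chunk)
            elif member.isdir():
                h.update(b"dir\0")
            elif member.issym():
                h.update(b"symlink\0" + os.fsencode(member.linkname))
            elif member.islnk():
                h.update(b"hardlink\0" + os.fsencode(member.linkname))
            else:
                continue  # devices, FIFOs, etc. are skipped, as in tarcat
            entries[member.name] = h.hexdigest()
    return entries


def fs_digest(entries):
    """Combine per-path digests in sorted path order into one hex digest."""
    h = hashlib.sha256()
    for name in sorted(entries):
        # NUL cannot occur in a path, and each digest has a fixed length.
        h.update(os.fsencode(name) + b"\0" + entries[name].encode())
    return h.hexdigest()


def image_digest(image):
    """docker create + docker export, hashed while streaming; container removed."""
    cid = subprocess.run(["docker", "create", image], check=True,
                         capture_output=True, text=True).stdout.strip()
    try:
        with subprocess.Popen(["docker", "export", cid],
                              stdout=subprocess.PIPE) as export:
            entries = manifest(export.stdout)
        if export.returncode != 0:
            raise RuntimeError(f"docker export {cid} failed")
    finally:
        subprocess.run(["docker", "rm", cid], check=True, capture_output=True)
    return fs_digest(entries)
```

Two images then have the same content, in this sense, when image_digest("image-a") == image_digest("image-b").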

mljrg

Amazes me that docker doesn't do this sort of thing out of the box. Here's a variant on @mljrg's technique:

#!/bin/sh

docker create "$1" | {
  read cid
  docker export "$cid" | tar Oxv 2>&1 | shasum -a 256
  docker rm "$cid" > /dev/null
}

It's shorter and needs neither a Python dependency nor a second script. I'm sure there are downsides, but it seems to work for me in the few tests I've done.
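One such downside, shared with tarcat: tarcat writes each name on its own line followed by the raw file content, and the tar Oxv pipeline produces a similar mix of names and contents, so entry boundaries are not recorded. In the simplified Python model below (helper names are hypothetical), a file a containing "x\nb\n" and a file a containing "x\n" followed by a directory b produce identical streams, hence identical hashes. Such collisions are unlikely between real images, but they are possible; tagging and length-prefixing every field removes the ambiguity.

```python
def naive_stream(entries):
    """Simplified model of tarcat output: name line, then raw file bytes.

    entries: list of (name, content) pairs; content None stands for a directory.
    """
    out = b""
    for name, content in entries:
        out += name.encode() + b"\n"
        if content is not None:
            out += content
    return out


def framed_stream(entries):
    """Same entries, but type-tagged and length-prefixed, so boundaries survive."""
    out = b""
    for name, content in entries:
        kind = b"D" if content is None else b"F"
        data = b"" if content is None else content
        name_bytes = name.encode()
        out += kind + len(name_bytes).to_bytes(8, "big") + name_bytes
        out += len(data).to_bytes(8, "big") + data
    return out


one_file = [("a", b"x\nb\n")]                 # file a containing "x\nb\n"
file_and_dir = [("a", b"x\n"), ("b", None)]  # file a containing "x\n", then directory b
assert naive_stream(one_file) == naive_stream(file_and_dir)    # collision
assert framed_stream(one_file) != framed_stream(file_and_dir)  # no collision
```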

rich

There doesn't seem to be a standard way of doing this. The best way I can think of is to use Docker's multi-stage build feature. For example, here I compare the alpine and debian images; in your case, set the image names to the ones you want to compare.

I basically copy all the files from each image into a git repository and commit after each copy.

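FROM alpine AS image1

FROM debian AS image2

FROM ubuntu
RUN apt-get update && apt-get install -y git
RUN git config --global user.email "you@example.com" &&\
 git config --global user.name "Your Name"

WORKDIR /images
RUN git init

COPY --from=image1 / .
RUN git add -A . && git commit -m "image1"

# Clear image1's files (keeping .git) so files missing from image2 show up as deletions
RUN find . -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
COPY --from=image2 / .
RUN git add -A . && git commit -m "image2"

# Keep a container alive if run without a command
CMD ["tail", "-f", "/dev/null"]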

This will give you an image with a git repository that records the differences between the two images.

docker build -t compare .
docker run -it compare bash

Now run git log to see the two commits, and compare them with git diff <commit1> <commit2>.

Note: if the image build fails at the second commit, the images are identical, since git commit fails when there are no changes to commit.
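If you want a list of differences like the git approach gives, but without building an auxiliary image, one option is to map every path in each image's docker export stream to a digest of its content (for example by walking it with tarfile, as tarcat does) and compare the two mappings. A sketch of the comparison step only, assuming such path-to-digest dicts are already available; diff_manifests is a hypothetical name:

```python
def diff_manifests(old, new):
    """Compare two {path: digest} mappings.

    Returns (added, removed, changed) as sorted lists of paths, similar in
    spirit to `git diff --name-status` between two commits.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

Paths that exist only in the first image show up under removed, which an overlay of two COPY steps into the same directory would not reveal by itself.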

yamenk
  • This belongs in the comment section. – Burhanuddin Rashid Sep 20 '17 at 12:47
  • Why did you remove the container-diff solution? Does it not work? – mljrg Sep 20 '17 at 13:57
  • @mljrg It's something I haven't tried before, so I'm not sure if it works. – yamenk Sep 20 '17 at 13:58
  • Have you tried your solution? I have not, but from what I can see you will always end up with the differences between alpine and debian ... – mljrg Sep 20 '17 at 14:30
  • @mljrg Yes, I have tried it :) . In that case I end up with differences because the images are different. In your case, change alpine and debian to the images you want to compare. I tried it with the same image and no differences were detected (the second commit fails in that case because there are no differences). – yamenk Sep 20 '17 at 14:33
  • I just tested [container-diff](https://github.com/GoogleCloudPlatform/container-diff) and its `-f` flag does not detect file content changes, only directory changes. – mljrg Sep 20 '17 at 14:59
  • Your solution works. However, it writes a lot in the console, creates an auxiliary 293MB image (name ":"), does not clean used containers when it fails due to the two input images being equal, and is slow when the auxiliary image does not exist yet. – mljrg Sep 21 '17 at 15:06

If we rebuild the Dockerfile it is almost certainly going to produce a new hash.
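Why does a rebuild with only a touched file change the hash? The entries under RootFS.Layers are SHA-256 digests of the uncompressed layer tar archives, and a tar header records each file's modification time. The Python sketch below models this with a one-file archive built twice with identical content and different mtimes; it is a simplification (real layers hold many entries and more metadata), and the file name is hypothetical.

```python
import hashlib
import io
import tarfile


def layer_digest(content: bytes, mtime: int) -> str:
    """SHA-256 of a one-file tar archive, standing in for a layer digest."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("app/config.txt")  # hypothetical file name
        info.size = len(content)
        info.mtime = mtime  # the only thing that differs between the builds
        tar.addfile(info, io.BytesIO(content))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


same_content = b"hello\n"
print(layer_digest(same_content, mtime=1_600_000_000))
print(layer_digest(same_content, mtime=1_600_000_060))  # as if after `touch`
```

The two printed digests differ even though the file content is the same, which mirrors what happens to the layer digests after touch.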

To get an image with the same hash elsewhere, don't rebuild it: transfer the original image with docker save and docker load instead. See https://docs.docker.com/engine/reference/commandline/save/

We could then use Bukharov Sergey's answer (i.e. docker inspect) to inspect the layers, looking at the section with key 'RootFS'.

Eric Aya
Banoona