Harness the power of Docker multi-stage builds

2023-08-15

Making your Docker® images efficient to build and small to deploy is not the easiest task. But Docker's multi-stage build feature and the BuildKit build engine can help you achieve smaller production images and shorter build times in your CI/CD pipeline. They can also be helpful during development, since you can use build stages to run developer tools like linters and unit tests.

If you are unfamiliar with the concept of multi-stage builds, there are a lot of resources that can help you understand the basics. In short, a multi-stage build declares multiple images (stages) in the same Dockerfile and lets them depend on each other. In this article, I will focus on use cases that are not commonly described but might help you in your daily work.

The multi-stage build feature was added in Docker Engine v17.05 and BuildKit in v18.09.

Use Cases

Using caches effectively

This isn’t specifically connected to multi-stage builds, but it is an important BuildKit feature to leverage to get the most out of your Docker usage. This use case is documented in the official docs. Use this pattern to shorten the time it takes to fetch packages or compile resources in a shared CI environment.

Please note that a cache mount is by default shared between all builds on the same Docker daemon. If you have a use case where you can’t share resources, you will need to set the sharing parameter (shared, private, or locked) accordingly.

In the example below, we populate the pip cache directory when we build the image, and if a package has already been downloaded into the cache, pip will use the cached package. Since all packages are versioned, we can reuse the same pip cache for all builds on a specific CI server, making builds faster.

# syntax=docker/dockerfile:1.4
 
FROM python:3
WORKDIR /src
COPY requirements.txt ./
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
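
As a minimal sketch of the sharing parameter mentioned above, the variant below uses sharing=locked, which makes concurrent builds wait for each other instead of writing to the cache directory at the same time. Whether pip actually needs this is an assumption made for illustration; sharing=private would instead give each concurrent build its own cache.

```dockerfile
# syntax=docker/dockerfile:1.4

FROM python:3
WORKDIR /src
COPY requirements.txt ./
# sharing=locked: only one build at a time may use this cache mount;
# other builds pause until the mount is released.
# The default is sharing=shared (concurrent access allowed).
RUN --mount=type=cache,target=/root/.cache/pip,sharing=locked \
    pip install -r requirements.txt
```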

Build / Run-time separation

The most common use case for multi-stage builds is to use one image to build your software and then copy it into another run-time optimized image. Below is an example of an extremely simple Go application image.

Splitting the build from the run-time of the binary helps us to optimize the build image for speed instead of size so that we spend as little time as possible in the CI pipeline. It also allows us to mount secrets or configurations that we don’t want to include in the application image.

In the second stage, we build the run-time image with only the dependencies it needs, keeping the production image as small and secure as possible. In this case, we use the scratch image since we have a static binary that doesn’t have any dependencies.

# syntax=docker/dockerfile:1.4
 
FROM golang:1.19 as build
WORKDIR /opt/src/
COPY src/ .
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
RUN --mount=type=cache,target=/root/.cache/go-build,sharing=private \
    go build -o /my-app
 
FROM scratch
COPY --from=build /my-app /usr/local/bin/my-app
# Set the user to the UID of nobody for better security
USER 65534
ENTRYPOINT ["/usr/local/bin/my-app"]
CMD ["--help"]
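
The point about mounting secrets in the build stage could look something like the sketch below. It assumes, purely for illustration, that the Go modules are fetched over HTTPS with credentials from a .netrc file; the secret id netrc is a hypothetical name. CGO_ENABLED=0 is added here as an assumption to make sure the binary is statically linked and can run in scratch.

```dockerfile
# syntax=docker/dockerfile:1.4

FROM golang:1.19 as build
WORKDIR /opt/src/
COPY src/ .
# The secret is only available while this RUN step executes and is not
# stored in any image layer. Supply it at build time, for example:
#   docker buildx build --secret id=netrc,src=$HOME/.netrc .
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    --mount=type=cache,target=/go/pkg/mod \
    go mod download
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -o /my-app

# The run-time image never sees the secret or the build caches.
FROM scratch
COPY --from=build /my-app /usr/local/bin/my-app
USER 65534
ENTRYPOINT ["/usr/local/bin/my-app"]
```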

Build stages in parallel

Sometimes you need to build an image that contains resources fetched from an external source. Downloading these files in one stage and copying them into a second stage allows BuildKit to build the stages in parallel, cutting down on the build time. The example below builds a Jenkins controller image while downloading plugins in parallel. This pattern can be used for any use case where you have independent tasks that are needed for the final image. BuildKit will build all stages in parallel up until the layer where a dependent stage is referenced.

# syntax=docker/dockerfile:1.4
 
FROM python:3 as plugins
...
# Download plugins to /plugins
RUN jenkins-plugin-downloader --src plugins.txt --cleanup --dest /plugins
 
FROM openjdk:8-jdk as jenkins
...
# If the download above hasn't completed, BuildKit will wait here until the plugins stage is built
COPY --from=plugins /plugins/*.jpi /var/lib/jenkins/plugins/
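
Since the Jenkins example is abbreviated, here is a small self-contained sketch of the same pattern. The URL, file names, and stage names are placeholders and not taken from a real project. The download and deps stages do not reference each other, so BuildKit is free to run them at the same time.

```dockerfile
# syntax=docker/dockerfile:1.4

# Independent stage 1: fetch an external resource (placeholder URL).
FROM alpine:3.18 as download
RUN apk add --no-cache curl
RUN curl -fsSL -o /tmp/data.tar.gz https://example.com/data.tar.gz

# Independent stage 2: install Python dependencies into a separate prefix.
FROM python:3.10 as deps
COPY requirements.txt ./
RUN pip install --prefix=/install -r requirements.txt

# Final stage: each COPY --from waits only for the stage it references.
FROM python:3.10-slim
COPY --from=deps /install /usr/local
COPY --from=download /tmp/data.tar.gz /opt/data/
```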

Build dependency images

There are times when we might not have access to external resources, and at those times it is useful to have an image that already contains all the current build dependencies. There is some overhead to this pattern since we are building an intermediate image and pushing it to our local registry. But it will keep your CI pipeline going even if the external sources you depend on are unavailable for any reason. This pattern will also speed up CI builds, since it works much like using the build cache.

Given the following Dockerfile, we can build and push the build-base stage and then use it as the source in the build stage.

# syntax=docker/dockerfile:1.4
 
ARG build_base_image=build-base
  
# Install build dependencies
FROM debian:bullseye as build-base
RUN apt-get update && \
    apt-get install -y \
        build-essential
 
FROM ${build_base_image} as build
...

Let your CI pipeline build and push the build-base image to your local Docker registry regularly. Alternatively, if you keep your dependencies listed in a file, you can rebuild whenever that file is updated.

$ docker buildx build --target build-base \
    --tag registry.local/myproject/build-base:latest .
$ docker push registry.local/myproject/build-base:latest

Then you should be able to run your build pipeline using the pre-built image as the source. The build stage will now be built using the pre-built image instead of building and using the build-base stage.

$ docker buildx build --target build \
    --build-arg build_base_image=registry.local/myproject/build-base:latest .
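
To make the mechanism concrete, here is a complete but hypothetical variant of the Dockerfile above. The source file hello.c and the final stage are assumptions made for illustration. Without --build-arg, build_base_image keeps its default value and resolves to the build-base stage in the same file. With --build-arg, it resolves to the pre-built image and the build-base stage is no longer needed for the build.

```dockerfile
# syntax=docker/dockerfile:1.4

# Defaults to the name of the local stage below; override it with
# --build-arg build_base_image=<image> to use a pre-built image instead.
ARG build_base_image=build-base

FROM debian:bullseye as build-base
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*

# Resolves to either the stage above or the image passed as a build argument.
FROM ${build_base_image} as build
WORKDIR /src
# hello.c is a hypothetical source file in the build context
COPY hello.c .
RUN gcc -static -o /hello hello.c

FROM scratch
COPY --from=build /hello /hello
ENTRYPOINT ["/hello"]
```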

Run development tools

We can use multi-stage builds to run development tools on any workstation without forcing the developer to install the tools locally. In the example below we want to run some Python linters and run our unit tests with the current code as build context. This pattern can be used with almost any kind of CLI tooling you have in your pipeline.

The test-base stage will contain the tools we need to run our linters with, installed using Poetry. The test stage then runs the tools mounting the build context to the workdir.

# syntax=docker/dockerfile:1.4
 
# Base image with Poetry
FROM python:3.10 as test-base
RUN pip install poetry
WORKDIR /test
COPY poetry.lock pyproject.toml ./
RUN poetry install
 
# Test image
FROM test-base as test
# The following commands mount the build context into /test
RUN --mount=type=bind,target=/test poetry run pylint *.py
# If the tool needs to write to the mount, add `rw`. Everything written is discarded.
RUN --mount=type=bind,target=/test,rw poetry run pytest .

Putting the dependency installation in the base stage makes sure that we use the build cache when we re-run the tests after changing the code. Below we can see that the cache is used for the test-base stage and only the linters and unit tests are rerun. Every time the build context changes, these layers are invalidated and rebuilt, which makes sure that the tools run every time the code changes. The test-base stage is only invalidated if we update the poetry.lock or pyproject.toml files, so that dependencies are reinstalled according to the updated files.

# `--target test` lets Docker know which stage we want to build
$ docker buildx build --target test .
[+] Building 12.7s (19/19) FINISHED                                                             
...
 => [test-base 1/5] FROM docker.io/library/python:3.10@sha256:5bbf8c1d6f7c0946e405587c502f31623
 => CACHED [test-base 2/5] RUN pip install poetry
 => CACHED [test-base 3/5] WORKDIR /test
 => CACHED [test-base 4/5] COPY poetry.lock pyproject.toml ./
 => CACHED [test-base 5/5] RUN poetry install
 => [test 1/2] RUN --mount=type=bind,target=/test poetry run pylint *.py
 => [test 2/2] RUN --mount=type=bind,target=/test,rw poetry run pytest .
 => exporting to image
 => => exporting layers
 => => writing image sha256:e20247b9006daf456bd318959d0b89f73beeb3eff30d81691109e59d8acd5b
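
One possible variation, sketched here with the hypothetical stage names lint and unit-test, is to give each tool its own stage on top of test-base. A developer can then pick a single tool with --target instead of always running both.

```dockerfile
# syntax=docker/dockerfile:1.4

# Base image with Poetry and the project's dependencies
FROM python:3.10 as test-base
RUN pip install poetry
WORKDIR /test
COPY poetry.lock pyproject.toml ./
RUN poetry install

# Linter only: docker buildx build --target lint .
FROM test-base as lint
RUN --mount=type=bind,target=/test poetry run pylint *.py

# Unit tests only: docker buildx build --target unit-test .
FROM test-base as unit-test
RUN --mount=type=bind,target=/test,rw poetry run pytest .
```

Both stages reuse the cached test-base layers, so switching between the two targets should not reinstall the dependencies.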

Summary

Multi-stage builds can be used in many different ways, as we just saw. What I think is the best feature of multi-stage builds and the BuildKit engine is that you can focus on what makes a specific image efficient.

Instead of creating complex RUN statements just to make sure you clean up after building, you can focus on creating a slim production image. Don’t let developers spend time installing the tools they need to run unit tests; just supply them with a Dockerfile and the command to build it. Why spend more time than needed in the CI pipeline when you can build stages in parallel?

I hope that this short list of examples can get you on your way to using multi-stage builds and inspire you to find more use cases for this awesome feature.

Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries. Docker, Inc. and other parties may also have trademark rights in other terms used herein.

Author

Stefan Gangefors
