Exercise 5: Using the ‘builder’ pattern to build small images

Objective

Use the ‘builder’ pattern, otherwise known as multi-stage builds, to build even smaller docker images, with cleaner dockerfiles.

The Builder pattern

Optimising a build can be complex, and it’s hard to keep your image from getting cluttered. For example, installing a package might bring in a pile of documentation that you really don’t need. You can stay on top of that with chained RUN commands that remove the excess as it gets installed, but that eventually gets messy and hard to follow.

Docker allows multi-stage builds to get round this. The idea is that you first build a docker image that contains your software, then build a second image, copying across from the first one only the bits of the installed software that you need. You can leave behind all the junk that you don’t need, without having to jump through hoops to clean it out explicitly.

Take a look at Dockerfile.spades.builder in the tsi-cc/ResOps/scripts/docker directory. You should recognise the first sections: they’re identical to the optimised spades dockerfile from the previous exercise, with one exception. Instead of FROM debian:jessie-slim, the line now reads FROM debian:jessie-slim AS builder. This gives this particular image a name (‘builder’) which we can refer to later in the build process.

At the end of that first build, there are commands to build another image:

FROM debian:jessie-slim
COPY --from=builder /install /install
COPY --from=builder /root/.bashrc /root/.bashrc
ENV PATH /install/conda/bin:$PATH

Notice the COPY commands, which we use to copy from the image named ‘builder’ that we’ve just created! So we now have a debian:jessie-slim image with only the files we need to get spades running, and not much else.
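To see the whole pattern in one place, here is a minimal sketch of a two-stage dockerfile. It is not the real Dockerfile.spades.builder: the install step is a hypothetical placeholder that just writes a tiny script into /install, and the CMD line is only there so the result can be run. The stage name, the /install directory and the COPY --from line follow the example above.

```dockerfile
# Stage 1: named 'builder'. Everything we want to keep goes under /install.
FROM debian:jessie-slim AS builder

# Placeholder for the real install steps (hypothetical): the actual
# dockerfile installs conda and spades at this point.
RUN mkdir -p /install/bin && \
    printf '#!/bin/sh\necho "hello from /install"\n' > /install/bin/hello && \
    chmod +x /install/bin/hello

# Stage 2: start again from a clean base and copy across only /install.
FROM debian:jessie-slim
COPY --from=builder /install /install
ENV PATH=/install/bin:$PATH
CMD ["hello"]
```

Anything the first stage creates outside /install never reaches the second image, because only that one directory is copied.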

In fact, we’ve done even better than that. Above that second FROM line, you’ll see some cleanup code:

RUN conda clean --all -y && \
    rm -rf /install/conda/pkgs/*/info && \
    find /install -name '*.a' -exec rm {} \;

The first two lines are conda-specific. Don’t worry if you don’t know what they do; suffice it to say that they remove stuff that’s not strictly necessary for conda to run an application.

The third line removes all static libraries (the *.a files) from the conda installation. They’re only needed when linking new programs, not when running software that’s already installed, so here they’re just dead weight.

Why do we only remove the static libraries from the conda installation, and not from the entire image? The reason is that the second image starts from debian:jessie-slim, and copies across only the stuff from the /install directory. So whatever cleanup we do in the first image outside that directory can have no effect in the second image.

Note also that if we didn’t build the second image, but only built the first image, with those same cleanup commands in the same order that they are now, they wouldn’t do anything useful. They won’t reduce the size of the first image because of the layered image structure. The first RUN command, which installs all the software, creates one layer. The second one, which removes the junk, creates another layer without that stuff. But the first layer, with it all in, is still part of the image! We could concatenate the two RUN commands into one, which would make the first image smaller, but it would also make the whole thing less readable.
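The effect of layers on image size can be seen with a small experiment. The sketch below is purely illustrative: the stage names ‘separate’ and ‘chained’, the file name junk.a and the 50 MB of zeros are all made up for the demonstration, and the dockerfile is assumed to be saved as Dockerfile.layers.

```dockerfile
# Build each variant on its own and compare the sizes, e.g.
#   docker build --target separate -t layers:separate -f Dockerfile.layers .
#   docker build --target chained  -t layers:chained  -f Dockerfile.layers .

# Variant 1: install and cleanup in separate RUN instructions.
# The deleted file is still stored in the layer made by the first RUN.
FROM debian:jessie-slim AS separate
RUN mkdir -p /install && \
    dd if=/dev/zero of=/install/junk.a bs=1M count=50
RUN find /install -name '*.a' -exec rm {} \;

# Variant 2: the same commands chained into a single RUN instruction.
# The file is gone before the layer is committed, so it costs nothing.
FROM debian:jessie-slim AS chained
RUN mkdir -p /install && \
    dd if=/dev/zero of=/install/junk.a bs=1M count=50 && \
    find /install -name '*.a' -exec rm {} \;
```

Running docker images afterwards should show layers:separate as roughly 50 MB larger than layers:chained, even though both contain exactly the same files.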

Finally, in this example, although we’ve used debian:jessie-slim for both the image stages, we’re not obliged to do that. Our ‘builder’ image can be something fat, with all the software we need to build our application already installed. For example, it may have C and C++ compilers already in the image, to simplify the first steps. The second image can start from a much lighter base, even from a different distribution. As long as it has enough to run applications from the first image, that’s good enough.
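As a sketch of that idea, the dockerfile below uses a fat builder image that already contains a C compiler, and a much lighter final image from a different distribution. The choice of gcc:13 and alpine:3.19, and the tiny hello program, are assumptions made for illustration and have nothing to do with the spades build. The program is linked statically so that it doesn’t depend on libraries that exist only in the builder image.

```dockerfile
# Stage 1: a large image with the compiler toolchain already installed.
FROM gcc:13 AS builder
WORKDIR /src
# Write a tiny C program, then compile it statically into /install/bin.
RUN printf '#include <stdio.h>\nint main(void) { puts("hello from a slim image"); return 0; }\n' > hello.c && \
    mkdir -p /install/bin && \
    gcc -static -O2 -o /install/bin/hello hello.c

# Stage 2: a small base from a different distribution, with no compiler.
FROM alpine:3.19
COPY --from=builder /install /install
ENV PATH=/install/bin:$PATH
CMD ["hello"]
```

The final image carries only the alpine base plus the one binary, while the compiler and its dependencies stay behind in the builder stage.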

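So, go ahead and build the image by running docker build -t spades:builder -f Dockerfile.spades.builder . from the directory that contains the dockerfile (the final dot is part of the command). It will take less than two minutes, and the final image will be a measly 330 MB or so in size. The exact figure varies from build to build with the package versions installed; the listing below shows 274MB. That’s less than half the size of the previous optimised image, and is good enough: there’s little point in optimising any further.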

If you were to run docker images now on a clean system, you’d see something like this:

> docker images 
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
spades              builder             194d8647f15d        3 seconds ago       274MB
<none>              <none>              d9552808c2f0        10 seconds ago      764MB
debian              jessie-slim         a0536420b3b7        8 days ago          81.4MB

You can see the base image, debian:jessie-slim, as well as the spades:builder image you just built. The other one, with no names for the repository or tag, is the first image built by the multi-stage build. You’ll see it’s the same size as the previously optimised spades image, because of course it’s the same docker code that built it.

The last exercise will show you how to clean up unwanted images like that from your system. It’s optional: you’ll need it eventually if you’re using your own laptop, but you may not need it on production systems.

Conclusion

Multi-stage builds can greatly reduce the size of your image, and make your dockerfile more readable. They’re also very easy to use. Of course you can combine multi-stage builds with building your own base image for ultimate control of your images.

Best practices

  • install your software as cleanly as possible, i.e. into a separate directory from the system software if possible
  • see if you can use the builder pattern to reduce the amount of junk in your image; your savings could be significant

Congratulations, you’re an expert! If you got this far, remember to update your CV to reflect your Docker skills.