## Exercise 4: Optimising a docker image ##

### Objective ###

Learn how to optimise a docker image.

### Introduction ###

There are a lot of best-practice guidelines out there, but they can be summarised in just two questions:

- If I run the build a year from now, on a clean machine, will I get the same result? What might change that would give me a different result?
- What does each line in the Dockerfile do to the image, and to the cache?

As for image size, how big is too big? It really does depend on what you're doing, but anything less than 200 MB or so is excellent. Beyond 1 GB, you're probably carrying a lot of stuff you don't need, and performance may suffer for it.

Software like R, Perl or Python (conda) tends to drag in a lot of packages, depending on how it's installed, so watch out for it. Sometimes you can clean things up (e.g. **conda clean --all**), or you may need to install things manually to keep control.

Also, many providers of base OS images have slimmed-down versions that you can use if you don't want everything. E.g. there's a **debian:jessie** (129 MB) and a **debian:jessie-slim** (81 MB).

### Optimise a Dockerfile ###

Take a look at **Dockerfile.spades.no-opt** in the **tsi-cc/ResOps/scripts/docker** directory. You can build an image from this with ```docker build -t spades:no-opt -f Dockerfile.spades.no-opt .```, but **don't do that!** This takes about 15 minutes to run, and the resulting image is about 6.3 GB in size. That's a huge image, and building it wastes a lot of your time!

Following the guidelines in this tutorial, you can get it down to 2 minutes and 760 MB or less. You should spend a few minutes thinking about how to do that yourself, using what you've already learned. Give it a go on your own, then compare your results with the steps shown here.

### Are you installing stuff you don't need? ###

The Dockerfile installs the Gnome graphical desktop. Do you think you're likely to need that in a container?
Probably not, so go ahead and remove it. Also, since spades itself is being installed by **conda**, you probably don't need to install GCC: you're not compiling spades from source, so why would you need a compiler? Remove that too.

That gets the image down to 3.1 GB, half the size already. Give that a go, and see if you can build the image and check its size. Tag the image with **spades:opt**, instead of **spades:no-opt**, so you can directly compare them later.

### The centos image is rather large ###

How about starting from a smaller image, say **debian:jessie-slim**? You'll need to replace the **yum** commands with **apt-get**, since CentOS and Debian use different package managers, but that's the only change you need to make there. That gets the image down to 2.6 GB, another 500 MB saved.

### There's still more stuff you don't need in there ###

You may not know the **conda** installer (you should, but that's another story). Even so, you can see that it's being used to install a lot of things other than spades. Since we only want spades here, we don't need to install **metabat2**, **bwa**, **samtools** and **blast**. Remove them from the appropriate line of the Dockerfile.

Likewise, you can remove the lines that install **ncurses** and **openssl**, since we don't need to install them explicitly. If spades needs them, it will pull them in via its dependencies.

If you rebuild the image now you'll see it's only a bit over 900 MB in size, great progress!

### Now to make better use of the cache ###

The **RUN** commands can be concatenated into a single RUN command. Each RUN creates its own layer, and files removed by a later RUN still take up space in the earlier layer, so cleaning up in the same RUN that created the files keeps them out of the image. That alone will save about 200 MB more, so you now have an image that takes only 760 MB and less than 2 minutes to build!

Compare your Dockerfile with **Dockerfile.spades.opt** in the same directory and see how it looks.
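
As a rough sketch of what a single concatenated RUN might look like, the shell snippet below writes an example Dockerfile to a hypothetical file, **Dockerfile.spades.sketch**. It is not the course's **Dockerfile.spades.opt**: the Miniconda installer URL, the **/opt/conda** prefix, the helper packages and the **bioconda** channel are all assumptions, and an old base tag like jessie may need swapping for a newer slim tag if its package mirrors are no longer reachable.

```shell
# Write an illustrative single-RUN Dockerfile.
# The file name and everything inside the RUN are assumptions for this sketch.
cat > Dockerfile.spades.sketch <<'EOF'
FROM debian:jessie-slim

# One RUN instruction produces one layer, so files that are downloaded
# and deleted inside it are never stored in the image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends bzip2 ca-certificates wget \
 && wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh \
 && bash /tmp/miniconda.sh -b -p /opt/conda \
 && rm /tmp/miniconda.sh \
 && /opt/conda/bin/conda install -y -c conda-forge -c bioconda spades \
 && /opt/conda/bin/conda clean --all -y \
 && rm -rf /var/lib/apt/lists/*

# Make conda-installed tools such as spades.py available on the PATH.
ENV PATH=/opt/conda/bin:$PATH
EOF
```

If you want to try it, something like **docker build -t spades:sketch -f Dockerfile.spades.sketch .** should build it. Fetching the latest installer works against the reproducibility question from the introduction, so pin versions if that matters to you.
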
You can check your image works with **docker run -it --rm spades:opt spades.py --test**

### Conclusion ###

You now know several simple ways to optimise the size of a docker image:

- pick the smallest base image you can easily use
- only install what you need
- pay attention to the cache and the number of layers you're using

From the previous exercise, you also know you can optimise build times by building your own base image for all your projects.

However, there's one more technique worth knowing about, and that's the subject of the next exercise.

### Best Practices ###

- Check the size of the image you build, and think about whether it's worth spending time making it smaller.
- Check the base image you use, and the way you install software. Will this be reproducible a year from now?
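
One way to make the size check routine is a small shell helper. This is a minimal sketch, not part of the course material: the function names **size_verdict** and **check_image** are made up for illustration, the thresholds are the rough 200 MB and 1 GB figures from the introduction, and the "acceptable" label for the range in between is an assumption.

```shell
# Classify an image size given in bytes, using the rough thresholds
# from the introduction (hypothetical helper, labels are assumptions).
size_verdict() {
    bytes=$1
    mb=$(( bytes / 1000000 ))
    if [ "$mb" -lt 200 ]; then
        echo "${mb} MB: excellent"
    elif [ "$mb" -le 1000 ]; then
        echo "${mb} MB: acceptable"
    else
        echo "${mb} MB: probably too big"
    fi
}

# Look up the size of a local image and classify it.
# Returns non-zero if docker is missing or the image does not exist.
check_image() {
    size=$(docker image inspect --format '{{.Size}}' "$1" 2>/dev/null) || return 1
    size_verdict "$size"
}
```

After building, **check_image spades:opt** and **check_image spades:no-opt** would let you compare the two images, and **docker history spades:opt** shows how much each layer contributes.
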