Exercise 7: Using metadata

Objective

Learn how to pass information (metadata) from your build environment through to your running containers.

This is not often needed, but can be handy if you want to make professional-grade images with proper documentation built in. If you define an ontology for your labels and environment variables, you can even use it in automated workflows that select the correct image to run based on its properties.

The four docker environments

Four different environments interact to create a running container:

the development environment, where the developer creates the dockerfile and builds the docker image
the docker build environment, where the docker daemon executes the build on behalf of the developer
the docker image, containing the baked-in software that will run when the container starts
the docker runtime environment, which can be different from one invocation of an image to the next

You already know that you can create more than one image from the same dockerfile by giving each build a different tag (docker build -t ...), but what if you want to create two images from the same code base with different settings? For example, maybe you’re building an annotation application with an experimental feature that needs to be compiled into the binary, and you want to build a DEV and a PROD version of the same application, with and without that feature.

One way to do this is to have two dockerfiles that differ only in the value you set for a flag or an environment variable. This is inconvenient while the dockerfiles are still under development: any change you make to one must be propagated to the other, and could easily be forgotten. What you need is a way to run the build twice, pass different settings in on the command line, and have them used in the container. You can do that with the ARG statement in your dockerfile.

Using ARG to control your image builds

Take the following dockerfile, which is a reduced version of Dockerfile.metadata, from the tsi-cc/ResOps/scripts/docker directory of the tutorial repository. It defines three ARG variables. One is APP_VERSION, which we set by hand to either PRODUCTION or DEVELOPMENT. The other two are GIT_BRANCH, which we will set to the git branch we are working on, and GIT_COMMIT_ID, which we will set to the git commit hash. Normally it’s enough to use git tags to identify a version, but this gives us more precision: if someone moves a tag, the commit hash will still be correct.

> cat Dockerfile.metadata
FROM alpine:3.5

ARG APP_VERSION=undefined
ARG GIT_BRANCH
ARG GIT_COMMIT_ID

LABEL uk.ac.ebi.BRANCH=$GIT_BRANCH \
      uk.ac.ebi.COMMIT_ID=$GIT_COMMIT_ID \
      uk.ac.ebi.APP_VERSION=$APP_VERSION

ENV VERSION=$APP_VERSION \
    BRANCH=$GIT_BRANCH \
    COMMIT_ID=$GIT_COMMIT_ID

A few things to note about this dockerfile:

ARG lines cannot be chained in the same way that LABEL and ENV lines can. Why? I don’t know…
The recommended naming convention for LABELs like this is to use ‘reverse DNS’ notation. Java programmers will be familiar with this; it’s just a scheme for making sure that labels are in separate namespaces. For example, if the base image, alpine:3.5, had a VERSION label set in it, the prefix makes sure yours doesn’t collide with theirs.
- Why isn’t this recommended for ENV variables too? For one thing, the period (‘.’) isn’t valid in environment variable names. For another, people probably wouldn’t use such complicated names in their applications.
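
The point about periods can be checked in a shell. The sketch below uses a hypothetical helper, is_shell_var_name (it is not part of docker or of the tutorial repository), which accepts only names made of letters, digits and underscores that do not start with a digit. That is the form a POSIX shell can assign and expand, so it is a reasonable stand-in for "usable as an ENV name".

```shell
# Hypothetical helper, for illustration only.
# Succeeds if $1 is a name a POSIX shell can use as a variable:
# letters, digits and underscores, not starting with a digit.
is_shell_var_name() {
  case $1 in
    '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Report which of the names used in this exercise would work as ENV names.
for name in COMMIT_ID uk.ac.ebi.COMMIT_ID; do
  if is_shell_var_name "$name"; then
    echo "$name: usable as an environment variable name"
  else
    echo "$name: fine as a LABEL key, but not as an environment variable name"
  fi
done
```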

So how do we set an ARG? With the --build-arg argument to docker build. For example:

> app_version=DEVELOPMENT

# Use some git magic to get the branch name and commit-id
> git_branch=`git branch | egrep '^\*' | awk '{ print $NF }'`
> git_commit_id=`git rev-parse HEAD`

> docker build \
   --build-arg APP_VERSION=$app_version \
   --build-arg GIT_BRANCH=$git_branch \
   --build-arg GIT_COMMIT_ID=$git_commit_id \
   -t metadata:$app_version \
   -f Dockerfile.metadata .

Note how we use the value of $app_version both in the --build-arg option and in the tag that we apply to the image.
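
If you run the build from a script, it may be worth collecting the git values in one place and coping with the case where the script is run outside a git repository. The following is a minimal sketch, not the tutorial's own method: the function name collect_git_metadata and the "unknown" fallback are assumptions, and git rev-parse --abbrev-ref HEAD is used here as an alternative to parsing the output of git branch.

```shell
# Hypothetical helper: set git_branch and git_commit_id for use with --build-arg.
# Falls back to "unknown" (an arbitrary placeholder) if git fails,
# for example outside a git repository.
collect_git_metadata() {
  # Prints the current branch name; on a detached HEAD it prints "HEAD".
  git_branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null) || git_branch=unknown
  # Prints the full hash of the checked-out commit.
  git_commit_id=$(git rev-parse HEAD 2>/dev/null) || git_commit_id=unknown
}

collect_git_metadata
echo "branch=$git_branch commit=$git_commit_id"
```

The two variables can then be passed to docker build exactly as in the command above.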

You can now inspect the image to see those labels:

> docker inspect metadata:DEVELOPMENT | egrep uk.ac.ebi | head -3
                "uk.ac.ebi.APP_VERSION": "DEVELOPMENT",
                "uk.ac.ebi.BRANCH": "latest",
                "uk.ac.ebi.COMMIT_ID": "97eba80a9af544bdaba72d44f5e59f72e506c8d4"

You can also run the image to see the values at runtime:

> docker run -it --rm metadata:DEVELOPMENT env | egrep 'VERSION|BRANCH|COMMIT_ID'
BRANCH=latest
COMMIT_ID=97eba80a9af544bdaba72d44f5e59f72e506c8d4
VERSION=DEVELOPMENT
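
Inside the container these are ordinary environment variables, so a start-up script can read them. The sketch below is purely illustrative: print_build_info and experimental_enabled are hypothetical names, and using VERSION as a feature switch is an assumption based on the DEV/PROD example earlier, not something Dockerfile.metadata does.

```shell
# Hypothetical start-up snippet for a script running inside the container.
# It reads the variables baked in by ENV; the defaults apply only if they are unset.
print_build_info() {
  echo "version=${VERSION:-undefined} branch=${BRANCH:-unknown} commit=${COMMIT_ID:-unknown}"
}

# Hypothetical feature switch: succeed only for DEVELOPMENT builds.
experimental_enabled() {
  [ "${VERSION:-undefined}" = DEVELOPMENT ]
}

print_build_info
if experimental_enabled; then
  echo "experimental features: on"
else
  echo "experimental features: off"
fi
```

Because the values live in the environment, docker run -e VERSION=PRODUCTION ... overrides the baked-in value for that one container. This is one way in which the runtime environment can differ from one invocation of an image to the next.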

To build a production version, simply define app_version=PRODUCTION and build the image again. Running docker images will then show you both images:

> app_version=PRODUCTION

> docker build \
   --build-arg APP_VERSION=$app_version \
   --build-arg GIT_BRANCH=$git_branch \
   --build-arg GIT_COMMIT_ID=$git_commit_id \
   -t metadata:$app_version \
   -f Dockerfile.metadata .

> docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
metadata            PRODUCTION          73acfce2eac1        5 seconds ago       4MB
metadata            DEVELOPMENT         70f5c17bc2d8        12 minutes ago      4MB
alpine              3.5                 f80194ae2e0c        6 months ago        4MB
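
Since the two builds differ only in the value of app_version, the command can be wrapped in a function and called in a loop. This is a sketch under assumptions: build_variant and the DRY_RUN switch are hypothetical names introduced here, and the branch and commit values in the loop are placeholders.

```shell
# Hypothetical wrapper around the docker build command used in this exercise.
# $1 = APP_VERSION, $2 = git branch, $3 = git commit id
# With DRY_RUN=1 (an assumption of this sketch) it prints the command instead of running it.
build_variant() {
  # Replace the positional parameters with the full command line.
  # $1..$3 are expanded before "set" reassigns them.
  set -- docker build \
    --build-arg "APP_VERSION=$1" \
    --build-arg "GIT_BRANCH=$2" \
    --build-arg "GIT_COMMIT_ID=$3" \
    -t "metadata:$1" \
    -f Dockerfile.metadata .
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Dry run with placeholder branch and commit values, so nothing is built here.
DRY_RUN=1
for app_version in DEVELOPMENT PRODUCTION; do
  build_variant "$app_version" my-branch 0123abc
done
```

Unset DRY_RUN and pass your real $git_branch and $git_commit_id to perform the builds.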

Conclusion

You can pass information from the development environment to the build environment with --build-arg, which allows you to build multiple variants of an image from the same dockerfile. This information can then be baked into the image for inspection, or for use when the image is run in a container.
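
For scripted inspection, grepping the JSON is not the only option. The sketch below wraps two docker features, the --format option of docker inspect and the label filter of docker images, in two hypothetical helpers (image_label and images_for_version are names invented for this example). The second is one possible starting point for a workflow that selects an image by its properties.

```shell
# Hypothetical helper: print the value of one label on an image.
# $1 = image reference, $2 = label key
image_label() {
  docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Hypothetical helper: list local images whose APP_VERSION label equals $1,
# printed as repository:tag.
images_for_version() {
  docker images \
    --filter "label=uk.ac.ebi.APP_VERSION=$1" \
    --format '{{.Repository}}:{{.Tag}}'
}

# Example calls (these need a docker daemon and the images built above):
#   image_label metadata:DEVELOPMENT uk.ac.ebi.COMMIT_ID
#   images_for_version PRODUCTION
```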

Best practices

Using ARGs is useful if you want to build multiple images from the same dockerfile, to avoid copy/paste errors.
Choose your LABEL and ENV names carefully; make sure you don’t overwrite anything in the base image that might cause confusion for users.
Automated build environments (such as GitLab) often make a number of useful bits of information available during the build. You can incorporate that information into the image as a way of documenting it automatically.
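
As an illustration of the last point, GitLab CI exposes the branch or tag name and the commit hash as the predefined variables CI_COMMIT_REF_NAME and CI_COMMIT_SHA. The sketch below picks them up when they are present; the function name metadata_from_gitlab_ci and the "unknown" fallback are assumptions made for this example, and other build systems will use different variable names.

```shell
# Hypothetical helper: take build metadata from GitLab CI's predefined variables.
# Returns non-zero, and sets nothing, when CI_COMMIT_SHA is not defined.
metadata_from_gitlab_ci() {
  [ -n "${CI_COMMIT_SHA:-}" ] || return 1
  # "unknown" is a placeholder chosen for this sketch.
  git_branch=${CI_COMMIT_REF_NAME:-unknown}
  git_commit_id=$CI_COMMIT_SHA
}

if metadata_from_gitlab_ci; then
  echo "using CI metadata: branch=$git_branch commit=$git_commit_id"
else
  echo "not running under GitLab CI; set git_branch and git_commit_id by hand"
fi
```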