## Exercise 3: Creating your own docker image ##

### Objective ###

Learn how to create a docker image that you can re-use later.

### Access to the tutorial material ###
You can create the dockerfiles in this exercise by cutting and pasting from the screen, or, if you've cloned the repository, you will find them already in the **tsi-cc/ResOps/scripts/docker/** subdirectory:

```
# Clone this documentation if you haven't already done so
> git clone https://gitlab.ebi.ac.uk/TSI/tsi-ccdoc.git
> cd tsi-ccdoc/tsi-cc/ResOps/scripts/docker/
```


### Creating an image, step by step ###
You can modify an image interactively, as we saw in the first exercise, but that's not a sensible way to build an image you want to re-use. It's much better to use a **Dockerfile** to build it for you. Take a look at **Dockerfile.01**, which should have these contents:

```
#
# Comments start with an octothorpe, as you might expect
#
# Specify the 'base image'
FROM ubuntu:latest

#
# Naming the maintainer is good practice
LABEL Author="Your Name" Email="your@email.address"

#
# The 'LABEL' directive takes arbitrary key=value pairs
LABEL Description="This is my personal flavor of Ubuntu" Vendor="Your Name" Version="1.0"

#
# Now refresh ubuntu's package lists (this doesn't upgrade any packages)
RUN apt-get update -y
```

You can have multiple **RUN** commands. Each one adds a new layer to the image, though, so have a look at the **Best practices** section at the end before adding lots of them.
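A common way to keep the number of layers down is to chain related commands into a single **RUN** with `&&`. The snippet below is a minimal sketch and is not one of the tutorial files. It writes a hypothetical **Dockerfile.01-combined** that refreshes the package lists, installs an example package (`curl`, chosen only for illustration), and deletes the downloaded lists, all in the same layer, so the lists never end up in the image. You can paste it straight into your shell:

```shell
#!/bin/sh
# Write a hypothetical Dockerfile.01-combined that chains the package
# commands into one RUN instruction, so they produce a single layer.
cat > Dockerfile.01-combined <<'EOF'
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"

# One RUN, one layer: refresh the package lists, install an example
# package (curl is only an illustration), then delete the lists again
# so they don't take up space in the final image.
RUN apt-get update -y \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
EOF
```

You build it like any other dockerfile, e.g. `docker build --tag $USER:ubuntu-combined --file Dockerfile.01-combined .` (the tag name is just an example).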

You tell docker to build an image with that dockerfile by using the **docker build** command. We'll give it a **tag** with the `--tag` option, and we tell it which dockerfile to use with the `--file` option.

We also have to give it a _context_ to build from, so we give it the current directory, `.`. Any files we add to the image are taken relative to that context. The context can also be a URL; see [https://docs.docker.com/engine/reference/commandline/build/](https://docs.docker.com/engine/reference/commandline/build/) for full details.
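Everything in the context directory is sent to the docker daemon, and docker reports its size in the 'Sending build context to Docker daemon' line when you build. It can therefore pay to keep the context small. Docker reads an optional **.dockerignore** file in the root of the context and leaves out anything matching its patterns. Here is a minimal sketch. The patterns are examples chosen only for illustration:

```shell
#!/bin/sh
# Create a .dockerignore in the build context (the current directory).
# Files matching these example patterns are not sent to the daemon,
# so they can't be added to the image by accident.
cat > .dockerignore <<'EOF'
# editor backup and swap files
*~
*.swp
EOF
```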

```
# N.B. This assumes you have the USER environment variable set in your environment,
# and that it is all lowercase, since docker repository names must be lowercase
> docker build --tag $USER:ubuntu --file Dockerfile.01 .
Sending build context to Docker daemon 72.19 kB
Step 1 : FROM ubuntu:latest
 ---> 4ca3a192ff2a
Step 2 : MAINTAINER Your Name "your@email.address"
 ---> Running in 051314cdc3ec
 ---> ea59cb99c816
Removing intermediate container 051314cdc3ec
Step 3 : LABEL Description "This is my personal flavor of Ubuntu" Vendor "Your Name" Version "1.0"
 ---> Running in 099b516c4bdf
 ---> 241e336f1ef1
Removing intermediate container 099b516c4bdf
Step 4 : RUN apt-get update -y
 ---> Running in 5ec72101d67b
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial/main Sources [1103 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial/restricted Sources [5179 B]
Get:6 http://archive.ubuntu.com/ubuntu xenial/universe Sources [9802 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:10 http://archive.ubuntu.com/ubuntu xenial-updates/main Sources [261 kB]
Get:11 http://archive.ubuntu.com/ubuntu xenial-updates/restricted Sources [1872 B]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates/universe Sources [137 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [548 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [11.7 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [459 kB]
Get:16 http://archive.ubuntu.com/ubuntu xenial-security/main Sources [60.7 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial-security/restricted Sources [1872 B]
Get:18 http://archive.ubuntu.com/ubuntu xenial-security/universe Sources [15.8 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages [225 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [11.7 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [76.9 kB]
Fetched 24.6 MB in 14s (1721 kB/s)
Reading package lists...
 ---> 312bd6b10add
Removing intermediate container 5ec72101d67b
Successfully built 312bd6b10add
```

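Your output will differ in the details. This transcript was captured with an older version of docker, and with an earlier version of **Dockerfile.01** that used the deprecated **MAINTAINER** instruction where the current file uses `LABEL Author=... Email=...`. Some of the later transcripts have the same quirk.

You can now see your new image with the **docker images** command: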

```
> docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
wildish             ubuntu              312bd6b10add        About a minute ago   167.6 MB
ubuntu              latest              4ca3a192ff2a        25 hours ago         128.2 MB
```

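Our new image is there, and it's about 40 MB bigger than the image we started from (167.6 MB vs 128.2 MB). That extra space is the package lists that **apt-get update** downloaded. It only refreshes the index of available packages and doesn't upgrade or install anything; that would be **apt-get upgrade** or **apt-get install**.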

We can now run that image and check that its package lists really are current by running the update again. There should be nothing new to fetch:

```
> docker run -t -i $USER:ubuntu /bin/bash
root@4989d23e6e8b:/# apt-get update -y
Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease
Hit:2 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
Hit:3 http://archive.ubuntu.com/ubuntu xenial-security InRelease
Reading package lists... Done
root@4989d23e6e8b:/# exit
```

As expected, there's nothing new to fetch: every source reports `Hit`.

### Inspecting an image to find out how it was built ###
A brief aside: if you want to find out how an image was built, you can use the **docker inspect** command. It gives full details as a JSON document, which is more than you'd normally want to know. We can at least use it to get back the labels we added, including **Author** and **Email**:

```
> docker inspect $USER:ubuntu | grep --after-context=6 Labels
            "Labels": {
                "Author": "Your Name",
                "Description": "This is my personal flavor of Ubuntu",
                "Email": "your@email.address",
                "Vendor": "Your Name",
                "Version": "1.0"
            }
--
            "Labels": {
                "Author": "Your Name",
                "Description": "This is my personal flavor of Ubuntu",
                "Email": "your@email.address",
                "Vendor": "Your Name",
                "Version": "1.0"
            }
```

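Why do the labels we specified appear twice? Most likely because, for images built this way, **docker inspect** reports two configuration sections, and both carry the labels:

- `ContainerConfig` describes the temporary container docker used to create the image's last layer.
- `Config` is the image's own configuration.

If you only want one copy, you can ask for a specific field with a Go template, e.g. `docker inspect --format '{{json .Config.Labels}}' $USER:ubuntu`.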

### Adding our own programs to the image ###
There's a sample Perl script, **hello-user.pl**, in your working directory. Please take the time to make sure you understand how it works before proceeding.

Let's tell docker to add that script to the image, so we can run it as an application.

We'll use **Dockerfile.02**, which has the following content:

```
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"
LABEL Description="This is my personal flavor of Ubuntu" Vendor="Your Name" Version="1.0"

RUN apt-get update -y

#
# Set an environment variable in the container
ENV MY_NAME Tony

#
# Add our perl script
ADD hello-user.pl /app/hello.pl
```

You can see that we've set an environment variable in our image (**MY_NAME**) and added our script as **/app/hello.pl**. You can have as many **ENV** and **ADD** instructions as you like. As with **RUN**, though, it's worth learning about the best practices before adding too many.
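Docker's own best-practices guide recommends **COPY** rather than **ADD** for plain local files, because **ADD** has extra behaviour: it can also fetch URLs and unpack local tar archives. The sketch below writes a hypothetical **Dockerfile.02-copy**. It does the same job as **Dockerfile.02**, but uses **COPY** and the `key=value` form of **ENV**:

```shell
#!/bin/sh
# Write a hypothetical Dockerfile.02-copy: same idea as Dockerfile.02, but
# using COPY for the local script and the key=value form of ENV.
cat > Dockerfile.02-copy <<'EOF'
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"

RUN apt-get update -y

# key=value form; several variables can go on one ENV line
ENV MY_NAME=Tony

# COPY only copies files from the build context; it never
# downloads URLs or unpacks archives the way ADD can
COPY hello-user.pl /app/hello.pl
EOF
```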

Now build the image:

```
> docker build --tag $USER:ubuntu --file Dockerfile.02 .
Sending build context to Docker daemon  95.74kB
Step 1/6 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/6 : LABEL Author="Your Name" Email="your@email.address"
 ---> Using cache
 ---> 4da140dc87fa
Step 3/6 : LABEL Description="This is my personal flavor of Ubuntu" Vendor="Your Name" Version="1.0"
 ---> Using cache
 ---> a6f0cc9d1234
Step 4/6 : RUN apt-get update -y
 ---> Using cache
 ---> 2f162cdbcc1e
Step 5/6 : ENV MY_NAME Tony
 ---> Running in b166b73c2eb0
Removing intermediate container b166b73c2eb0
 ---> 4d2ba043c256
Step 6/6 : ADD hello-user.pl /app/hello.pl
 ---> d83241a70a07
Successfully built d83241a70a07
Successfully tagged wildish:ubuntu
```

Note steps 2 through 4, where docker re-used the layers from the earlier build ('Using cache') instead of running those steps again. We didn't have to rebuild the image from scratch or fetch the package lists again.

We've re-used the tag (`$USER:ubuntu`), so it now points at this new image instead of the old one. That's not a good idea if the image is already in use in production, of course!

Now let's run the app in the image:

```
> docker run -t -i --rm $USER:ubuntu /app/hello.pl
Hello Tony
```

What happens if we update our script? Will docker be smart enough to pick up the changes? Yes, up to a point.

Let's start by copying a new version of the script into place and rebuilding the image:

```
> cp hello-user-with-args.pl hello-user.pl 
> docker build --tag $USER:ubuntu --file Dockerfile.02 .
Sending build context to Docker daemon 81.92 kB
Step 1 : FROM ubuntu:latest
 ---> 4ca3a192ff2a
Step 2 : MAINTAINER Your Name "your@email.address"
 ---> Using cache
 ---> ea59cb99c816
Step 3 : LABEL Description "This is my personal flavor of Ubuntu" Vendor "Your Name" Version "1.0"
 ---> Using cache
 ---> 241e336f1ef1
Step 4 : RUN apt-get update -y
 ---> Using cache
 ---> 312bd6b10add
Step 5 : ENV MY_NAME Tony
 ---> Using cache
 ---> 0857feeb7bb0
Step 6 : ADD hello-user.pl /app/hello.pl
 ---> ae442bdee840
Removing intermediate container 5fe5c7d58e0d
Successfully built ae442bdee840
```

Step 6 didn't use the cache, because docker noticed the script had changed. However, suppose the script itself hadn't changed but the modules or libraries it uses had. Docker wouldn't be able to pick that up on its own. Put differently, the build process can't 'see through' commands like **apt-get update -y** to know whether anything has changed since they were last run.

If you need to, you can force a complete rebuild by telling docker not to use the cache:

```
> docker build --no-cache --tag $USER:ubuntu --file Dockerfile.02 .
[...]
```
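A cheaper alternative to `--no-cache` is to make the update step depend on a build argument. This technique isn't used elsewhere in this exercise. Docker treats a changed **ARG** value as a cache miss for the **RUN** instructions that follow it. Changing the value therefore re-runs `apt-get update` and everything after it, while the earlier steps still come from the cache. Here is a minimal sketch. The file name **Dockerfile.02-cachebust** and the argument name **CACHE_BUST** are made up for this example:

```shell
#!/bin/sh
# Write a hypothetical Dockerfile.02-cachebust. CACHE_BUST is only there
# to influence the build cache; nothing in the image reads it.
cat > Dockerfile.02-cachebust <<'EOF'
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"

# RUN instructions after this ARG see it as an environment variable, so a
# new value invalidates the cache from the next RUN onwards.
ARG CACHE_BUST=none
RUN apt-get update -y

ENV MY_NAME Tony
ADD hello-user.pl /app/hello.pl
EOF
```

For example, `docker build --build-arg CACHE_BUST="$(date +%Y-%m-%d)" --tag $USER:ubuntu --file Dockerfile.02-cachebust .` would refresh the package lists at most once a day. Rebuilding again on the same day should reuse the cache.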

### Passing arguments to an application in an image ###
Can we change who it says hello to? Yes, we can! The `--env` flag of **docker run** sets environment variables in the container before the application starts:

```
> docker run -t -i --rm --env MY_NAME=Whoever $USER:ubuntu /app/hello.pl
Hello Whoever
```

The new version of the script (the one we copied from **hello-user-with-args.pl**) still uses the **MY_NAME** environment variable by default, but also lets you override it with a command-line argument. To do that, append the argument to the end of the **docker run** command, after the path to the script:

```
> docker run --rm -ti $USER:ubuntu /app/hello.pl someone
Hello someone
```
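The Perl script itself isn't reproduced here. As a rough illustration of the behaviour just described, here is a hypothetical shell equivalent. It is not the contents of **hello-user-with-args.pl**, and the real script may handle arguments differently. It uses the first argument if there is one, and otherwise falls back to **MY_NAME**:

```shell
#!/bin/sh
# Hypothetical shell version of the logic described for
# hello-user-with-args.pl; the real script is written in Perl.
hello_user() {
    # ${1:-...} expands to the first argument if it is set and non-empty,
    # otherwise to the value of MY_NAME (empty if MY_NAME is unset too).
    name="${1:-$MY_NAME}"
    printf 'Hello %s\n' "$name"
}

hello_user "$@"
```

Saved as, say, `hello.sh`, running `MY_NAME=Tony sh hello.sh` prints `Hello Tony`, and `MY_NAME=Tony sh hello.sh someone` prints `Hello someone`.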

### Running an application by default ###
Finally, let's get our application to run by default, so we don't have to remember its path every time we want to run it. **Dockerfile.03** shows how to do that:

```
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"

#
# The 'LABEL' directive takes arbitrary key=value pairs
LABEL Description="This is my personal flavor of Ubuntu" Vendor="Your Name" Version="1.0"

#
# Now refresh ubuntu's package lists (this doesn't upgrade any packages)
RUN apt-get update -y

#
# Set an environment variable in the container
ENV MY_NAME Tony
ADD hello-user.pl /app/hello.pl

#
# Specify the command to run!
CMD /app/hello.pl
```

So, build it, then run it:

```
> docker build --tag $USER:ubuntu --file Dockerfile.03 .
[...]
> docker run --rm -ti $USER:ubuntu
Hello Tony
```
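One side effect of **CMD** is that anything you put after the image name in **docker run** replaces it. For example, `docker run --rm -ti $USER:ubuntu someone` would try to run a program called `someone` instead of greeting them. If you'd rather have the arguments passed to the script, you can use **ENTRYPOINT** in its exec (JSON array) form.

Here is a minimal sketch that writes a hypothetical **Dockerfile.03-entrypoint**. It assumes the script in the build context is the argument-aware version copied into place earlier:

```shell
#!/bin/sh
# Write a hypothetical Dockerfile.03-entrypoint. With the exec form of
# ENTRYPOINT, anything after the image name in 'docker run' is appended
# to the command as arguments instead of replacing it.
cat > Dockerfile.03-entrypoint <<'EOF'
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"

RUN apt-get update -y

ENV MY_NAME Tony
ADD hello-user.pl /app/hello.pl

# Exec form: the script is started directly, without /bin/sh -c
ENTRYPOINT ["/app/hello.pl"]
EOF
```

After `docker build --tag $USER:entrypoint --file Dockerfile.03-entrypoint .` (the tag name is just an example), `docker run --rm -ti $USER:entrypoint` should still print `Hello Tony`. Running `docker run --rm -ti $USER:entrypoint someone` should print `Hello someone`.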

### Optimizing builds ###
We saw that `docker build --no-cache ...` solves the problem of docker not knowing whether something was updated, but doing _everything_ from scratch can be a bit expensive. The obvious solution is to build intermediate images, and move the more stable parts into the earlier ones. Take a look at **Dockerfile.04.base** and **Dockerfile.04.app**; they're just **Dockerfile.03** split into two parts:

```
> cat Dockerfile.04.base 
FROM ubuntu:latest
LABEL Author="Your Name" Email="your@email.address"
LABEL Description="This is my personal flavor of Ubuntu" Vendor="Your Name" Version="1.0"

RUN apt-get update -y

> cat Dockerfile.04.app 
FROM wildish:ubuntu

ENV MY_NAME Tony
ADD hello-user.pl /app/hello.pl

CMD /app/hello.pl
```

**Dockerfile.04.base** builds an updated ubuntu image, while **Dockerfile.04.app** uses _that_ image as its base. As long as we tag the base image as **$USER:ubuntu** and refer to it by the same name in the app's **FROM** statement, docker will find it. Docker doesn't see your shell's environment variables when it reads a dockerfile. We therefore can't write `$USER` in the **FROM** statement and have to hard-code the user name there instead. Change **wildish** to your own user name before building the image.

Note also that **Dockerfile.04.app** doesn't have any **LABEL** instructions of its own, so it inherits the labels from the base image.
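In docker 17.05 and later there is a way around hard-coding the user name in the **FROM** statement. An **ARG** declared before **FROM** can be used in the **FROM** line, and its value can be set at build time with `--build-arg`. Here is a sketch that writes a hypothetical **Dockerfile.04.app-arg** whose base image name is passed in. The argument name **BASE_IMAGE** is an assumption; any name works:

```shell
#!/bin/sh
# Write a hypothetical Dockerfile.04.app-arg. The ARG before FROM lets the
# base image be chosen at build time; the default is only a placeholder.
cat > Dockerfile.04.app-arg <<'EOF'
ARG BASE_IMAGE=wildish:ubuntu
FROM ${BASE_IMAGE}

ENV MY_NAME Tony
ADD hello-user.pl /app/hello.pl

CMD /app/hello.pl
EOF
```

Then `docker build --build-arg BASE_IMAGE=$USER:ubuntu --tag $USER:hello --file Dockerfile.04.app-arg .` picks up your own base image without editing the file. An **ARG** declared before **FROM** is only visible in the **FROM** line itself, not in the instructions after it.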

Now we can build our app in two stages:

```
> docker build --tag $USER:ubuntu --file Dockerfile.04.base .
Sending build context to Docker daemon 86.53 kB
Step 1 : FROM ubuntu:latest
 ---> 4ca3a192ff2a
Step 2 : MAINTAINER Your Name "your@email.address"
 ---> Using cache
 ---> 223050aea37e
Step 3 : LABEL Description "This is my personal flavor of Ubuntu" Vendor "Your Name" Version "1.0"
 ---> Using cache
 ---> c03ba3b7afd5
Step 4 : RUN apt-get update -y
 ---> Using cache
 ---> 00269c0edb02
Successfully built 00269c0edb02

> docker build --tag $USER:hello --file Dockerfile.04.app .
Sending build context to Docker daemon 86.53 kB
Step 1 : FROM wildish:ubuntu
 ---> 00269c0edb02
Step 2 : ENV MY_NAME Tony
 ---> Using cache
 ---> 0fa5ba428fe0
Step 3 : ADD hello-user.pl /app/hello.pl
 ---> Using cache
 ---> 704d3c5941c6
Step 4 : CMD /app/hello.pl
 ---> Using cache
 ---> ce37fdc3bd4e
Successfully built ce37fdc3bd4e

> docker run --rm -ti $USER:hello
Hello Tony
```

If we force a rebuild of just the app (by adding `--no-cache` to the second **docker build**), it's now very quick, because it no longer has to refresh the package lists of the base ubuntu image.

Docker now supports multi-stage builds, which formalise this multi-step approach (previously done by hand with the so-called 'builder pattern'). See [https://docs.docker.com/develop/develop-images/multistage-build/](https://docs.docker.com/develop/develop-images/multistage-build/) for more details, as well as exercise 5.
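Here is a rough sketch of how that can look, as a hypothetical **Dockerfile.05** that is not part of the tutorial material. It folds the two **Dockerfile.04** parts into one file with two named stages. The second stage builds on the first by name, so no user name has to be hard-coded, and `--target` lets you build just the base stage:

```shell
#!/bin/sh
# Write a hypothetical two-stage Dockerfile.05. Stage 'base' holds the
# stable, slow-changing part; stage 'app' adds the volatile parts.
cat > Dockerfile.05 <<'EOF'
FROM ubuntu:latest AS base
LABEL Author="Your Name" Email="your@email.address"
RUN apt-get update -y

# Later stages can refer to earlier ones by the name given with AS
FROM base AS app
ENV MY_NAME Tony
ADD hello-user.pl /app/hello.pl
CMD /app/hello.pl
EOF
```

`docker build --target base --tag $USER:ubuntu --file Dockerfile.05 .` builds only the first stage. Leaving out `--target` builds the last stage, which is the app.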

### Conclusion ###
You can now build your own images, starting from a base image, updating it, adding files, specifying environment variables, and specifying the default executable to run.

You know how to tag your images so they have a meaningful name, and you know how to specify useful metadata that you can retrieve programmatically.

### Best practices ###

- avoid building big images, start from the lightest base you can manage and only add what you really need
- move any stable, heavy parts of your build early in the Dockerfile, to maximize the benefit of the cache
- consider using intermediate builds, to further isolate stable parts from volatile parts if you need to force builds
- follow the official best-practices guide, at [https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/)