Tips and tricks with Docker

Adding $USER to docker group

If the current user can’t access the Docker engine, it is usually because the user lacks permission on the Unix socket used to communicate with the engine. The error reads: “Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.37/info: dial unix /var/run/docker.sock: connect: permission denied”. The easiest fix is to add the user to the docker group and log back in.

sudo usermod -a -G docker $USER
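
One way to verify without a full re-login, assuming the usermod command above succeeded, is to start a shell with the new group membership and query the daemon:

newgrp docker
docker info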

Mounting a volume from host

With a Docker bind mount, a volume or a file system can be made available to a container when it is started. The source must exist on the host file system: local or mounted remotely from another host. The target can be any arbitrary path; if a path that already exists in the Docker image is used, its existing contents are obscured by the mount.

docker run -it --mount type=bind,source=/mnt/nfs,target=/app/pvol nginx
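
If the container only needs to read the mounted data, the same mount can be made read-only by appending the readonly option; this is just a variation of the command above:

docker run -it --mount type=bind,source=/mnt/nfs,target=/app/pvol,readonly nginx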

Size of a Docker image

There is no absolute rule for the best size. In general, the performance of a container at both build time and runtime degrades as its size grows. As a rule of thumb, any image over 1 GB deserves a closer look.
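
To see how large the local images are and which layers contribute most to the size, docker image ls and docker history can be used (nginx below is just an example image name):

docker image ls
docker history nginx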

More importantly, it is the good old trade-off of “cohesion vs. coupling” to be concerned about, the architectural choice of “layered vs. microservice” to be debated, and the principle of separating data from computing, user data from production data, and transient data from persistent data to be abided by. Architects in the Cloud Consulting Team can provide suggestions and second opinions.

In practice, always rebuild an image from scratch instead of building on top of an existing image. Always start with an empty directory that contains only the Dockerfile. Always pick a minimal base image for the software dependencies. Use .dockerignore to exclude unwanted files and directories.
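
As an illustration only, a minimal sketch of such a Dockerfile for a hypothetical Java application; the runtime-only base image and app.jar are assumptions, not requirements:

# Runtime-only base image, much smaller than the full JDK
FROM openjdk:8-jre-slim
WORKDIR /app
# app.jar is a hypothetical build artefact copied from the build context
COPY app.jar .
CMD ["java", "-jar", "app.jar"]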

Running processes in a container as nonroot

By default, the root user is assumed inside a container when it is started, even by a non-root user. This is because the Docker image was built as root by default; it is the norm in Docker. This security exposure has long been criticized.

A container can be run as any arbitrary user, for example as the current user with --user $(id -u):$(id -g).

[centos@tsi1539957622607-k8s-master ~]$ docker run -it --user $(id -u):$(id -g) --mount type=bind,source=/mnt/nfs,target=/app/pvol nginx
I have no name!@66b482bb3e63:/$ id
uid=1000 gid=1000 groups=1000
I have no name!@66b482bb3e63:/$  exit
exit
[centos@tsi1539957622607-k8s-master ~]$ id
uid=1000(centos) gid=1000(centos) groups=1000(centos),4(adm),10(wheel),190(systemd-journal),993(docker) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[centos@tsi1539957622607-k8s-master ~]$ 

I have no name!? This is just an indication that it is a futile attempt to run a container with a user it knows nothing about. All the files and directories in the container are still owned by root:root. There is no /home/$USER. There is no sudo. su - does not work, either.

The only option is to create a non-root user and change the default to that user at build time. For example,

FROM openjdk:8-jdk
RUN useradd --create-home -s /bin/bash nonrootuser
WORKDIR /home/nonrootuser
USER nonrootuser
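
A quick way to confirm the default user, assuming a hypothetical image tag nonroot-example:

# Build the image from the Dockerfile above
docker build -t nonroot-example .
# id should now report nonrootuser instead of root
docker run --rm -it nonroot-example id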

Now, you are stuck with an arbitrary non-root user, which the host may know nothing about. Without the sudo command, you may have to add the user to the root group and depend on su - to accomplish many tasks. A lot of complications arise from there. root:root is really “the lesser of two evils”.

Containers should be immutable & ephemeral

Docker containers serve dual purposes for both Dev and Ops:

  1. Delivering a software package for a very specific purpose.
  2. Providing a controlled runtime environment for a specific function.

Many people tend to treat Docker containers as “hamster cages”. They include not only the software (a.k.a. the hamster) but also everything the hamster needs and produces (i.e. input, output and waste). This makes Docker containers not only larger than necessary but also mutable and irreplaceable, and therefore unmanageable: they cannot be upgraded, replaced or even relocated.

Containers should not be self-sufficient units. They should be stateless and ephemeral. In other words, containers should be designed to address their Ops aspect first and foremost: controlled runtime environments. A good rule to apply is that all containers should be read-only. Logs should be mapped to external storage. Temporary files should live on temporary file systems. Input should always be taken from external sources. Containers always need to be managed by additional software such as Kubernetes, and interconnected with other containers, external storage, message queues, etc.
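
As a sketch of these rules only (the host paths and tmpfs mounts are assumptions and the exact writable paths depend on the image), a read-only nginx container with external logs and temporary file systems could be started like this:

docker run -d --read-only \
    --tmpfs /tmp --tmpfs /var/run --tmpfs /var/cache/nginx \
    --mount type=bind,source=/mnt/nfs/logs,target=/var/log/nginx \
    nginx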

Overall, Docker containers do not bring simplicity unless they are used in extremely simple situations with quick and dirty solutions (picture a hamster cage, with smells and droppings). Quite the opposite: for complex applications, Docker containers expose complexities and add management overhead.

Paying attention to build context

The current working directory where docker build is issued is called the build context. Everything under the build context (i.e. every file under the current directory and all sub-directories) is sent to the Docker daemon and can easily end up in the image being built. If the build command is issued from the root directory, the build context (and potentially the final image) is as large as the entire file system. If the build command is issued from a different directory every time, the size and content of the image differ from build to build.
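
Before building, it is worth estimating how much data would be sent to the daemon; a simple, hypothetical check from the intended build context:

# Total size of the would-be build context
du -sh .
# Largest entries, candidates for .dockerignore
du -sh ./* | sort -rh | head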

To avoid such casual mistakes, always create a build script to drive docker build. Always create an empty directory and issue docker build from there.
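
A minimal build script along these lines (build.sh; the image tag and the list of copied files are hypothetical) could look like:

#!/bin/bash
# Build from a clean, dedicated directory so the context contains only what is needed
set -e
BUILD_DIR=$(mktemp -d)
cp Dockerfile app.jar "${BUILD_DIR}/"     # copy only the files the image needs
cd "${BUILD_DIR}"
docker build -t myorg/myapp:latest .
cd - && rm -rf "${BUILD_DIR}"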

For any real project, a CI/CD toolchain should be created to make sure that an image is built consistently and all changes are traceable. See https://tsi-ccdoc.readthedocs.io/en/master/Tech-tips/DevOps-toolchain-docker.html for details on how to create such a toolchain.

In particular, the YAML file to invoke docker build should look like the following:

  script:
    - git clone ${CI_GIT_URL}
    - cd ${CI_GIT_REPO_NAME}
    - docker build -f ${CI_DOCKER_FILE} -t ${CI_SOURCE_IMAGE} . | tee ${ARTIFACT_DIR}/build.log
    - docker tag ${CI_SOURCE_IMAGE} ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG}
    - docker push ${CI_REGISTRY_IMAGE}:${CI_COMMIT_REF_SLUG} | tee ${ARTIFACT_DIR}/push.log

  1. Create and update the Dockerfile in an IDE such as IntelliJ.
  2. Push the changes into GitHub or GitLab.
  3. Create and update the CI/CD script .gitlab-ci.yml. Make sure that the good practices are coded in the script:
    1. Create a fresh copy of the build repository in a new directory via git clone.
    2. Change directory to the new build repository.
    3. Issue docker build from the root of the build repository. This ensures that everything in the build repository is included in the image, and nothing else.
    4. Tag the image before pushing it to Docker Hub.
    5. Log docker build and docker push for detailed analysis later.
    6. If something in the build repository should not be included in the image, use .dockerignore to keep the image even smaller (see the sketch after this list).
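
As an example of point 6, a hypothetical .dockerignore at the root of the build repository might exclude version-control data, CI configuration and documentation:

# .dockerignore (entries are examples only)
.git
.gitlab-ci.yml
docs/
*.md
*.log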

Best practices for Dockerfile

Docker has published a document with extensive hints and tips on how to write a good Dockerfile. Here is the link: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/.

Recommendations for the packaging and containerizing of bioinformatics software

https://f1000research.com/articles/7-742