Optimizing Docker Image Sizes

No matter how much we fight it, we’ll all end up working with Docker at some point. Below are a few tips to help optimize the size of your Docker images.

A view of running containers via Portainer – the top container is discussed in this post

Sample Dockerfiles uploaded to GitHub

Two Dockerfiles have been posted to GitHub, and are referenced throughout this article. The repository includes a pre-optimization Dockerfile and an optimized Dockerfile that follows the techniques discussed below.

1. Choose a “slim” base image that includes dependencies

When choosing a base image to start with, opt for one that has been optimized for size. Many popular Docker images offer a slim or cloud version. In general, Alpine base images are a great place to start whenever a Linux image is needed.

If certain runtime environments or other dependencies are required for your project, choose an image that already includes these by default. For example, the sample project above requires the Java Development Kit (JDK), and both of the Dockerfiles start from an image that already includes the JDK: openjdk:11-jdk-slim.
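As a rough sketch of the choice described above, the two FROM lines below contrast a full JDK tag with the slim tag the sample project uses. The openjdk:11-jdk tag is an assumption added for comparison (it does not come from the sample repository), and only one FROM line should be active at a time.

```dockerfile
# Option A (assumed tag, for comparison only): the full JDK image,
# which bundles extra OS packages that most builds never use.
# FROM openjdk:11-jdk

# Option B (used by the sample project): the slim variant, which
# still ships the JDK, so no separate Java install step is needed.
FROM openjdk:11-jdk-slim
```

Pulling both tags and running docker images lists their sizes side by side, which makes the difference easy to check for yourself.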

2. Reduce the number of extra files and layers produced

When composing Docker images, instructions such as RUN, COPY, and ADD each create a new layer that adds to the size of the resulting image. Docker's official documentation suggests minimizing the number of layers (and instruction steps).

One way to reduce instructions is to use && to join multiple commands into a single statement. In addition, performing any available clean-up or removal of temporary files in the same statement keeps those files out of the layer and reduces its footprint.

For example, it is common to install all apt dependencies at the same time, and then clean up the apt cache once this is complete. In the sample project, we can see this performed on lines 4-7 in the code snippet below:

FROM openjdk:11-jdk-slim AS builder

# install maven and git
RUN apt-get update && apt-get install -y \
    maven \
    git \
 && apt-get clean && rm -rf /var/lib/apt/lists/*

# continue with build
RUN mkdir -p /workspace

# remaining instructions/steps removed for brevity
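For contrast, here is a sketch of the pattern the snippet above avoids. It is written for illustration and is not taken from the sample repository. The packages are the same, but the work is split across three RUN instructions.

```dockerfile
FROM openjdk:11-jdk-slim

# Anti-pattern: each RUN commits its own layer, so the apt package
# lists written by the first step stay in the image even though the
# last step deletes them.
RUN apt-get update
RUN apt-get install -y maven git
RUN rm -rf /var/lib/apt/lists/*
```

Files deleted in a later layer still take up space in the layer that created them, which is why the clean-up needs to share a RUN statement with the install.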

3. Perform multi-stage builds when possible

Multi-stage builds can provide a huge reduction in image size in situations where code is compiled or built from source. Conceptually, the code and dependencies are built in one image, and then the compiled binaries are copied into a separate image, which becomes the final image. This final image contains only the finished product (e.g. binary, executable, or library), without all of the dependencies and build environment.

Consider the pre-optimization Dockerfile from the sample project:

FROM openjdk:11-jdk-slim

# install maven and git
RUN apt-get update && apt-get install -y \
    maven \
    git

# clone loader repo
RUN mkdir -p /workspace
WORKDIR /workspace
# checkout specific branch for demonstration purposes
RUN git clone -b neo4j-3.5.11 https://github.com/tcallsen/osm2graph-neo4j.git
WORKDIR /workspace/osm2graph-neo4j

# build loader repo
RUN mvn clean install

# execute loader
CMD ["mvn", "exec:java", "-DosmFile=/workspace/in.osm", "-DgraphDb=/workspace/out/graph.db"]

The mvn clean install command on line 16 compiles the Java project, and downloads each dependency required for compilation (which inflates the Docker image size). However, these dependencies are not needed to actually execute the project.

With the multi-stage build shown below, the mvn clean install is executed in the first stage (labeled as builder), and the compiled output (the target directory, which contains a Java .jar file in this case) is copied into the second stage of the build on line 29:

### Build in first stage

FROM openjdk:11-jdk-slim AS builder

# install maven and git
RUN apt-get update && apt-get install -y \
    maven \
    git \
 && apt-get clean && rm -rf /var/lib/apt/lists/*

# clone repo
RUN mkdir -p /workspace
WORKDIR /workspace
RUN git clone https://github.com/tcallsen/osm2graph-neo4j.git
WORKDIR /workspace/osm2graph-neo4j

# build repo
RUN mvn clean install


### Execute in second stage - only copy compiled jar to second stage to
###   reduce docker image size

FROM openjdk:11-jdk-slim

RUN mkdir -p /workspace
WORKDIR /workspace

COPY --from=builder /workspace/osm2graph-neo4j/target /workspace/osm2graph-neo4j

# execute loader
CMD ["java", "-jar", "osm2graph-neo4j/osm2graph-neo4j-0.0.2-SNAPSHOT.jar", "/workspace/in.osm", "/workspace/out/graph.db"]

In this case, introducing a multi-stage build reduces the image size by hundreds of megabytes!
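One further variation, offered as an untested sketch rather than as part of the sample repository: the final stage could start from a JRE-only image and copy just the jar. This assumes that the openjdk:11-jre-slim tag is available and that the jar runs on its own. If it relies on other files under target, copy the whole directory as in the Dockerfile above.

```dockerfile
### Build stage: same steps as the optimized Dockerfile above
FROM openjdk:11-jdk-slim AS builder
RUN apt-get update && apt-get install -y \
    maven \
    git \
 && apt-get clean && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
RUN git clone https://github.com/tcallsen/osm2graph-neo4j.git
WORKDIR /workspace/osm2graph-neo4j
RUN mvn clean install

### Final stage: JRE only (assumed tag), since compilation is done
FROM openjdk:11-jre-slim
WORKDIR /workspace

# Copy only the jar (assumes it runs without the rest of target/)
COPY --from=builder \
    /workspace/osm2graph-neo4j/target/osm2graph-neo4j-0.0.2-SNAPSHOT.jar \
    /workspace/osm2graph-neo4j/

CMD ["java", "-jar", "osm2graph-neo4j/osm2graph-neo4j-0.0.2-SNAPSHOT.jar", "/workspace/in.osm", "/workspace/out/graph.db"]
```

The trade-off is a smaller runtime image in exchange for a tighter coupling to the exact jar file name, which has to be updated whenever the project version changes.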

4. Use tools to identify large dependencies

There are great tools available to analyze Docker images and figure out which steps/layers are contributing the most size to an image.

One tool I enjoy is dive, which is shown in the sequence below. Dive allows the user to inspect the layers (shown on the left), and dive into which parts of the filesystem are impacted (on the right):

Portainer is another great tool that provides a graphical interface for managing Docker containers and images. It is incredibly easy to set up, and gets rid of the tedious CLI commands required for housekeeping in Docker environments.

Portainer includes image layer exploration, but not with as much detail as dive.

Check out Other Resources

Many articles and posts have been written on this topic, and other optimizations may work better for your specific case. A great starting point is always the official best practices guide published by Docker.
