Imagine that we have the following PostgreSQL database dump:
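A minimal sketch of what such a dump might look like (the database name, table, and values here are illustrative):

```sql
CREATE DATABASE test;
\connect test

CREATE TABLE clients (
    id integer PRIMARY KEY,
    name varchar(100) NOT NULL
);

INSERT INTO clients (id, name) VALUES (1, 'John');
INSERT INTO clients (id, name) VALUES (2, 'Jane');
```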
It is a simple database with a clients table and two records.
If we want to start a PostgreSQL Docker container with this dump loaded, to share with our team, we can add this SQL file to the /docker-entrypoint-initdb.d/ folder inside the container, as explained in the PostgreSQL image docs on Docker Hub:
Initialization scripts
If you would like to do additional initialization in an image derived from this one, add one or more *.sql, *.sql.gz, or *.sh scripts under /docker-entrypoint-initdb.d (creating the directory if necessary). After the entrypoint calls initdb to create the default postgres user and database, it will run any *.sql files, run any executable *.sh scripts, and source any non-executable *.sh scripts found in that directory to do further initialization before starting the service.
The following Dockerfile uses postgres:11-alpine as the base image and copies the test_dump.sql file to the entrypoint folder.
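Sketched out, it is just two instructions:

```dockerfile
FROM postgres:11-alpine

COPY test_dump.sql /docker-entrypoint-initdb.d/
```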
If we build this image
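(my-postgres-db being an illustrative tag):

```sh
docker build -t my-postgres-db .
```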
and start a container with the generated image
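(publishing the default port and setting the superuser password):

```sh
docker run -d --name my-postgres-container -e POSTGRES_PASSWORD=postgres -p 5432:5432 my-postgres-db
```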
we can see that our database was created (the password is postgres).
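For example, querying it with psql from the host (test and clients come from our example dump):

```sh
psql -h localhost -U postgres -d test -c 'SELECT * FROM clients;'
```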
Awesome. Now we have a Docker image that has our database loaded. But if we check the logs of this container
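(again with our illustrative container name):

```sh
docker logs my-postgres-container
```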
we can see CREATE DATABASE and CREATE TABLE commands.
This tells us that the dump is being processed every time we create the container. If we destroy this container and create a new one, the dump will be processed again. This works fine, but if we have a big database with a big dump file, the startup of the container will be slow because processing the whole dump can take some time. We can fix this by keeping the database preloaded in the image.
Before moving on, let's destroy the container we created
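(docker rm -f also stops it if it is still running):

```sh
docker rm -f my-postgres-container
```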
Preloading the database in the image
To preload the database in the image, we need to tell our Dockerfile to run the original PostgreSQL image's entrypoint so that the dump is processed during the build. Let's use a multi-stage build to split the build into two steps: the first one executes the entrypoint with the dump file, and the second one copies the resulting data folder into the final image.
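Putting the two stages together, the full Dockerfile (assembled from the instructions explained below) looks like this:

```dockerfile
FROM postgres:11-alpine as dumper

COPY test_dump.sql /docker-entrypoint-initdb.d/

RUN ["sed", "-i", "s/exec \"$@\"/echo \"skipping...\"/", "/usr/local/bin/docker-entrypoint.sh"]

ENV POSTGRES_USER=postgres
ENV POSTGRES_PASSWORD=postgres
ENV PGDATA=/data

RUN ["/usr/local/bin/docker-entrypoint.sh", "postgres"]

FROM postgres:11-alpine

COPY --from=dumper /data $PGDATA
```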
In the first step, we have the following instructions:
FROM postgres:11-alpine as dumper: defines the base image this step will use, postgres with the 11-alpine tag in this case.
COPY test_dump.sql /docker-entrypoint-initdb.d/: copies the test_dump.sql file to the /docker-entrypoint-initdb.d/ folder.
RUN ["sed", "-i", "s/exec \"$@\"/echo \"skipping...\"/", "/usr/local/bin/docker-entrypoint.sh"]: this sed command removes the exec "$@" line from the docker-entrypoint.sh script so it will not start the PostgreSQL daemon (we don't need it in this step).
ENV POSTGRES_USER=postgres; ENV POSTGRES_PASSWORD=postgres; ENV PGDATA=/data: these set environment variables that define the user and password and tell PostgreSQL to use /data as its data folder, so we can copy it in the next step.
RUN ["/usr/local/bin/docker-entrypoint.sh", "postgres"]: executes the entrypoint itself. It processes the dump and loads the data into the /data folder. Since we removed exec "$@" with the sed command, it will not start the PostgreSQL daemon.
Besides its own FROM line, the second step contains only this instruction:
COPY --from=dumper /data $PGDATA: copies everything in the /data folder from the dumper step into $PGDATA of the current step, so our data is already in place when we start the container (without running the dump every time we create a new container).
If we build this Dockerfile
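(again with an illustrative tag):

```sh
docker build -t my-postgres-db:preloaded .
```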
We can see in the output the dump being processed, and after everything finishes, the image is built.
and we can start the container with this new image
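(no POSTGRES_PASSWORD is needed this time, since the credentials were baked in during the build):

```sh
docker run -d --name my-postgres-container -p 5432:5432 my-postgres-db:preloaded
```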
and our database is loaded
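(the same query as before returns our two records):

```sh
psql -h localhost -U postgres -d test -c 'SELECT * FROM clients;'
```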
But if we check the logs now
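(same command as before):

```sh
docker logs my-postgres-container
```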
we can see that only the normal PostgreSQL startup happens. No dump is executed, because it was already processed during the image build step.
Creating a Makefile to make the process easier
I like to create a Makefile to make the process of dumping the database and creating an image easier. This Makefile contains commands to dump the database, build an image, and tag it by date, allowing me to keep daily dumps on my registry to download.
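A sketch of such a Makefile, assuming illustrative connection settings, database name, and registry:

```makefile
IMAGE_NAME = registry.example.com/my-postgres-db
TAG := $(shell date +%Y-%m-%d)

# Dump the database, including the CREATE DATABASE statement
# (connection settings and database name are assumptions; adjust to your setup)
dump:
	pg_dump -h localhost -U postgres --create --file=test_dump.sql test

# Build the preloaded image, tag it with today's date, and push it to the registry
image: dump
	docker build -t $(IMAGE_NAME):$(TAG) .
	docker push $(IMAGE_NAME):$(TAG)
```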
And I can execute the following command to generate my image with a new dump
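With the sketch above, that would be:

```sh
make image
```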
This command is usually run as a cron job on some server so that it executes daily. With this, I have dumps from each day on my image registry.