Wrapping Nomad's Docker Driver for Here and Now Solutions

Posted on May 28, 2016

Overview

This post demonstrates how to run a docker container with nomad, using a wrapper script.

UPDATE: This post inspired some interesting discussion on the Nomad mailing list and a more generic solution to this problem, which I recommend checking out. See this script for more info.

UPDATE 2: as of the Nomad v0.5.0 release, Nomad has much better support for logging, ephemeral disk and volume mounts, etc, so this post is not as relevant as it once was.

This is still a good example of how great it is to work with flexible software (thanks Nomad!).

Why would you want to use this?

With the wrapper, we can run the container the way we need to, without being limited by Nomad’s docker driver. For example, while Nomad will have great support for volumes in the future, it has no such support right now: the driver does not expose a config parameter to tune the volumes mounted in the docker container. A wrapper is also a great way to use Consul to look up services before starting your app, or to retrieve credentials from Vault. When running legacy applications with Nomad, the wrapper script is the place to put that type of look-up logic.
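
As a rough illustration of the look-up logic mentioned above, here is a minimal sketch of a Consul query that a wrapper could run before starting the app. It assumes a Consul agent reachable at http://127.0.0.1:8500 and uses the catalog endpoint /v1/catalog/service/<name>. The names CONSUL_URL, pick_endpoint, and lookup_service are hypothetical and are not part of the wrapper shown later in this post.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Assumption: a local Consul agent listening on its default HTTP port.
CONSUL_URL = 'http://127.0.0.1:8500'


def pick_endpoint(entries):
    """Return (address, port) for the first catalog entry, or None."""
    if not entries:
        return None
    entry = entries[0]
    # ServiceAddress is empty when the service registered without its own
    # address; in that case fall back to the node's address.
    address = entry.get('ServiceAddress') or entry['Address']
    return address, entry['ServicePort']


def lookup_service(name, consul_url=CONSUL_URL):
    """Ask Consul's catalog where the named service is running."""
    response = urlopen('%s/v1/catalog/service/%s' % (consul_url, name))
    return pick_endpoint(json.loads(response.read().decode('utf-8')))
```

A wrapper could call lookup_service('postgres') near the top of the script and exit non-zero when it returns None, so that Nomad's restart policy retries the task later.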

We use a wrapper script and the raw_exec driver to run the container with the parameters we need.

Also, in the real world, we sometimes run home-grown tools that may have external dependencies, funny expectations, or parameters that should be enabled only if other parameters have been. In my experience, it’s been easier to have Nomad run these types of apps with the help of a wrapper script.

Why not call docker directly? A wrapper script makes it easier to handle the details of the docker workflow, such as stopping and removing the named container before creating a new one.

How to Use

Place run-postgres somewhere in your $PATH, such as /usr/local/bin/run-postgres. Keep in mind that the wrapper script needs to exist on every host that could run the Nomad job.
If you wish to limit the hosts where this job can run, add a constraint that matches only those agents. For example, to flag specific hosts as being in the database tier, add the following to those agents’ config.json:

    "meta": {
      "tier": "database"
    },

Copy postgres.hcl somewhere and edit it. Be sure to update the datacenters and constraints.
Then submit the job with nomad run postgres.hcl

If that was successful, you’ll see good things with nomad status postgres, and the container should show up with docker ps

Random Notes

This wrapper is written in Python, but you can use whatever language you wish for the wrapper with raw_exec.
This method uses the raw_exec driver, and so isolation is reduced.
The way docker-py works, it’s a bit cumbersome/awkward to create a completely generic wrapper script. Each time I have done this, the app has had significantly different desires, so I have generally written a wrapper for each app I wish to run, rather than one wrapper to rule them all.
This example uses a non-unique name for the docker container. You might also want to use NOMAD_TASK_NAME to pass that on through.
The PGDATA env var specifies the host path to mount into the docker container (at the same path). The script also passes it as the container’s working directory, which maps to docker’s --workdir.
If the container fails for some reason (after starting), the script will exit and the job will show up as stopped/failed in Nomad (which will then restart your job, depending on the job’s restart policy).
If you stop the job with nomad stop postgres, the script will exit, but the docker container will stay running. The script will attempt to stop/remove a running container when it starts, so that is fine for updates. Use docker stop postgres if you absolutely need to stop the container manually. The script can also include a signal handler that catches the signal from nomad and stops the container for you.
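
The signal handler mentioned in the last note could look roughly like the sketch below. It assumes the stop arrives as SIGINT (as the comments in the wrapper script suggest) and registers SIGTERM as well, as a precaution. The helper names make_stop_handler and install_stop_handler are hypothetical, and stop_container stands in for whatever call stops your container.

```python
from __future__ import print_function
import signal
import sys


def make_stop_handler(stop_container):
    """Build a signal handler that stops the container and then exits."""
    def handler(signum, frame):
        print('wrapper: signal %d received, stopping container' % signum)
        try:
            stop_container()
        except Exception as e:
            print('wrapper: stopping container failed: %s' % e)
        # exit cleanly; the container is stopped but not removed
        sys.exit(0)
    return handler


def install_stop_handler(stop_container):
    """Register the handler for SIGINT and SIGTERM and return it."""
    handler = make_stop_handler(stop_container)
    for signum in (signal.SIGINT, signal.SIGTERM):
        signal.signal(signum, handler)
    return handler
```

In the wrapper below, this would be installed before the log-streaming loop (which blocks), for example with install_stop_handler(lambda: cli.stop('postgres')).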

Wrapper Script

#!/usr/bin/env python

'''
This is roughly equivalent to..
    docker pull postgres:9.5
    docker stop postgres
    docker rm postgres
    docker create --name=postgres                     \
                  -p $NOMAD_IP_db:$NOMAD_PORT_db:5432 \
                  -v $PGDATA:$PGDATA                  \
                  --net=host postgres:9.5
    docker start postgres
    docker logs -f postgres

NOTE: the code here is (intentionally) simple in an
effort to demonstrate the method, YMMV.

'''
from __future__ import print_function
import os
import sys

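# docker client boilerplate (docker-py 1.x API; later releases renamed
# Client to APIClient)
from docker import Client
cli = Client(base_url='unix://var/run/docker.sock')
# list containers once so an unreachable docker daemon fails early
cli.containers()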

# specify the network mode, port bindings, and volume mounts.
# this is how the docker python client wants these parameters
port    = os.environ['NOMAD_PORT_db']
ip      = os.environ['NOMAD_IP_db']
workdir = os.environ['PGDATA']
host_config = cli.create_host_config(port_bindings={'5432': (ip,port)},
                                     network_mode='host',
                                     binds=[('%s:%s' % (workdir, workdir))])
# scrub env vars, could also pass in the env in its entirety
env = {'PGDATA': workdir}
service_name = 'postgres'
docker_repo  = 'postgres'
docker_tag   = '9.5'
image = '%s:%s' % (docker_repo, docker_tag)
print('wrapper: attempt to pull %s' % image)
try:
    cli.pull(repository=docker_repo, tag=docker_tag, stream=False)
except Exception as e:
    # exit non-zero with a helpful message so nomad sees the task as failed
    print('wrapper: pull failed: %s' % e)
    sys.exit(1)
print('wrapper: attempt to stop a running container/instance, if it exists')
try:
    cli.stop(service_name)
except Exception:
    print('wrapper: skip stop, running container not found')
print('wrapper: attempt to remove an existing container, if it exists')
try:
    cli.remove_container(container=service_name, force=True)
except Exception:
    print('wrapper: skip rm, existing/old container not found')
print('wrapper: attempt to create a new container..')
# ports lists container-side ports; host_config maps 5432 to the nomad port
container = cli.create_container(image=image, detach=True, name=service_name,
                                 working_dir=workdir, ports=[5432], environment=env,
                                 host_config=host_config)
print('wrapper: created %s' % container)
container_id = container.get('Id')
print('wrapper: attempt to start that container (%s)' % container_id)
cli.start(container=container_id)
print('wrapper: retrieve and print stdout/err...')
for msg in cli.logs(container=service_name, stream=True, stdout=True, stderr=True):
    print(msg, end="")

# could also include some signal handler to catch nomad stop or ctrl-c and
# stop the running container. it must be registered before the log-streaming
# loop above, because that loop blocks. that would look something like:
# define a signal handler that will gracefully stop the docker container when
# the user (or nomad/etc) sends in a SIGINT. Do not RM the container, keep logs
#import signal
#def cleanup_docker(signum, frame):
#    '''
#    stop the named container so it is not left running
#    '''
#    print('\nSIGINT received, initiating graceful shutdown')
#    try: cli.stop(service_name)
#    except Exception as e: print('stopping container failed'); print(e)
#    sys.exit(0)
#
## register that handler
#signal.signal(signal.SIGINT, cleanup_docker)
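
For the non-unique container name mentioned in the Random Notes, one option is to derive the name from Nomad's environment. The sketch below is an assumption-laden example rather than part of the script above: container_name is a hypothetical helper, and it assumes Nomad sets NOMAD_TASK_NAME and NOMAD_ALLOC_ID in the task's environment.

```python
import os
import re


def container_name(default, environ=None):
    """Derive a docker container name from nomad's environment variables."""
    environ = os.environ if environ is None else environ
    # Prefer the nomad task name; fall back to the hard-coded default.
    name = environ.get('NOMAD_TASK_NAME') or default
    # Assumption: NOMAD_ALLOC_ID is set; a short prefix keeps the name unique
    # per allocation without making it unwieldy.
    alloc_id = environ.get('NOMAD_ALLOC_ID')
    if alloc_id:
        name = '%s-%s' % (name, alloc_id[:8])
    # Replace characters that docker does not accept in container names.
    return re.sub(r'[^a-zA-Z0-9_.-]', '-', name)
```

Note the trade-off: once the name includes the allocation ID, the stop/remove step at the top of the wrapper no longer finds the container left behind by a previous allocation, so that cleanup would need to happen some other way (a signal handler, for instance).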

Example Job

job "postgres" {
    group "postgres" {
        count = 1
        constraint {
            attribute = "${meta.tier}"
            value = "database"
        }
        task "postgres" {
            driver = "raw_exec"
            config {
                command = "/usr/local/bin/run-postgres"
                args = [
                ]
            }
            env {
                "PGDATA" = "/tmp/pgdata"
            }
            resources {
                cpu = 2000
                memory = 2000
                network {
                    mbits = 100
                    port "db" {}
                }
            }
            service {
                name = "postgres"
                port = "db"
                check {
                    type = "tcp"
                    interval = "15s"
                    timeout = "5s"
                }
            }
        }
    }
    type = "service"
    datacenters = ["foobar.us-west-1"]
}