Data + Docker = Discombobulating?

Steph Locke (@SteffLocke)





  • Data
  • Docker
  • Data + Docker
  • Demo setup
  • Basic demo
  • A database
  • Solving database challenges



Data is a business’ lifeblood. For some companies, it’s their entire value proposition. The generation, access, and retention of data are paramount. This yields a few rules of thumb:

  • Never delete, never surrender
  • Change with due consideration
  • Keep data safe

Types of data

  • Config
  • Reference data
  • Telemetry
  • Transactional data


Data challenges

  • Refreshing data
  • Scaling access
  • Being safe against disaster
  • Security



What is Docker?

  • Contained code
  • Scripted infrastructure
  • Bundled dependencies
  • Lower maintenance than VMs

Why use it?

  • Scripted installs
  • Saved from “Next, Next, Next”
  • Insanely disposable
  • Control your ops
  • Rapidly developing

Why not use it?

  • Insanely disposable
  • Control your ops
  • Rapidly developing

Why not use it? Versions

How to use it (locally)

  • Install it
  • Run the Docker Quickstart Terminal
docker run -it ubuntu /bin/bash

How to use it - demo


docker-machine is needed to run containers locally, to work with remote machines hosting Docker containers, and to work with Swarm clusters.

Docker plugins

Plugins allow you to extend Docker’s capabilities. Most plugins are network- or file-system-related.
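As a sketch, managed plugins can be installed straight from Docker Hub. The `vieux/sshfs` plugin is shown purely as an example; swap in whatever driver your platform needs (e.g. an Azure File Storage driver):

```shell
# Install a volume plugin from Docker Hub (vieux/sshfs is illustrative)
docker plugin install vieux/sshfs

# Confirm the plugin is installed and enabled
docker plugin ls
```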

Follow on

Data + Docker


Whenever you kill a container, you lose its contents, so data can’t be stored in the container itself. So what’s the point?

External volumes

Docker containers can access a file share external to them.

This is a great way to persist data, especially if you use an external facility like Azure File Storage or Amazon S3, so the provider handles all the infrastructure.

Creating external volumes

# Create azure volumes
docker volume create \
       --name logs \
       -d azurefile \
       -o share=logs

Using external volumes

# any image can mount the named volume; alpine is just an example
docker run \
    -v logs:/logs \
    alpine sh -c 'echo "hello" >> /logs/mylog.txt'

Demo Setup


  1. Create a docker-machine on Azure
  2. Configure docker-machine to use external file system plugin
  3. Create mapped volumes
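Step 1 might be sketched like this; the machine name is illustrative and the subscription id is a placeholder you’d fill in:

```shell
# Provision a new Docker host on Azure
docker-machine create --driver azure \
    --azure-subscription-id "<your-subscription-id>" \
    dockerdemo

# Point the local docker client at the new machine
eval $(docker-machine env dockerdemo)
```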

Core script

Plugin script

All together

Basic Demo

Write to a file system

Multiple containers writing to same file
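The demo itself was run live; a minimal sketch of the idea, with image and file names purely illustrative, could look like:

```shell
# Two containers mount the same named volume and append to one file
docker run --rm -v logs:/logs alpine \
    sh -c 'echo "from container 1" >> /logs/shared.txt'
docker run --rm -v logs:/logs alpine \
    sh -c 'echo "from container 2" >> /logs/shared.txt'

# Read the combined file back from a third container
docker run --rm -v logs:/logs alpine cat /logs/shared.txt
```

Nothing coordinates the writers: concurrent appends from many containers can interleave or lose lines, which is one reason this approach is bad for anything beyond a demo.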

Why is this way bad?

Reading data


Starting a database

Get a Docker container up and running. This will initialise the database files in the volume.

docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6603:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb \
   mysql
Make a database
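One way to do this, reusing the `mydb` container from the example above (the `demo` database name is illustrative), is to run the mysql client inside the container:

```shell
# Create a database by executing the mysql client inside the container
docker exec -it mydb \
    mysql -uroot -pmypassword -e "CREATE DATABASE demo;"
```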

Attach to existing database

docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6603:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb \
   mysql
Attach to existing

Multiple databases running off same files

  • Can we do this multiple times with mysql?
  • What’s the problem, even if we could?

Multiple databases, same files

Database challenges

Primary challenges

  • Refreshing data
  • Scaling access
  • Being safe against disaster
  • Security

Refreshable data

Reference data can be stored in a number of ways:

  1. A core DB that gets replicated into local db
  2. A core DB and cross DB queries
  3. Take this data out of the DB and into caches

Scaling access

To scale access, you need to avoid locks:

  1. Performance tuning goes a long way
  2. Distributed databases
  3. Sharding


Being safe against disaster

Keeping your data up and available:

  1. Self healing DB clusters
  2. Backups and restore
  3. Let someone else take care of it
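Option 2 can be as simple as streaming a dump out of the container. This sketch reuses the `mydb` example, with the backup path illustrative:

```shell
# Dump all databases from the running container to a file on the host
docker exec mydb \
    sh -c 'exec mysqldump -uroot -pmypassword --all-databases' > backup.sql

# Restore by feeding the dump back into the mysql client
docker exec -i mydb \
    sh -c 'exec mysql -uroot -pmypassword' < backup.sql
```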


Security

Data needs to be secure, especially in a multi-tenant model:

  1. ACLs and row-level security
  2. Physically separated databases

Translating challenges to technical solutions

Per instance databases

  • Pro: Scale resources per customer
  • Pro: Put other aspects per customer and control roll-out
  • Pro/Con: Can’t access all the customer’s data at once
  • Con: More migration operations

File / NoSQL dbs

  • Pro: Single db
  • Pro: Could do without schema migration efforts
  • Pro / Con: Unlikely to get ACID
  • Con: ACLs


DBaaS

  • Pro: Someone else worries about infrastructure
  • Pro: Can put into practice different single db / sharded db to suit
  • Pro: Scale resources per customer
  • Pro/Con: You don’t manage the infrastructure
  • Con: Unless containers hosted near the SaaS data store, potential latency

Self-healing Docker clusters

  • Pro: All Docker solution
  • Pro: Keeps control in the hands of the dev
  • Pro/con: The data is probably on infrastructure you manage
  • Con: Quite a complex solution with few out-of-the-box options

Schema changes

  1. Use something like Flyway and migrate schema on new container creation
  2. Use something schemaless
  3. Use DBaaS and apply in one location, using feature flags etc for rollout
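Option 1 could be sketched with the official `flyway/flyway` image. The JDBC URL, credentials, and migrations directory are placeholders, and the containers are assumed to share a Docker network so `mydb` resolves:

```shell
# Apply pending SQL migrations from ./sql against the mydb database
docker run --rm \
    -v "$(pwd)/sql:/flyway/sql" \
    flyway/flyway \
    -url=jdbc:mysql://mydb:3306/demo \
    -user=root -password=mypassword \
    migrate
```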

Further reading


Maybe Docker will solve some challenges for us?

Docker has acquired Infinit, which has been building a distributed file system that Docker could utilise. Watch this space!

A contrasting opinion

Read the Joyent piece on persisting data