Dotmesh is the Best Way to Seed Your CI Pipeline.


Yesterday, we looked at how to capture the complete state of a CI pipeline run as a single unit, whether the run succeeded or failed, without holding anyone up.

Today, we will see how dotmesh can make seeding CI pipelines, even the most complex, dramatically easier and faster.

To recap, yesterday we highlighted that as the CI pipeline becomes ever more complex, engineers and CI managers ask themselves several key questions:

  1. How do I build a CI environment that closely correlates with, or is a “scalable version of”, the actual production environment?
  2. How do I set up CI tests as quickly as possible, when each setup can spend a long time initializing multiple data stores and bringing them to precisely the right state for the tests at hand?
  3. Most importantly, how do I capture the complete state of CI after a test run?

Yesterday we answered the third question, capturing the complete post-CI state, including data from multiple databases; today let’s look at the how and why of seeding data.

CI Like Production and Speed.

In general, companies answer the question of setting up CI like prod in one of several ways:

  • Stick to smaller-scale tests and pray that production isn’t too different, a.k.a. the “head in the sand” approach.
  • Build complex test initialization processes that start every necessary data store, usually as standalone processes or containers, then initialize each one and seed it with the correct data for the tests, a.k.a. the “tests are harder to build than code” approach.
  • Deploy a set of continuously running test databases, against which all tests run, and try to coordinate the tests so that two parallel runs do not clobber each other’s data, a.k.a. the “brittle infrastructure” approach.

None of these approaches is very satisfying. The first ignores the problem entirely: it solves only the simple problems in testing and defers the really hard ones to production… the last place you want them!

The second approach at least addresses the issue, but tries to solve it by burdening the engineer with setting up everything needed, until eventually they spend more time building and maintaining the tests than doing the valuable work they are paid for, i.e. writing software.

The last approach tries to reconcile the previous two by providing databases that are “ready to run”, along with a DBA function to maintain them. Unfortunately, this “shared everything” approach effectively rules out parallel tests and leaves the state of the data in the databases somewhat unknown, leading some to question the validity of the tests.

In reality, the first two issues, a reliable testing environment and a fast testing environment, are really one problem. Companies implementing CI with complex datastores end up trading accuracy for speed.

Everyone really is looking for the same thing: databases, even many of them, that can be started at a moment’s notice, dedicated to an individual test run, and in a precisely known state.

Enter dotmesh.

With dotmesh, your flow looks a little more positive and a lot more realistic:

  1. Create your databases, all of them, and bring them to a precisely known state.
  2. Capture the exact state of all of your databases into a single dot, with each database as a separate subdot.

You can get to this state any way you like: human interaction with your system, SQL commands, API calls, Selenium, take your pick. In general, we recommend a reproducible method, such as SQL commands or Selenium, but you can certainly start with human interaction.
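As a minimal sketch of what a reproducible SQL seed might look like, the snippet below applies a fixed, ordered list of statements and then fingerprints the resulting data, so two seeding runs can be compared. It uses Python’s built-in sqlite3 purely as a stand-in for MySQL; the table, the rows and the `seed`/`fingerprint` helpers are hypothetical and not part of dotmesh.

```python
import hashlib
import sqlite3

# Hypothetical seed script: an ordered list of SQL statements that always
# produces the same starting data. sqlite3 stands in for MySQL here.
SEED_SQL = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, role TEXT NOT NULL)",
    "INSERT INTO users (id, name, role) VALUES (1, 'alice', 'admin')",
    "INSERT INTO users (id, name, role) VALUES (2, 'bob', 'viewer')",
]


def seed(conn):
    """Apply the seed statements in order; the context manager commits on success."""
    with conn:
        for stmt in SEED_SQL:
            conn.execute(stmt)


def fingerprint(conn):
    """Hash every row of every table, in a stable order, to identify the state."""
    h = hashlib.sha256()
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        h.update(table.encode())
        for row in conn.execute(f"SELECT * FROM {table} ORDER BY 1"):
            h.update(repr(row).encode())
    return h.hexdigest()


# Seeding two fresh databases from scratch yields the identical state.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
seed(a)
seed(b)
print(fingerprint(a) == fingerprint(b))  # True
```

Once the data matches what your tests expect, that is the moment to capture it into the dot; every later test run then starts from exactly that state.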

Now that your seed is ready, running your tests becomes a trivial exercise:

  1. Include requirements for all of your databases in your test specification.
  2. In the pre stage of your test, dm clone your dot and check out a specific commit.
  3. Link your database systems to the target dot, each database to its respective subdot at that commit.
  4. Run your tests.

When your CI run starts, each database is already in precisely the desired state for the beginning of the tests. No long-running initialization or seeding is necessary and there are no complex setups, just a simple config file that is part of your test suite.
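To make the idea of one commit covering every database concrete, here is a toy, in-memory model of a dot whose subdots are committed and checked out together. It is a hypothetical illustration of the concept only, not dotmesh’s implementation or API: `ToyDot`, `commit` and `checkout` are invented names, and plain dictionaries stand in for the MySQL and Redis data.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToyDot:
    """Hypothetical in-memory model of a dot; NOT the dotmesh API.

    Each subdot holds one database's data, and a commit snapshots all of
    them together, so they can only be restored as a consistent set.
    """
    name: str
    subdots: dict = field(default_factory=dict)  # subdot name -> data
    commits: dict = field(default_factory=dict)  # commit id -> snapshot

    def commit(self, message):
        # Snapshot every subdot at once under a single commit id.
        snapshot = copy.deepcopy(self.subdots)
        payload = json.dumps(
            {"n": len(self.commits), "msg": message, "data": snapshot},
            sort_keys=True,
        )
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = snapshot
        return commit_id

    def checkout(self, commit_id):
        # Restore all subdots to the recorded state (KeyError if unknown).
        self.subdots = copy.deepcopy(self.commits[commit_id])


# Seed once: both "databases" committed together as one known state.
dot = ToyDot("mydot")
dot.subdots["mysql"] = {"users": [[1, "alice", "admin"]]}
dot.subdots["redis"] = {"session:alice": "active"}
seed = dot.commit("CI seed")

# A test run mutates both databases...
dot.subdots["mysql"]["users"].append([2, "bob", "viewer"])
dot.subdots["redis"].clear()

# ...and the next run checks out the seed, restoring both at once.
dot.checkout(seed)
print(dot.subdots["mysql"]["users"])  # [[1, 'alice', 'admin']]
```

The design point is that a commit covers every subdot, so a test run can never start with MySQL from one seed and Redis from another.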

Running and maintaining your tests now comes down to a single standard setup step and a single config file. In docker-compose.yml, it might look like this:

version: '3.4'
services:
  web:
    build: .

  redis:
    image: redis
    volumes:
     - ciexample.redis:/data

  mysql:
    image: mysql
    volumes:
     - ciexample.mysql:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: insecureroot

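# Each volume uses the dotmesh (dm) volume driver; a name of the form
# <dot>.<subdot> mounts that subdot, e.g. the "redis" subdot of "mydot".
volumes: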
  ciexample.redis:
    driver: dm
    name: mydot.redis
  ciexample.mysql:
    driver: dm
    name: mydot.mysql
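With more than a couple of databases, you might prefer to generate this wiring rather than write it by hand. The sketch below is a hypothetical helper, not part of dotmesh or Docker Compose, that emits a services/volumes skeleton with one dm volume per database, named `<dot>.<subdot>` as in the example. It assumes each subdot is named after its service; `/data` is where the official redis image keeps its data, and other settings such as `MYSQL_ROOT_PASSWORD` are left out.

```python
# Hypothetical helper: emit docker-compose services/volumes wiring with one
# dotmesh ("dm" driver) volume per database, named "<dot>.<subdot>".
# Assumes each subdot is named after its service.

DATABASES = {
    # service name: (image, data directory inside the container)
    "redis": ("redis", "/data"),
    "mysql": ("mysql", "/var/lib/mysql"),
}


def compose_wiring(project, dot, databases):
    lines = ["services:"]
    for service, (image, data_dir) in databases.items():
        lines += [
            f"  {service}:",
            f"    image: {image}",
            "    volumes:",
            f"      - {project}.{service}:{data_dir}",
        ]
    lines.append("")
    lines.append("volumes:")
    for service in databases:
        lines += [
            f"  {project}.{service}:",
            "    driver: dm",
            f"    name: {dot}.{service}",
        ]
    return "\n".join(lines) + "\n"


print(compose_wiring("ciexample", "mydot", DATABASES), end="")
```

You would merge the output into docker-compose.yml alongside the web service and any environment settings.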

In the future, you will be able to select not just the dot and subdot, but the specific branch and commit right there in your configuration, further simplifying the process and bringing you complete infrastructure-as-code.

Of course, the dotmesh team is also looking at even closer integration with CI providers, both SaaS and on-premises software, to bring dotmesh into your favourite CI platform!

Written by:

Avi Deitcher