Can you run your ML model in production yet?

How do you get your ML model into production?

It's easy to get mired in a mess of models, code, datasets, and metrics, not to mention the infrastructure complexity of self-service Jupyter environments, clusters, and pipelines. See the difference with Dotscience, a platform that manages the complexity for you, with easy deploys straight to Kubernetes.

How can you tell when a model in production behaves unexpectedly?

Your production model might start giving the wrong answers—will you be able to tell? You want to know as soon as it happens, so you can make changes and address the problem. With Dotscience, you can statistically monitor models' behavior on unlabelled production data, by analyzing the statistical distribution of predictions.

As your development team grows, how do you keep track of your models?

Can you guarantee compliance with current and future regulation, or address stakeholders' concerns about decisions made by a model? Can you keep track of who created which model, and how? With Dotscience, you can forensically reproduce any issue and guarantee that it is fixed, reducing the financial and reputational risks of AI. You can also easily share and collaborate on models.

Watch Dotscience in Action


Build

  • Install the Dotscience Python library with
    pip install dotscience
  • Develop and train your models in your choice of IDE, or in our hosted Jupyter
  • Track parameters and metrics with ds.param() and ds.metric() (see the sketch below)
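
For example, a tracked training cell might record its hyperparameters and results like this. This is a minimal sketch: the values are made up, and the exact ds.param()/ds.metric() signatures are assumed from the bullets above rather than taken from the Dotscience reference.

    import dotscience as ds

    # Hypothetical values; in practice these come from your own training code
    learning_rate = 0.01
    accuracy = 0.92

    ds.param("learning_rate", learning_rate)  # record a hyperparameter for this run
    ds.metric("accuracy", accuracy)           # record a result metric for this run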

Deploy

  • Save TensorFlow model files and tag them simply with ds.model()
  • Push the training run and the model to Dotscience with ds.publish()
  • Set deploy=True to automatically deploy to Kubernetes, return an endpoint, and set up monitoring (see the sketch below)
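
As a rough sketch of how those calls might fit together: only ds.model(), ds.publish() and deploy=True come from this page; the ds.start() call, the argument order of ds.model(), and the model itself are assumptions for illustration.

    import dotscience as ds
    import tensorflow as tf

    ds.start()                                   # assumed: marks the beginning of the tracked run
    # A stand-in model; in practice this is your real trained network
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    model.save("model")                          # save in TensorFlow SavedModel format
    ds.model(tf, "my-model", "model")            # tag the saved files as a model (argument order assumed)
    ds.publish("baseline model", deploy=True)    # push the run and auto-deploy to Kubernetes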

Monitor

  • Once deployed, a monitoring dashboard is automatically created for each model
  • As well as latency and error rates, Dotscience automatically monitors categories of predictions
  • Set up alerts when the prediction distribution varies significantly from expected values (see the sketch below)
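
Dotscience sets this monitoring up for you; the sketch below is only an illustration of the underlying idea, using scipy's chi-squared test to compare a window of production prediction counts against an expected distribution (the category names and numbers are made up, and this is not Dotscience's actual implementation):

    from scipy.stats import chisquare

    # Expected share of each prediction category, e.g. estimated from the validation set
    expected = {"stop": 0.50, "yield": 0.30, "speed_limit": 0.20}
    # Observed prediction counts over a window of production traffic
    observed = {"stop": 210, "yield": 240, "speed_limit": 50}

    total = sum(observed.values())
    stat, p_value = chisquare(
        f_obs=[observed[c] for c in expected],
        f_exp=[expected[c] * total for c in expected],
    )
    if p_value < 0.01:
        print("Prediction distribution has drifted from expectations - raise an alert")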

Deploy a model to Kubernetes and monitor its predictions in 60 seconds

Try Dotscience Deploy by training a model in TensorFlow Keras in the cloud, deploying it to our Kubernetes cluster, and monitoring the predictions, all within 60 seconds.

Try it now

End-to-End Data Engineering & Machine Learning Features

Run Tracker

Dotscience tracks, packages, and links together every run that goes into the data engineering and model creation process. Discover previous work and see exactly how it was built by tracking every version of every element in the model development phase.

Data Versioning

As part of tracking runs, Dotscience bundles with each run a complete snapshot of the project workspace filesystem and any dependent datasets, using copy-on-write technology so that no more disk space is used than is absolutely required for reproducibility.

Provenance Graph

Trace from a model to its training data, and from that back to the raw data, so that if stakeholders challenge decisions made by a model you can forensically reproduce its inferences, and if there are issues, isolate and fix them.

Metric Explorer

Dotscience gives data science and ML engineering teams the unique ability to collaboratively track, record and share run metrics. Explore historic runs, and see relationships between hyperparameters & metrics so that you can gain insights into which hyperparameters to tune next, and make better decisions about where to invest time & effort.

S3 Datasets

Attach versioned S3 buckets as Dotscience Datasets while still tracking reproducibility & provenance all the way to the source. Dotscience mirrors the S3 dataset to local storage for high performance and low latency, and keeps track for you of which versions are accessed during data engineering and model training.

Bring Your Own Compute

Attach any compute as a runner: a laptop, GPU rig, enterprise data center, or cloud instances. Be productive on a new runner in seconds, as Dotscience ensures an identical development environment even when you switch runners. Dotscience handles the storage and network complexity; all you need is an internet connection and Docker.

Auto Scaling

Simplify resource management by automatically provisioning runners on-demand from cloud infrastructure as users need them, enabling data scientists to self-serve Jupyter and script execution environments. Switch from CPU runners to GPU and back again seamlessly as requirements dictate. Optimize cloud spend with automatic shutdown when runners are idle. All changes are backed up to the hub when runners are shut down, as well as every time a run is recorded.

Pull Requests

Jupyter Notebooks are notoriously hard to use well with Git and GitHub. Dotscience lets you fork someone else's project, create new runs in notebooks, and propose them back along with their metrics. See a full, clear notebook diff and merge conflicting changes with ease.

Model Library

Publish your models from your projects to the Dotscience Model Library by labelling them as, for example, TensorFlow models. Trace back to complete model and data provenance, and forward to deployment & monitoring.

Deploy to Production

Deploy your best model into production with a click or an API call. Dotscience will automatically build optimized Docker images and deploy them to an attached Kubernetes cluster. Use your own Kubernetes cluster with our deployer agent.

Statistical Monitoring

Statistically monitor models to get an early warning when models behave unexpectedly. Monitor model behavior on unlabelled production data by analyzing the statistical distribution of predictions.


“The world of ML has a lot to learn from all the best practices developed to handle the Software Engineering lifecycle in the last 10 years. Dotscience has the potential to bring some of those hard-learned lessons to the ML world without forcing data scientists and researchers to completely abandon their tools of choice, like Jupyter Notebooks. It's a bold proposition and has the potential to make a huge impact.”

Luca Palmieri, Machine Learning and Data Engineering at TrueLayer

Deep Dive Demo

Key Sections

Introduction

Simple Demo - getting started

Advanced Demo - full lifecycle

Conclusions

 

Motivation & Beliefs [0:07]

  • AI has the potential to make a positive impact on the world
  • But as a discipline it's immature
  • We've seen lots of problems affecting AI efforts: wasting time, inefficient collaboration, manual tracking, no reproducibility or provenance, no proper monitoring

We've been here before [0:37]

  • We've been here before: in the 90s software was siloed and slow
  • What changed? This movement called DevOps transformed the way we ship software
  • The same kind of paradigm shift is possible for AI
  • If the following four requirements can be met, DevOps for ML becomes possible

DevOps for ML Requirements [1:05]

  • Reproducible: every model has to be reproducible. Someone else should be able to come along six months later, re-run exactly the same training run of your model, and get more-or-less the same result.
  • Accountable: every model must be accountable. That means the basis on which it made its decisions must be recorded. And that means knowing exactly what data it was trained on and how that data came to be.
  • Collaborative: The development environment for models has to be collaborative. I need to be able to pick up where you left off and try different things without treading on your toes.
  • Continuous: Proper model development requires a continuous lifecycle. You're not done when you ship, and deploying a model into production is just the start of a process of continuously monitoring it and improving it as the world changes. So models have to be retrained and statistically monitored for drift.

ML is different to software engineering [2:01]

  • Why can't we achieve these requirements using the existing tools we have for software?
  • The reason is that the software lifecycle is much simpler than the model development lifecycle.
  • In software you have code which gets tested and deployed and monitored, and then you change the code and it goes round the loop.
  • Machine Learning is more complex: sure you have code, but that's just one of the inputs.
  • So what you're doing is you're building these models that automatically make predictions based on patterns they've observed in the data.
  • The way you create these models is by training them on a certain version of the data, with a certain version of the code and certain parameters.
  • It's then that model artifact that's deployed into production and monitored.

Key Dotscience innovation: tracking runs [2:41]

  • So the key Dotscience innovation is that we're not just tracking versions of code.
  • We're tracking runs – these can either be data runs which happen when you're doing data engineering, or they can be model runs which happen when you're training a model.
  • In both cases, Dotscience is capturing and bundling together the complete context of everything that went into either creating an intermediate dataset or training a model
  • So the runs are all fully reproducible and you can connect data engineering to model training.
  • This means you can track back from a model running in production to exactly the context in which it was trained and recursively find out exactly what data it was trained on and where that data came from.
  • And all of this is done in an environment that's fully collaborative, so that people can learn from each other, try different things freely, and pick up where someone else left off.

Dotscience features [3:30]

  • In this demo I'm going to show you a number of features:
  • Track runs
  • Collaborate
  • Generate a provenance graph
  • Explore relationship between parameters and metrics
  • Deploy any model into production
  • Statistically monitor it once it's in production
  • Flexibly attach any compute
  • Attach external datasets from S3

Machine learning model lifecycle [3:51]

  • Everything I'm going to show you is in the context of this machine learning model lifecycle
  • So we're going to start with data engineering where raw data gets processed
  • Then we're going into model development where we iteratively try a bunch of models and parameters to get the best performing model
  • As we do this we might go back into data engineering to tweak the way we're doing it
  • Then once we have a model we're happy with that looks like it's accurate we can try it out by deploying it into production
  • And then we get to see actually how well it performs in real life, and based on statistical monitoring and retraining on new datasets, we can then go back to the beginning and do more data engineering to build new models and then go round the lifecycle again.

Demo 1: Simple demo of getting started [4:41]

  • Starting with a simple example which you can try yourself for free on our website:
  • Signing up for a new account
  • Fork a sample project: makes an editable copy of the project
  • Must add a runner. Dotscience has a Hub, which is a repository of runs, data, code and models, and you attach Runners to the platform: it's on the Runners that the actual work of data engineering or machine learning model training will be executed
  • You can add your own machine -- bring your own compute, or use a Dotscience-provided runner
  • Click the button to attach a Dotscience-provided runner: will spin up a VM on Google Cloud and attach it automatically to your account so that you can play around
  • This VM will have Docker on it, and will automatically start the dotscience runner container which connects to the Hub and receives instructions
  • Runner is online and ready - first instruction we give it is to start Jupyter
  • Possible to go down a CLI route but Jupyter is easier to start with
  • You'll see some log messages as Jupyter starts up

Hello Dotscience Jupyter notebook [6:22]

  • Dotscience is a run tracker which helps with reproducibility
  • You specify the start and end of a run and publish the run from your notebook
  • When you run your notebook in JupyterLab you get a new run recorded in the Dotscience tab. Run metadata is shown in the notebook cell output
  • Look at the same run in Dotscience and you will see the run metadata, all the versions of the files involved, and a snapshot of the notebook as you used it
  • How to capture metrics in Dotscience?
  • Specify what you want to capture in your notebook, e.g. a parameter value or a summary statistic (see the sketch after this list)
  • Carry out the run, then go to Dotscience to see the run's outputs on a plot alongside the same outputs from other runs. You can inspect these to see which inputs produced each output value per run
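
A minimal sketch of what such a notebook cell might look like; the exact function names for starting and publishing a run are assumed from the description above, and the values are made up:

    import dotscience as ds

    ds.start()                        # assumed: mark the start of the run
    ds.param("degree", 2)             # a parameter you want in the run record
    r2 = 0.87                         # e.g. a summary statistic computed in this cell
    ds.metric("r2", r2)               # a metric Dotscience will plot across runs
    ds.publish("hello dotscience")    # assumed: end the run and record it on the hub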

Tracking data in Dotscience [8:50]

  • Ingest data
  • Track input/output data per run (see the sketch after this list)
  • View a provenance graph per run
  • View the version of the data and the notebook for every run
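
A hedged sketch of how a data run might register its inputs and outputs so the provenance graph can link them; ds.input() and ds.output() are assumed here to register a file path as a run input or output, and the file names are hypothetical:

    import pandas as pd
    import dotscience as ds

    ds.start()
    raw = pd.read_csv(ds.input("raw/data.csv"))         # assumed: registers the file as a run input
    clean = raw.dropna()                                # the actual data engineering step
    clean.to_csv(ds.output("processed/data_clean.csv"), index=False)  # assumed: registers the run output
    ds.publish("dropped rows with missing values")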

Training an ML model in Dotscience [10:11]

  • Using: linear regression on a CSV file, outputting a model as a Pickle file (see the sketch after this list)
  • Track the notebook, metrics, data in each run
  • View provenance graph per run
  • Tune the model and see how the error rate changes on the Explore tab
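
Putting the pieces above together, a model-training run along these lines might look like the following sketch; the file and column names are hypothetical, and the ds.* calls are assumed as before:

    import pickle
    import pandas as pd
    import dotscience as ds
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    ds.start()
    df = pd.read_csv(ds.input("processed/data_clean.csv"))   # hypothetical file from a previous data run
    X, y = df[["x"]], df["y"]                                 # hypothetical column names
    model = LinearRegression().fit(X, y)
    ds.metric("mse", float(mean_squared_error(y, model.predict(X))))
    with open(ds.output("model.pkl"), "wb") as f:             # the Pickle file mentioned above
        pickle.dump(model, f)
    ds.publish("linear regression baseline")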

Demo 2: More realistic example of complete data ML model lifecycle in Dotscience [12:18]

  • Using: S3, GitHub
  • We'll look at: data engineering, model development, deploy into production, monitor in prod
  • Create new project, Roadsigns
  • Use a local runner for compute. Connect the runner to GitHub
  • Attach dataset which is in an S3 bucket
  • Add collaborators to give visibility and sharing

Data engineering in Dotscience [14:23]

  • Using: Python scripts, versioned in GitHub
  • Using: Script for ingesting data from S3
  • Split set into training and test
  • Wrap each operation in a Dotscience run
  • Use ds run on the CLI, specifying the project, branch, GitHub repo, and Docker image. Run metadata is output on the command line
  • View runs in Dotscience.
  • S3 also versioned in Dotscience

Model development in Dotscience [19:28]

  • Using: Neural net using Jupyter in model training notebook
  • Using: the data from the previous step
  • Note that the example model is not very accurate; try a different subset of the data
  • Overwrite your files in place without worrying. Dotscience captures each change as a new version
  • View the updated plot in Dotscience: the new dataset shows better accuracy

Deploy to production with Dotscience [29:43]

  • View model in Dotscience model library
  • Deploy with a single click into the CI system
  • Note: S3-compatible API
  • Note: deployed to an AWS Kubernetes cluster

Asking for help: Collaboration with Dotscience [34:17]

  • Collaborate with Danesh to see if he can improve the model performance
  • Danesh can see all the background of what I tried
  • Danesh takes a copy and tries some new things
  • Both of us have made different changes, we can merge them back together
  • The most accurate run so far comes from using some changes from both people
  • Now Danesh can make a pull request on the Dotscience project
  • Manager can see history of progress in the accuracy plot chart

Statistical monitoring in production [43:54]

  • Go to Dotscience model library
  • Deploy to Gitlab CI
  • Note: Model less accurate on real data than expected.
  • View monitoring in Grafana and Prometheus - gives you the option to set up alerting on any unusual model behaviour

How Dotscience achieves the DevOps for ML manifesto [47:38]

Dotscience integrations [48:58]

  • Jupyter
  • Python
  • AWS
  • Docker
  • CircleCI
  • Git
  • TensorFlow
  • Prometheus
  • S3
  • Kubernetes
  • Lots more.

Dotscience is available today [49:09]

  • SaaS/cloud service with a free account
  • AWS in your private VPC
  • On-prem
  • Or any hybrid of the above

Dotscience is highly differentiated [49:33]

  • Accelerate AI projects
  • Run anywhere
  • Model accountability
  • End-to-end AI platform
Try Dotscience
Dotscience works with every Python ML framework
TensorFlow, PyTorch, scikit-learn, CNTK, MXNet, Keras, Caffe, Theano
* Only TensorFlow is supported for Deploy and Monitor at this time