Deploy your models to production in 60 seconds! How does that work?


AI deployment is difficult

A huge issue for businesses trying to use AI and machine learning today is that they can’t deploy their models. Their data science team comes up with a great analysis that will add millions in business value per year, but the model never sees the light of day because no one can get the deployment to work. Either they lack the room full of engineers needed to architect the process, or the necessary steps and tools are stuck awaiting IT approval, or the value has simply not been communicated well enough to justify the time and expense.

How does Dotscience help?

What if there was a better way? Here at Dotscience, we won’t pretend we have solved the whole world yet, but we would like to think that we at least make production model deployment tractable for a great many more businesses. Instead of a room full of expensive people and six months of time, maybe you just need your data scientist, your ML engineer, and perhaps some consulting from us if you are integrating into your existing company systems. Or, if you use our existing setup, maybe it really is only one click in the GUI.

OK sure, but does it really work? Dotscience was founded and is developed by a team of DevOps experts, along with some ML people to be sure the tools make sense for ML, so we are in a decent position to claim that it does. We would also point to some of the case studies in the blog where people who are not us agree. Much as DevOps for software in general has markedly improved over the last decade, we are now on the journey to similarly improving DevOps for machine learning, also known as MLOps.

So what do we do? Dotscience is a flexible tool, so there are many potential deployment paths. A common one is shown here:

Figure 1

Dotscience deployment setup in one of its common forms: TensorFlow -> Docker -> GitLab -> Kubernetes -> Prometheus -> Grafana. This can be used as-is, or components swapped out.

Notice that every step uses readily available tools, so what we do to deploy you could do yourself. The difference comes down to the time and effort needed to set it all up to work in production. Our value-add is therefore that you get the benefits of this MLOps approach without having to build the pipeline yourself.

The caption also mentions that components can be swapped out. This is true because of the strong foundation on which the product is built. Thus, the ML doesn’t have to be TensorFlow, the CI/CD doesn’t have to be GitLab, the deploy doesn’t have to be Kubernetes, and similarly for Prometheus and Grafana. It can be adjusted to suit the needs of your business.

In the spirit of full disclosure, speaking not just as a Dotscience evangelist but also as a user, once you do start swapping out components, it can become more time-consuming to finalize a setup. There is an inevitable tradeoff between using the most fully supported defaults and allowing more tools in. But this will only get better over time. If your company needs a different setup but still wants AI with MLOps, we have a full consulting capability available.

Ways to deploy

There are several ways to deploy using the tool.

GUI

To deploy a model to production in the GUI, do the following:

Start by creating a project and writing the analysis that produces a model. Most commonly this will be in Python 3, in a Jupyter notebook in our built-in JupyterLab, or you might write a .py script and call it. Also most commonly, the model will use TensorFlow, which means deep learning or one of TensorFlow’s other supported methods, such as decision trees. (Because Dotscience is bring-your-own-compute, with your on-prem machine or cloud instance acting as a runner, GPUs and so on are easily brought in.) Other libraries, such as scikit-learn, are also supported.

Once a model is created and saved to Dotscience using the ds() functions (see, e.g., https://dotscience.com/blog/2020-01-21-annotate-code), your project will include things like the provenance graph for data, models, and runs:

Figure 2

Then, on another tab, there is the model list:

Figure 3

This is the part where the GUI makes model deployment easy. On the right of the model list are buttons labeled “Build” or “Deploy”. Clicking Build triggers a job on one of our integrated continuous integration (CI) systems (e.g., GitLab or CircleCI). This builds a Docker container image that includes the model, the environment it needs to run, the pre- and post-processing code that converts incoming raw data to model inputs and/or handles outputs, and any other code a particular analysis requires. The image is then pushed to a Docker registry.
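As a rough illustration of the kind of pre- and post-processing code that gets packaged with the model, here is a minimal sketch for an MNIST-style digit classifier. The function names, the 0-255 pixel range, and the class list are assumptions made for this example; they are not part of the Dotscience API.

```python
from typing import List, Sequence

# Hypothetical class labels for a digit classifier; a real project would
# typically load these from a file saved alongside the model.
CLASSES = [str(d) for d in range(10)]


def preprocess(pixels: Sequence[int]) -> List[float]:
    """Scale raw 0-255 grayscale pixel values to the 0.0-1.0 range."""
    if any(p < 0 or p > 255 for p in pixels):
        raise ValueError("pixel values must be in the range 0-255")
    return [p / 255.0 for p in pixels]


def postprocess(scores: Sequence[float], classes: Sequence[str] = CLASSES) -> str:
    """Return the label of the highest-scoring class."""
    if len(scores) != len(classes):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]
```

Keeping these two steps next to the model means the deployed service can accept raw data and return a readable label, rather than expecting every caller to repeat the conversion.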

Clicking Deploy similarly activates our integrated continuous delivery (CD) tool, which sends the Docker container to be deployed on our Kubernetes cluster. Kubernetes is the well-known tool for orchestrating many containers across a cluster of machines.

Figure 4

And that’s it: your model is now deployed to production - in a container, as a microservice, on Kubernetes. If you click through it all quickly enough, and the build is on a fast machine, you may even achieve the deployment in 60 seconds promised by the title :)

In our demos, we show how the resulting deployed model can be used from a web app to request predictions. The model endpoint (i.e., where the model is served) is supplied, and in this example you can have it predict MNIST digits:

Figure 5

or road signs:

Figure 6

Obviously any other data can be passed to an appropriately trained model in the same way.

While this process is shown here on our Dotscience Hub, the same can be done at your company, because a Dotscience Hub can be installed on your own cloud account or on-premises.

Script or REST API

Of course, in real work, not everyone wants to click through a GUI every time they do something. Likewise, a production deployment handling millions of rows of data will not involve clicking an image for each row to send it to the model for prediction. So Dotscience deployments are fully scriptable.

The whole system can be accessed through the Python library and the command line. The equivalent of logging into the GUI, writing a Jupyter notebook, and running it is to call ds.connect(), write a .py script, and run it with ds run. This involves commands like

import dotscience as ds

# Replace each placeholder string with your own value.
ds.connect(
    "DOTSCIENCE_USERNAME",
    "DOTSCIENCE_APIKEY",
    "DOTSCIENCE_PROJECT_NAME",
    "DOTSCIENCE_HOSTNAME"
)

if connecting in Dotscience anywhere mode, and

ds project create this-is-my-project
ds run -p this-is-my-project --upload-path . python test.py

when connected in any way and using the command line. DOTSCIENCE_USERNAME, etc., are placeholders for your own values; this-is-my-project is the project name, and test.py is your Python script containing the data analysis and model building.

You can then deploy to the same setup as in the GUI example above using commands in the script like

ds.model(tf, "mnist", "model", classes="model/classes.json")
ds.publish("Trained MNIST model", deploy=True)

or ds deployment on the command line.

Predictions can then be sent with curl, e.g.,

curl --request POST \
  --url http://localhost:8501/v1/models/model:predict \
  --header 'content-type: application/json' \
  --data '{
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}'

The URL points to the deployed model, in this case on localhost port 8501.
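For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint and JSON layout from the curl example, and the helper names build_request and predict are our own for illustration, not part of the Dotscience library.

```python
import json
import urllib.request

# Endpoint taken from the curl example; change it to your deployed model's URL.
URL = "http://localhost:8501/v1/models/model:predict"


def build_request(instances, url=URL):
    """Build (but do not send) a JSON POST request for the prediction endpoint."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def predict(instances, url=URL, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    request = build_request(instances, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling predict([[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]) sends the same two rows as the curl command. If the endpoint is TensorFlow Serving, the response should be a JSON object with a "predictions" key, but check this against your own deployment.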

The commands shown here are taken from various parts of the tutorials in our documentation. Head on over to those pages if you want to see this in more detail, including working end-to-end examples.

Other ways

Being a cool startup DevOps team, my colleagues are always interested in new ways to have the product used, deployed, and integrated with other tools. Therefore the following are some other deployment-type scenarios that we have:

Monitoring

Once your model is deployed, what about monitoring? Most ML models degrade over time: data drift (changes in the incoming data) and concept drift (changes in how inputs relate to labels, for supervised models) alter the model’s outputs, leading to model drift. This makes it essential to monitor your models in production.
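One simple way to put a number on such a shift is to compare the distribution of predicted labels in a reference window with that in a recent window. The sketch below uses the population stability index (PSI), a common drift statistic chosen here purely for illustration; it is not necessarily what Dotscience computes, and the function names are hypothetical.

```python
import math
from collections import Counter


def class_shares(labels, classes, eps=1e-6):
    """Fraction of predictions per class, floored at eps to avoid log(0)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [max(counts[c] / total, eps) for c in classes]


def population_stability_index(reference, live, classes):
    """PSI between the reference and live predicted-label distributions."""
    ref = class_shares(reference, classes)
    cur = class_shares(live, classes)
    # Each term is non-negative; the sum is 0 when the distributions match.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a substantial shift, but the right threshold depends on the model and the data, so treat it as a starting point rather than a rule.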

As we saw above, when a model is deployed in Dotscience, it is available as an endpoint that can be monitored. When monitoring is activated, requests sent to the model and responses sent from it are handled by the Dotscience model proxy. This works with TensorFlow Serving or similar services, and means our users can capture the statistics needed to ensure the model is running correctly in production, or be alerted when it is not.
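To make the idea of capturing statistics around requests and responses concrete, here is a minimal sketch of a wrapper that sits in front of a prediction function. It is an illustration of the general pattern only, not the Dotscience model proxy; the class name and the statistics it records are assumptions.

```python
import time
from collections import Counter


class MonitoredModel:
    """Wrap a predict function and record simple request statistics."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self.request_count = 0
        self.latencies = []  # seconds taken by each request
        self.prediction_counts = Counter()  # how often each label was returned

    def predict(self, features):
        """Call the wrapped model and record latency and the returned label."""
        start = time.perf_counter()
        prediction = self._predict_fn(features)
        self.latencies.append(time.perf_counter() - start)
        self.request_count += 1
        # Assumes the prediction is hashable, e.g. a class label.
        self.prediction_counts[prediction] += 1
        return prediction
```

In a real deployment these counters would be exported to a monitoring system rather than kept in memory, but the principle is the same: every request and response passes through one place where it can be measured.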

In the GUI, monitoring is activated by moving from the Models tab to the Deployment tab, and clicking Monitor for any deployed model. This opens a Grafana window, giving users the full functionality of that tool:

Figure 7

Grafana is a general system, so you can take inputs from Prometheus and write any query needed. We will show some more detailed monitoring examples in a future blog entry. Prometheus and PromQL are most commonly used to monitor software deployments in the more traditional DevOps way: throughput, latency, uptime, and so on. But the query language is generic and works on incoming time series data, so it is equally suitable for monitoring the outputs of ML models in real time. Also, as noted above, although we talk about TF Serving, Grafana, etc., here, Dotscience is general enough that you are not restricted to the exact components mentioned.

As well as the GUI, monitoring can also be deployed from the Dotscience Python library, using ds.publish(..., deploy=True), either in the notebook or from the command line.

Conclusion

We have shown how Dotscience makes MLOps and deployment of machine learning models easier for real businesses. In addition to deployment, users also get the other benefits of using our system: reproducibility, accountability, collaboration, and continuous delivery. This combination of full MLOps with tractable deployment results in an end-to-end AI + machine learning system suitable for real-world usage.

For more details about how Dotscience works on a technical level, beyond deployment and what is covered in this article, see the Dotscience technology page.



Try it out!

You can try out Dotscience for free right now, or for more details about the product, head over to our product page and our documentation page.

Written by:

Dr. Nick Ball, Principal Data Scientist (Product) at Dotscience