FuseMachines love Dotscience

FuseMachines is a growing data science consultancy whose data scientists like to participate in Kaggle competitions, both for fun and to show off their skills to potential clients. FuseMachines used Dotscience for around 6 weeks to work on two Kaggle challenges: the 2018 Data Science Bowl and the Freesound General-Purpose Audio Tagging Challenge.

The purpose of the project was to assess whether Dotscience allowed the FuseMachines team to collaborate and iterate on ideas faster than their traditional tooling. They were kind enough to put together a video giving an overview of their very positive findings. The following is a brief summary; the full video is at the bottom of the post.

Normally, the FuseMachines team would deploy EC2 instances and set up their development environments by hand, which would often take one or more days to configure correctly. When using Dotscience Runners, however, the team found that they could configure their development environments in a matter of minutes. A huge win!

Once the development environments were ready, the team could immediately deploy a Jupyter instance in the Dotscience Hub and start ingesting data for the relevant Kaggle competitions. Upon ingestion, Dotscience starts tracking the imported data and automatically begins building a provenance graph, which shows a versioned representation of all the moving parts in the project.

The video shows that, with Dotscience’s collaboration feature, users can easily add a collaborator to their project, who is then automatically given access to an entire development environment containing an exact clone of the original project. The collaborator can then use the provenance graph, project history and metrics dashboard to quickly understand the full state of the project they have been invited to join.

Finally, FuseMachines cover a few of their other favourite takeaways from the experience, including side-by-side diffing for Jupyter notebooks, the provenance graph and its use in quickly understanding projects, the ease of the built-in TensorBoard support, and the convenience of running every command available in the Dotscience Hub from the command line on their own local machines using the ds command line tool.

Would you like to iterate more quickly on your Kaggle competitions by removing unnecessary overhead around environment creation and team collaboration? Try Dotscience now.

Written by:

Mark Coleman