Setting up a development environment, keeping data in sync across machines, reproducing a colleague's result: a typical ML project wastes a great deal of time on all of these.
Data scientists often avoid collaborating because current tools make it hard, even though the benefits of breaking down silos are well documented.
If you track experiment results, or the mapping from data to models, in a spreadsheet, you are likely to make mistakes and miss part of the picture.
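The alternative is to capture each run programmatically. Here is a minimal sketch of append-only experiment logging; the file name, parameter names, and metric names are hypothetical stand-ins, not a specific tool's API:

```python
import json
from datetime import datetime, timezone

def log_experiment(path, params, metrics):
    """Append one experiment run as a JSON line: which hyperparameters and
    data version went in, and which metrics came out. Unlike a hand-edited
    spreadsheet, every field is recorded automatically and nothing is
    overwritten. (Illustrative sketch; all names are hypothetical.)"""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": params,    # e.g. learning rate, data version
        "metrics": metrics,  # e.g. accuracy, AUC
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_experiment("runs.jsonl", {"lr": 0.01, "data": "v3"}, {"acc": 0.91})
```

Because the log is append-only, it doubles as a history of every run, which a spreadsheet edited in place cannot guarantee.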
No reproducibility or provenance
Overwriting data or emailing notebooks around means that when a buggy model needs fixing, you may no longer be able to reproduce how it was built.
No automated testing
Without automated tests it is easy to deploy broken models to production by accident, as described in What's your ML test score?
If you're manually deploying models into production by copying files, you risk losing track of what's running where and how it was created.
If you are not monitoring model drift and statistical behavior in production, a model can go haywire and you might not notice until real damage has been done.
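One common drift check compares the distribution of a feature in production against its training distribution. A minimal sketch using the population stability index (PSI); the bin count and the 0.2 alarm threshold are conventional rules of thumb, and the sample data is made up:

```python
import math

def psi(expected, actual, bins=4):
    """Population stability index between two samples of one feature.
    Rule of thumb: PSI above 0.2 signals meaningful drift.
    (Illustrative sketch; bins and threshold are conventional choices.)"""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch production values beyond the training range

    def fractions(sample):
        counts = [0] * bins
        for v in sample:
            for i in range(bins):
                if v < edges[i + 1]:
                    counts[i] += 1
                    break
        # Floor at a tiny fraction so the log term below never blows up.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
prod_ok = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]  # stays below the 0.2 alarm
prod_shifted = [2.0] * 8  # feature scale silently changed upstream: PSI well above 0.2
```

Run periodically against live traffic, a check like this turns "the model went haywire" from a surprise into an alert.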
As Google describes in The High Interest Credit Card of Technical Debt, ML systems are particularly susceptible to new and interesting forms of technical debt.
These issues make ML projects fail and create financial and reputational risk. Solving them promises more productive, effective AI teams, and better, safer models.