We version everything.
Unlike traditional version control systems, Dotscience can version arbitrarily large datasets.
It uses a copy-on-write file system that allows it to take ‘thin’ snapshots as the data changes, for instance as you clean it, add to it, and perform feature engineering.
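To make the idea of a "thin" snapshot concrete, here is a minimal sketch in Python. It is a toy analogy, not Dotscience's implementation and not a real copy-on-write file system: the `ToySnapshotStore` class, its methods, and the tiny `CHUNK_SIZE` are all hypothetical names chosen for illustration. It only shows why a new snapshot can be cheap when most of the data is unchanged.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny, for illustration only


class ToySnapshotStore:
    """Hypothetical content-addressed store (an analogy, not Dotscience's code)."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk bytes, each unique chunk stored once
        self.snapshots = []  # each snapshot is just a list of chunk digests

    def snapshot(self, data: bytes) -> int:
        """Record a snapshot of `data` and return its id."""
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only chunks not seen before consume new storage.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.snapshots.append(digests)
        return len(self.snapshots) - 1

    def restore(self, snapshot_id: int) -> bytes:
        """Rebuild the data exactly as it was at the given snapshot."""
        return b"".join(self.chunks[d] for d in self.snapshots[snapshot_id])


store = ToySnapshotStore()
v0 = store.snapshot(b"aaaabbbbcccc")      # original data: three chunks
v1 = store.snapshot(b"aaaabbbbccccdddd")  # data after appending one chunk
```

In this sketch the second snapshot adds only one new chunk to storage, yet both versions can still be restored in full.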
If your data isn’t stored in a local directory, for instance if it’s accessed through S3, GCS, or Azure Blob Storage, or via Apache Spark, we can version that too by taking a secure snapshot as the data is read.
When you try to pull the data again, Dotscience will notify you if it has changed from the stored snapshot.
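One simple way to detect that remote data has changed since it was last read is to store a fingerprint of the bytes and compare it on the next read. The sketch below assumes a SHA-256 hash as that fingerprint. The function names `fingerprint` and `has_changed` are hypothetical, and this is not Dotscience's API or a description of its actual mechanism.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(data).hexdigest()


def has_changed(data: bytes, stored_fingerprint: str) -> bool:
    """Report whether freshly read data differs from the recorded snapshot."""
    return fingerprint(data) != stored_fingerprint


# First read: record the fingerprint alongside the snapshot.
first_read = b"id,value\n1,10\n"
stored = fingerprint(first_read)

# Later read: the remote object has gained a row in the meantime.
second_read = b"id,value\n1,10\n2,20\n"

if has_changed(second_read, stored):
    print("Data has changed since the stored snapshot")
```

Comparing fingerprints rather than full copies keeps the check cheap even when the dataset itself is large.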