dashi documentation
dashi is a Python library for analyzing and characterizing temporal and multi-source dataset shifts. It provides tools for both supervised and unsupervised evaluation of dataset shifts, helping users detect, understand, and address changes in data distributions.
Key Features
Supervised Characterization
Trains Random Forest classification or regression models on batched data (temporal or multi-source). Comparing model performance across batches shows how dataset shifts affect the models and helps pinpoint where performance degrades.
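The idea can be sketched outside dashi with scikit-learn. The example below is a minimal illustration, not dashi's own API: the helper names `make_batch` and `cross_batch_f1`, the synthetic data, and the batch labels are all assumptions made for this sketch. It trains one Random Forest per batch and scores it on every batch, so a drop in the off-diagonal scores points to a shift between the training and test batches.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


def make_batch(rng, n, shift):
    """Hypothetical synthetic batch whose decision boundary rotates with `shift`."""
    X = rng.normal(size=(n, 2))
    # The label depends on a direction that rotates as `shift` grows,
    # which simulates a change in the feature-label relationship.
    y = (np.cos(shift) * X[:, 0] + np.sin(shift) * X[:, 1] > 0).astype(int)
    return X, y


def cross_batch_f1(batches, seed=0):
    """Train one Random Forest per batch and score it on every batch.

    Returns the batch names and a matrix M where M[i, j] is the F1-score
    of the model trained on batch i and tested on batch j. Diagonal
    entries are scored on the training data, so they are optimistic.
    """
    names = list(batches)
    scores = np.zeros((len(names), len(names)))
    for i, train_name in enumerate(names):
        X_train, y_train = batches[train_name]
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X_train, y_train)
        for j, test_name in enumerate(names):
            X_test, y_test = batches[test_name]
            scores[i, j] = f1_score(y_test, model.predict(X_test))
    return names, scores


rng = np.random.default_rng(42)
batches = {
    "2020": make_batch(rng, 400, shift=0.0),
    "2021": make_batch(rng, 400, shift=0.8),
    "2022": make_batch(rng, 400, shift=1.6),
}
names, scores = cross_batch_f1(batches)
print(names)
print(np.round(scores, 2))
```

In this synthetic setup the model trained on the first batch scores worse the further the test batch has drifted, which is the kind of degradation the supervised characterization is meant to expose.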
Unsupervised Characterization
Facilitates the identification of temporal dataset shifts by projecting and visualizing data dissimilarities across time. This process involves:
- Estimating data statistical distributions over time.
- Projecting these distributions onto non-parametric statistical manifolds.
These projections reveal patterns of latent temporal variability in the data, uncovering hidden trends and shifts.
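The steps above can be sketched with NumPy alone. This is an illustration under stated assumptions rather than dashi's implementation: it uses normalized histograms as the distribution estimate, the Jensen-Shannon distance as the dissimilarity between batches, and classical multidimensional scaling as the projection. The function names and the synthetic drifting variable are hypothetical.

```python
import numpy as np


def batch_histograms(values, batch_ids, bins):
    """Estimate one probability distribution (normalized histogram) per batch."""
    labels = np.unique(batch_ids)
    probs = []
    for label in labels:
        counts, _ = np.histogram(values[batch_ids == label], bins=bins)
        probs.append(counts / counts.sum())
    return labels, np.array(probs)


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logarithms, bounded in [0, 1]."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def classical_mds(dist, n_components=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean residue) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


# Hypothetical data: six monthly batches whose mean drifts upward.
rng = np.random.default_rng(0)
months = np.repeat(np.arange(6), 500)
values = rng.normal(loc=0.5 * months, scale=1.0)

bins = np.linspace(values.min(), values.max(), 21)
labels, probs = batch_histograms(values, months, bins)
dist = np.array([[js_distance(p, q) for q in probs] for p in probs])
coords = classical_mds(dist)  # one 2-D point per monthly batch
```

Plotting `coords` with one labeled point per batch gives a picture in which batches with similar distributions sit close together, so a gradual drift appears as a trajectory and an abrupt shift as a gap.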
Visualization Tools
To aid exploration and interpretation of dataset shifts, dashi includes visual analytics features such as:
- Data Temporal Heatmaps (DTHs): Provide an exploratory visualization of temporal shifts in data distributions.
- Information Geometric Temporal (IGT) plots: Offer a more detailed view of temporal data variability by embedding temporal batches in their latent statistical manifolds.
- Multi-batch contingency matrices: Compare multiple evaluation metrics (F1-Score, Recall, Precision, AUC, etc.) across training-test combinations between pairwise batches, either temporal or multi-source.
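As a rough illustration of the data behind a Data Temporal Heatmap, the sketch below builds a month-by-category matrix of relative frequencies using only the Python standard library. The function `temporal_heatmap` and the sample records are hypothetical and do not reflect dashi's API; the resulting matrix is what a heatmap would color, with one row per temporal batch.

```python
from collections import Counter, defaultdict
from datetime import date


def temporal_heatmap(records):
    """Build a month-by-category matrix of relative frequencies.

    `records` is an iterable of (date, category) pairs. Returns the sorted
    month labels, the sorted categories, and one row of frequencies per month.
    """
    counts = defaultdict(Counter)
    for day, category in records:
        counts[day.strftime("%Y-%m")][category] += 1
    months = sorted(counts)
    categories = sorted({c for counter in counts.values() for c in counter})
    matrix = [
        [counts[m][c] / sum(counts[m].values()) for c in categories]
        for m in months
    ]
    return months, categories, matrix


# Hypothetical records: category "A" fades out while "B" takes over.
records = [
    (date(2023, 1, 5), "A"), (date(2023, 1, 20), "A"),
    (date(2023, 1, 25), "B"), (date(2023, 1, 28), "A"),
    (date(2023, 2, 3), "A"), (date(2023, 2, 14), "B"),
    (date(2023, 3, 9), "B"), (date(2023, 3, 30), "B"),
]
months, categories, matrix = temporal_heatmap(records)
for month, row in zip(months, matrix):
    print(month, [round(v, 2) for v in row])
```

Reading the rows from top to bottom shows the share of each category changing month by month, which is the pattern a heatmap makes visible at a glance.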