Quick Start Guide
This guide walks you through the main features of dashi.
1. Data Formatting
Before any analysis, format your DataFrame so that dates and types are correct:
import pandas as pd
import dashi as ds

df = pd.read_csv('my_data.csv')
df = ds.format_data(
    df,
    date_column_name='date',
    date_format='%Y/%m/%d',
    numerical_column_names=['age', 'weight'],
    categorical_column_names=['gender', 'diagnosis']
)
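For intuition, a formatting step of this kind can be approximated in plain pandas. The sketch below is an assumption about what such a step involves (date parsing, numeric coercion, categorical casting); it is not dashi's implementation, and the helper name `format_frame` is hypothetical.

```python
import pandas as pd


def format_frame(df, date_col, date_format, numerical_cols, categorical_cols):
    """Hypothetical helper: parse dates and cast column types with plain pandas."""
    out = df.copy()
    # Parse date strings using an explicit format such as '%Y/%m/%d'
    out[date_col] = pd.to_datetime(out[date_col], format=date_format)
    # Coerce numeric columns; values that cannot be parsed become NaN
    for col in numerical_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Store categorical columns with the pandas 'category' dtype
    for col in categorical_cols:
        out[col] = out[col].astype("category")
    return out


raw = pd.DataFrame({
    "date": ["2021/01/15", "2021/02/03"],
    "age": ["34", "51"],
    "gender": ["F", "M"],
})
formatted = format_frame(raw, "date", "%Y/%m/%d", ["age"], ["gender"])
print(formatted.dtypes)
```

Checking `dtypes` after formatting is a quick way to confirm that dates and types are what the later analysis steps expect.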
2. Unsupervised Temporal Analysis
Estimate how variable distributions change over time:
# Univariate analysis
dtm = ds.estimate_univariate_data_temporal_map(
    data=df,
    date_column_name='date',
    period='month'
)
# Plot heatmap
plot = ds.plot_univariate_data_temporal_map(dtm['weight'])
# Multivariate analysis with dimensionality reduction
mv_dtm = ds.estimate_multivariate_data_temporal_map(
    data=df,
    date_column_name='date',
    period='month',
    dim_reduction='PCA',
    dimensions=2
)
# Plot heatmap
plot = ds.plot_multivariate_data_temporal_map(mv_dtm)
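To make the idea of a temporal map concrete, the sketch below builds a small period-by-bin table of relative frequencies for one numerical variable, which is the kind of matrix a heatmap over time would display. It uses only pandas and NumPy; the helper `monthly_histograms` and the toy data are hypothetical, and nothing here is claimed to match how dashi computes its maps.

```python
import numpy as np
import pandas as pd


def monthly_histograms(df, date_col, value_col, bins):
    """Hypothetical helper: one row of relative frequencies per calendar month."""
    months = df[date_col].dt.to_period("M")
    rows = {}
    for month, group in df.groupby(months):
        counts, _ = np.histogram(group[value_col].dropna(), bins=bins)
        total = counts.sum()
        # Normalise counts so each row is a probability distribution
        rows[str(month)] = counts / total if total else counts.astype(float)
    return pd.DataFrame.from_dict(rows, orient="index")


toy = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-10", "2021-02-25"]
    ),
    "weight": [60.0, 62.0, 80.0, 85.0],
})
# Two bins: [50, 70) and [70, 90]
heatmap = monthly_histograms(toy, "date", "weight", bins=[50, 70, 90])
print(heatmap)
```

In this toy example all January values fall in the first bin and all February values in the second, so the two rows differ completely; a shift like that is what a temporal heatmap is meant to reveal.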
3. Unsupervised Multi-Source Analysis
Compare distributions across different data sources:
dsm = ds.estimate_univariate_data_source_map(
    data=df,
    source_column='hospital'
)
plot = ds.plot_univariate_data_source_map(dsm['weight'])
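One common way to compare two sources numerically is a distance between their binned distributions. The sketch below uses the Jensen-Shannon distance as an illustrative choice; this guide does not state which distance dashi uses, so treat that choice, the helper `js_distance`, and the hospital distributions as assumptions.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance (log base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Hypothetical binned 'weight' distributions for two hospitals
hospital_a = [0.6, 0.3, 0.1]
hospital_b = [0.1, 0.3, 0.6]

same = js_distance(hospital_a, hospital_a)
different = js_distance(hospital_a, hospital_b)
disjoint = js_distance([1, 0], [0, 1])
print(same, different, disjoint)
```

With base-2 logarithms the distance is bounded between 0 (identical distributions) and 1 (no overlap), which makes values comparable across variables.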
4. Variability Metrics (IGT & MSV)
Quantify temporal or source variability:
# Information Geometric Temporal (IGT) projection
igt = ds.estimate_igt_projection(dtm, embedding_type='classicalmds')
plot = ds.plot_IGT_projection(igt)
# Multi-Source Variability (MSV) metrics
msv = ds.estimate_MSV_metrics(dsm)
plot = ds.plot_MSV(msv)
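The `embedding_type='classicalmds'` option above refers to classical multidimensional scaling, which turns a matrix of pairwise distances between batches into low-dimensional coordinates. The NumPy sketch below shows the textbook algorithm on a toy distance matrix; the helper `classical_mds` and the input values are hypothetical, and this is not presented as dashi's internal code.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Textbook classical MDS: embed points from a pairwise distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy distances between three batches that lie on a straight line
dist = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
])
coords = classical_mds(dist, n_components=2)
print(coords)
```

Batches that end up close together in the embedding have similar distributions, so a gradual drift shows up as a path and an abrupt change as a jump between clusters.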
5. Supervised Characterization
Evaluate model performance across temporal or source batches:
metrics = ds.estimate_multibatch_models(
    data=df,
    inputs_numerical_column_names=['age', 'weight'],
    inputs_categorical_column_names=['gender'],
    output_classification_column_name='diagnosis',
    date_column_name='date',
    period='month',
    learning_strategy='from_scratch',
    model_type='histogram_gradient_boosting'
)
performance_df = ds.arrange_performance_metrics(
    metrics=metrics,
    metric_name='ROC-AUC_MACRO'
)
plot = ds.plot_performance(
    performance_df,
    metric_name='ROC-AUC_MACRO'
)