Core Concepts#
This tutorial covers the fundamental building blocks of iTuna:
ConsistencyEnsemble - The main class for evaluating model consistency
Indeterminacy classes - How to handle different types of model ambiguity
Consistency scoring - Measuring and interpreting consistency
Working with embeddings - Accessing aligned representations
import numpy as np
from sklearn.decomposition import FastICA, PCA
import ituna
ConsistencyEnsemble#
ConsistencyEnsemble is iTuna’s main class. It wraps any sklearn-compatible transformer and:
Creates multiple clones of the base estimator
Fits each clone with a different random seed
Aligns the resulting embeddings under the specified indeterminacy
Computes consistency scores across all model pairs
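Under the hood, this is conceptually similar to the following sketch (illustrative only; the function name is made up, and the real implementation also handles caching, backends, and score aggregation):
from sklearn.base import clone

def fit_clones_sketch(estimator, X, n_seeds=5):
    # Illustrative sketch, not iTuna's actual internals: one clone per
    # seed, each producing its own embedding of the same data
    embeddings = []
    for seed in range(n_seeds):
        model = clone(estimator).set_params(random_state=seed)
        embeddings.append(model.fit(X).transform(X))
    return embeddings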
Requirements for the base estimator#
Your model must follow the sklearn API:
Implement fit(X) and transform(X) methods
Be clonable via sklearn.base.clone()
Accept a random_state parameter (for reproducibility)
Most sklearn transformers work out of the box. For custom models, inherit from sklearn.base.TransformerMixin and sklearn.base.BaseEstimator.
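For instance, a minimal custom transformer satisfying all three requirements could look like this (the class name and its projection logic are invented for illustration):
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RandomProjector(BaseEstimator, TransformerMixin):
    """Hypothetical estimator meeting the ConsistencyEnsemble requirements."""

    def __init__(self, n_components=5, random_state=None):
        # Store constructor arguments unchanged so sklearn.base.clone() works
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y=None):
        rng = np.random.default_rng(self.random_state)
        self.components_ = rng.standard_normal((X.shape[1], self.n_components))
        return self

    def transform(self, X):
        return X @ self.components_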
Indeterminacy Classes#
Different representation learning algorithms are identifiable up to different classes of transformations. iTuna provides four built-in indeterminacy classes:
Class | Transformation | Example Models
---|---|---
Identity | None (exact match) | Fully identifiable models
Permutation | Sign flips + reordering | FastICA, sparse coding
Linear | Linear transformation | PCA, factor analysis
Affine | Linear + intercept | CEBRA, autoencoders
Choosing the correct indeterminacy class is crucial: if you pick one that’s too restrictive, consistent models will appear inconsistent. If you pick one that’s too permissive, you may miss genuine inconsistencies.
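To make this concrete before turning to iTuna's API, the following hand-rolled sketch (not using iTuna) compares two FastICA fits directly: demanding an exact match is too restrictive and typically reports poor agreement, while aligning with a signed permutation recovers the underlying consistency.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 3))
X = S @ rng.normal(size=(3, 3)).T

Z1 = FastICA(n_components=3, max_iter=1000, random_state=0).fit_transform(X)
Z2 = FastICA(n_components=3, max_iter=1000, random_state=1).fit_transform(X)

# Exact match (too restrictive): components come back in arbitrary order
# and sign, so the raw R² is typically poor
print(f"Exact-match R²: {r2_score(Z2, Z1):.3f}")

# Account for the permutation indeterminacy: match components by the
# magnitude of their correlation (Hungarian algorithm), then fix signs
C = np.corrcoef(Z1.T, Z2.T)[:3, 3:]  # C[i, j] = corr(Z1[:, i], Z2[:, j])
rows, cols = linear_sum_assignment(-np.abs(C))
Z1_aligned = np.empty_like(Z1)
for i, j in zip(rows, cols):
    Z1_aligned[:, j] = np.sign(C[i, j]) * Z1[:, i]
print(f"Permutation-aligned R²: {r2_score(Z2, Z1_aligned):.3f}")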
Example: FastICA with Permutation indeterminacy#
Independent Component Analysis (ICA) recovers independent sources from mixed signals. The recovered components are identifiable only up to permutation and sign flips: we don't know which component is which, or whether its sign has been flipped.
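This invariance is easy to verify by hand: for any signed permutation matrix P, reordering and flipping the sources while adjusting the mixing matrix accordingly reproduces exactly the same observations.
# For a signed permutation P we have P @ P.T = I, so
# X = S @ A.T = (S @ P) @ (A @ P).T for any sources S and mixing matrix A
S = np.random.laplace(size=(100, 3))
A = np.random.randn(3, 3)
P = np.array([[0.0, 0.0, 1.0],
              [-1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
assert np.allclose(S @ A.T, (S @ P) @ (A @ P).T)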
# Generate synthetic ICA data
np.random.seed(42)
n_samples = 2000
n_sources = 5
# Create independent sources
t = np.linspace(0, 10, n_samples)
sources = np.column_stack([
np.sin(2 * t), # Sinusoid
np.sign(np.sin(3 * t)), # Square wave
np.random.laplace(size=n_samples), # Super-Gaussian
np.random.uniform(-1, 1, n_samples), # Uniform
(t % 1) - 0.5, # Sawtooth
])
# Mix the sources
mixing_matrix = np.random.randn(n_sources, n_sources)
X_ica = sources @ mixing_matrix.T
X_ica += 0.1 * np.random.randn(*X_ica.shape) # Add noise
print(f"Data shape: {X_ica.shape}")
Data shape: (2000, 5)
# Create a FastICA model
ica_model = FastICA(n_components=5, max_iter=1000)
# Wrap in ConsistencyEnsemble with Permutation indeterminacy
ica_ensemble = ituna.ConsistencyEnsemble(
estimator=ica_model,
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=ituna.metrics.Permutation(),
symmetric=False,
include_diagonal=True,
),
random_states=5, # Train 5 models with different seeds
)
# Fit the ensemble
ica_ensemble.fit(X_ica)
# Get consistency score
score = ica_ensemble.score(X_ica)
print(f"ICA Consistency score: {score:.4f}")
ICA Consistency score: 1.0000
Example: PCA with Linear indeterminacy#
PCA finds orthogonal directions of maximum variance. The principal components are identifiable only up to rotations and reflections within eigenspaces of equal variance; with the isotropic Gaussian data below, all variances are nearly equal, so we align the embeddings under the more permissive Linear indeterminacy class.
# Generate data for PCA
np.random.seed(42)
X_pca = np.random.randn(1000, 20)
# Create PCA model
pca_model = PCA(n_components=5)
# Wrap in ConsistencyEnsemble with Linear indeterminacy
pca_ensemble = ituna.ConsistencyEnsemble(
estimator=pca_model,
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=ituna.metrics.Linear(),
symmetric=False,
include_diagonal=True,
),
random_states=5,
)
pca_ensemble.fit(X_pca)
score = pca_ensemble.score(X_pca)
print(f"PCA Consistency score: {score:.4f}")
PCA Consistency score: 1.0000
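As with ICA, the claimed indeterminacy can be checked by hand. Reusing X_pca from above: rotating the embedding while counter-rotating the component matrix reconstructs the data identically, and since the isotropic data gives near-equal variances, the rotated solution is an equally valid PCA fit.
from scipy.stats import ortho_group

pca = PCA(n_components=5).fit(X_pca)
Z = pca.transform(X_pca)                 # embedding, shape (1000, 5)
W = pca.components_                      # components, shape (5, 20)
Q = ortho_group.rvs(5, random_state=0)   # random 5x5 orthogonal matrix

# Rotating the embedding and counter-rotating the components leaves the
# reconstruction unchanged, so the data cannot pin down the rotation
assert np.allclose(Z @ W, (Z @ Q) @ (Q.T @ W))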
Understanding Consistency Scores#
The consistency score measures how well embeddings from different model instances align after accounting for the indeterminacy.
Score = 1.0: Perfect consistency - all models produce equivalent embeddings
Score close to 1.0: High consistency - models are reliably converging to the same solution
Low score: Models are finding different solutions, suggesting the representation may not be reproducible
The score is computed as the R² between embeddings after fitting the indeterminacy transformation.
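In simplified form, a single pairwise score is computed like this sketch (using a plain linear regression as a stand-in indeterminacy; iTuna's built-in classes fit more constrained transforms, such as the signed permutations of the Permutation class):
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def pairwise_score_sketch(emb_a, emb_b):
    # Fit the indeterminacy transform mapping embedding A onto embedding B,
    # then report how much of B it explains (R²)
    transform = LinearRegression().fit(emb_a, emb_b)
    return r2_score(emb_b, transform.predict(emb_a))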
Working with Embeddings#
After fitting, you can access the embeddings and alignment information via transform():
# Get embeddings with alignment metadata
embeddings = ica_ensemble.transform(X_ica)
print(f"Mean aligned embedding shape: {embeddings.shape}")
print(f"Number of individual model embeddings: {len(embeddings.embeddings)}")
# Access individual embeddings
for i, emb in enumerate(embeddings.embeddings):
print(f" Model {i} embedding shape: {emb.shape}")
Mean aligned embedding shape: (2000, 5)
Number of individual model embeddings: 5
Model 0 embedding shape: (2000, 5)
Model 1 embedding shape: (2000, 5)
Model 2 embedding shape: (2000, 5)
Model 3 embedding shape: (2000, 5)
Model 4 embedding shape: (2000, 5)
# Access pairwise consistency scores
pairs, scores = embeddings.scores
print("\nPairwise consistency scores:")
for (i, j), s in zip(pairs, scores):
print(f" Model {i} -> Model {j}: {s:.4f}")
Pairwise consistency scores:
Model 0 -> Model 0: 1.0000
Model 0 -> Model 1: 1.0000
Model 0 -> Model 2: 1.0000
Model 0 -> Model 3: 1.0000
Model 0 -> Model 4: 1.0000
Model 1 -> Model 0: 1.0000
Model 1 -> Model 1: 1.0000
Model 1 -> Model 2: 1.0000
Model 1 -> Model 3: 1.0000
Model 1 -> Model 4: 1.0000
Model 2 -> Model 0: 1.0000
Model 2 -> Model 1: 1.0000
Model 2 -> Model 2: 1.0000
Model 2 -> Model 3: 1.0000
Model 2 -> Model 4: 1.0000
Model 3 -> Model 0: 1.0000
Model 3 -> Model 1: 1.0000
Model 3 -> Model 2: 1.0000
Model 3 -> Model 3: 1.0000
Model 3 -> Model 4: 1.0000
Model 4 -> Model 0: 1.0000
Model 4 -> Model 1: 1.0000
Model 4 -> Model 2: 1.0000
Model 4 -> Model 3: 1.0000
Model 4 -> Model 4: 1.0000
# Or use the built-in utils to convert the scores to a dense matrix
score_matrix = ituna.utils.sparse_to_dense(
*embeddings.scores,
shape=(len(embeddings.embeddings), len(embeddings.embeddings)),
)
print("Scores:\n", score_matrix)
Scores:
[[1. 0.99999872 0.99999922 0.99999964 0.99999706]
[0.99999872 1. 0.99999658 0.99999903 0.9999926 ]
[0.99999922 0.99999658 1. 0.99999911 0.9999992 ]
[0.99999964 0.99999903 0.99999911 1. 0.99999672]
[0.99999706 0.9999926 0.9999992 0.99999672 1. ]]
PairwiseConsistency Options#
The PairwiseConsistency transform has several options:
indeterminacy: The indeterminacy class used to align each pair of embeddings
symmetric: If True, treat i→j and j→i as equivalent and compute each unordered pair only once (default: False, which computes both directions)
include_diagonal: If True, include the self-alignments i→i (default: True)
# Example with symmetric=True
symmetric_ensemble = ituna.ConsistencyEnsemble(
estimator=FastICA(n_components=5, max_iter=1000),
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=ituna.metrics.Permutation(),
        symmetric=True,  # Treat i->j and j->i as one comparison
include_diagonal=False, # Exclude self-alignments
),
random_states=3,
)
symmetric_ensemble.fit(X_ica)
emb = symmetric_ensemble.transform(X_ica)
pairs, scores = emb.scores
print(f"Number of pairwise comparisons: {len(pairs)}")
for (i, j), s in zip(pairs, scores):
print(f" Model {i} <-> Model {j}: {s:.4f}")
Number of pairwise comparisons: 3
Model 0 <-> Model 1: 1.0000
Model 0 <-> Model 2: 1.0000
Model 1 <-> Model 2: 1.0000
Custom Indeterminacy Classes#
You can also use any sklearn regressor as a custom indeterminacy class. The regressor is fitted to align embeddings from one model to another.
For example, to use Ridge regression:
from sklearn.linear_model import Ridge
# Use Ridge regression as indeterminacy
ridge_ensemble = ituna.ConsistencyEnsemble(
estimator=FastICA(n_components=5, max_iter=1000),
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=Ridge(alpha=0.1), # Any sklearn regressor works
symmetric=False,
),
random_states=3,
)
ridge_ensemble.fit(X_ica)
print(f"Consistency score with Ridge: {ridge_ensemble.score(X_ica):.4f}")
Consistency score with Ridge: 1.0000
Summary#
Key takeaways:
ConsistencyEnsemble wraps any sklearn transformer to evaluate consistency
Choose the indeterminacy class based on your model's theoretical identifiability:
Permutation for ICA-like models
Linear for PCA-like models
Affine for models like CEBRA
Consistency scores close to 1.0 indicate reproducible representations
Use transform() to access aligned embeddings and detailed pairwise scores
Next, check out the Backends tutorial to learn about caching and distributed computation.