Core Concepts#
This tutorial covers the fundamental building blocks of iTuna:
ConsistencyEnsemble - The main class for evaluating model consistency
Indeterminacy classes - How to handle different types of model ambiguity
Consistency scoring - Measuring and interpreting consistency
Working with embeddings - Accessing aligned representations
import numpy as np
from sklearn.decomposition import FastICA, PCA
import ituna
ConsistencyEnsemble#
ConsistencyEnsemble is iTuna’s main class. It wraps any sklearn-compatible transformer and:
Creates multiple clones of the base estimator
Fits each clone with a different random seed
Aligns the resulting embeddings under the specified indeterminacy
Computes consistency scores across all model pairs
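Under the hood, this is conceptually similar to the following sketch (illustrative only; the function name is made up, and the real implementation also handles caching, backends, and score aggregation):
from sklearn.base import clone

def fit_clones_sketch(estimator, X, n_seeds=5):
    # Illustrative sketch, not iTuna's actual internals: one clone per
    # seed, each producing its own embedding of the same data
    embeddings = []
    for seed in range(n_seeds):
        model = clone(estimator).set_params(random_state=seed)
        embeddings.append(model.fit(X).transform(X))
    return embeddings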
Requirements for the base estimator#
Your model must follow the sklearn API:
Implement fit(X) and transform(X) methods
Be clonable via sklearn.base.clone()
Accept a random_state parameter (for reproducibility)
Most sklearn transformers work out of the box. For custom models, inherit from sklearn.base.TransformerMixin and sklearn.base.BaseEstimator.
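For instance, a minimal custom transformer satisfying all three requirements could look like this (the class name and its projection logic are invented for illustration):
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RandomProjector(BaseEstimator, TransformerMixin):
    """Hypothetical estimator meeting the ConsistencyEnsemble requirements."""

    def __init__(self, n_components=5, random_state=None):
        # Store constructor arguments unchanged so sklearn.base.clone() works
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y=None):
        rng = np.random.default_rng(self.random_state)
        self.components_ = rng.standard_normal((X.shape[1], self.n_components))
        return self

    def transform(self, X):
        return X @ self.components_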
Indeterminacy Classes#
Different representation learning algorithms are identifiable up to different classes of transformations. iTuna provides four built-in indeterminacy classes:
Class | Transformation | Example Models
---|---|---
Identity | None (exact match) | Fully identifiable models
Permutation | Sign flips + reordering | FastICA, sparse coding
Linear | Linear transformation | PCA, factor analysis
Affine | Linear + intercept | CEBRA, autoencoders
Choosing the correct indeterminacy class is crucial: if you pick one that’s too restrictive, consistent models will appear inconsistent. If you pick one that’s too permissive, you may miss genuine inconsistencies.
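To make this concrete before turning to iTuna's API, the following hand-rolled sketch (not using iTuna) compares two FastICA fits directly: demanding an exact match is too restrictive and typically reports poor agreement, while aligning with a signed permutation recovers the underlying consistency.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 3))
X = S @ rng.normal(size=(3, 3)).T

Z1 = FastICA(n_components=3, max_iter=1000, random_state=0).fit_transform(X)
Z2 = FastICA(n_components=3, max_iter=1000, random_state=1).fit_transform(X)

# Exact match (too restrictive): components come back in arbitrary order
# and sign, so the raw R² is typically poor
print(f"Exact-match R²: {r2_score(Z2, Z1):.3f}")

# Account for the permutation indeterminacy: match components by the
# magnitude of their correlation (Hungarian algorithm), then fix signs
C = np.corrcoef(Z1.T, Z2.T)[:3, 3:]  # C[i, j] = corr(Z1[:, i], Z2[:, j])
rows, cols = linear_sum_assignment(-np.abs(C))
Z1_aligned = np.empty_like(Z1)
for i, j in zip(rows, cols):
    Z1_aligned[:, j] = np.sign(C[i, j]) * Z1[:, i]
print(f"Permutation-aligned R²: {r2_score(Z2, Z1_aligned):.3f}")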
Example: FastICA with Permutation indeterminacy#
Independent Component Analysis (ICA) recovers independent sources from mixed signals. The recovered components are identifiable only up to permutation and sign flips: we don't know which component is which, or whether its sign has been flipped.
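This invariance is easy to verify by hand: for any signed permutation matrix P, reordering and flipping the sources while adjusting the mixing matrix accordingly reproduces exactly the same observations.
# For a signed permutation P we have P @ P.T = I, so
# X = S @ A.T = (S @ P) @ (A @ P).T for any sources S and mixing matrix A
S = np.random.laplace(size=(100, 3))
A = np.random.randn(3, 3)
P = np.array([[0.0, 0.0, 1.0],
              [-1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
assert np.allclose(S @ A.T, (S @ P) @ (A @ P).T)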
# Generate synthetic ICA data
np.random.seed(42)
n_samples = 2000
n_sources = 5
# Create independent sources
t = np.linspace(0, 10, n_samples)
sources = np.column_stack([
np.sin(2 * t), # Sinusoid
np.sign(np.sin(3 * t)), # Square wave
np.random.laplace(size=n_samples), # Super-Gaussian
np.random.uniform(-1, 1, n_samples), # Uniform
(t % 1) - 0.5, # Sawtooth
])
# Mix the sources
mixing_matrix = np.random.randn(n_sources, n_sources)
X_ica = sources @ mixing_matrix.T
X_ica += 0.1 * np.random.randn(*X_ica.shape) # Add noise
print(f"Data shape: {X_ica.shape}")
Data shape: (2000, 5)
# Create a FastICA model
ica_model = FastICA(n_components=5, max_iter=1000)
# Wrap in ConsistencyEnsemble with Permutation indeterminacy
ica_ensemble = ituna.ConsistencyEnsemble(
estimator=ica_model,
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=ituna.metrics.Permutation(),
symmetric=False,
include_diagonal=True,
),
random_states=5, # Train 5 models with different seeds
)
# Fit the ensemble
ica_ensemble.fit(X_ica)
# Get consistency score
score = ica_ensemble.score(X_ica)
print(f"ICA Consistency score: {score:.4f}")
ICA Consistency score: 1.0000
Example: PCA with Linear indeterminacy#
PCA finds orthogonal directions of maximum variance. The principal components are identifiable only up to rotations and reflections within eigenspaces of equal variance; with the isotropic Gaussian data below, all variances are nearly equal, so we align the embeddings under the more permissive Linear indeterminacy class.
# Generate data for PCA
np.random.seed(42)
X_pca = np.random.randn(1000, 20)
# Create PCA model
pca_model = PCA(n_components=5)
# Wrap in ConsistencyEnsemble with Linear indeterminacy
pca_ensemble = ituna.ConsistencyEnsemble(
estimator=pca_model,
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=ituna.metrics.Linear(),
symmetric=False,
include_diagonal=True,
),
random_states=5,
)
pca_ensemble.fit(X_pca)
score = pca_ensemble.score(X_pca)
print(f"PCA Consistency score: {score:.4f}")
PCA Consistency score: 1.0000
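As with ICA, the claimed indeterminacy can be checked by hand. Reusing X_pca from above: rotating the embedding while counter-rotating the component matrix reconstructs the data identically, and since the isotropic data gives near-equal variances, the rotated solution is an equally valid PCA fit.
from scipy.stats import ortho_group

pca = PCA(n_components=5).fit(X_pca)
Z = pca.transform(X_pca)                 # embedding, shape (1000, 5)
W = pca.components_                      # components, shape (5, 20)
Q = ortho_group.rvs(5, random_state=0)   # random 5x5 orthogonal matrix

# Rotating the embedding and counter-rotating the components leaves the
# reconstruction unchanged, so the data cannot pin down the rotation
assert np.allclose(Z @ W, (Z @ Q) @ (Q.T @ W))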
Understanding Consistency Scores#
The consistency score measures how well embeddings from different model instances align after accounting for the indeterminacy.
Score = 1.0: Perfect consistency - all models produce equivalent embeddings
Score close to 1.0: High consistency - models are reliably converging to the same solution
Low score: Models are finding different solutions, suggesting the representation may not be reproducible
The score is computed as the R² between embeddings after fitting the indeterminacy transformation.
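In simplified form, a single pairwise score is computed like this sketch (using a plain linear regression as a stand-in indeterminacy; iTuna's built-in classes fit more constrained transforms, such as the signed permutations of the Permutation class):
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def pairwise_score_sketch(emb_a, emb_b):
    # Fit the indeterminacy transform mapping embedding A onto embedding B,
    # then report how much of B it explains (R²)
    transform = LinearRegression().fit(emb_a, emb_b)
    return r2_score(emb_b, transform.predict(emb_a))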
Working with Embeddings#
After fitting, you can access the embeddings and alignment information via transform():
# Get embeddings with alignment metadata
embeddings = ica_ensemble.transform(X_ica)
print(f"Mean aligned embedding shape: {embeddings.shape}")
print(f"Number of individual model embeddings: {len(embeddings.embeddings)}")
# Access individual embeddings
for i, emb in enumerate(embeddings.embeddings):
print(f" Model {i} embedding shape: {emb.shape}")
Mean aligned embedding shape: (2000, 5)
Number of individual model embeddings: 5
Model 0 embedding shape: (2000, 5)
Model 1 embedding shape: (2000, 5)
Model 2 embedding shape: (2000, 5)
Model 3 embedding shape: (2000, 5)
Model 4 embedding shape: (2000, 5)
# Access pairwise consistency scores
pairs, scores = embeddings.scores
print("\nPairwise consistency scores:")
for (i, j), s in zip(pairs, scores):
print(f" Model {i} -> Model {j}: {s:.4f}")
Pairwise consistency scores:
Model 0 -> Model 0: 1.0000
Model 0 -> Model 1: 1.0000
Model 0 -> Model 2: 1.0000
Model 0 -> Model 3: 1.0000
Model 0 -> Model 4: 1.0000
Model 1 -> Model 0: 1.0000
Model 1 -> Model 1: 1.0000
Model 1 -> Model 2: 1.0000
Model 1 -> Model 3: 1.0000
Model 1 -> Model 4: 1.0000
Model 2 -> Model 0: 1.0000
Model 2 -> Model 1: 1.0000
Model 2 -> Model 2: 1.0000
Model 2 -> Model 3: 1.0000
Model 2 -> Model 4: 1.0000
Model 3 -> Model 0: 1.0000
Model 3 -> Model 1: 1.0000
Model 3 -> Model 2: 1.0000
Model 3 -> Model 3: 1.0000
Model 3 -> Model 4: 1.0000
Model 4 -> Model 0: 1.0000
Model 4 -> Model 1: 1.0000
Model 4 -> Model 2: 1.0000
Model 4 -> Model 3: 1.0000
Model 4 -> Model 4: 1.0000
# Or use the built-in utils to convert the scores to a dense matrix
score_matrix = ituna.utils.sparse_to_dense(
*embeddings.scores,
shape=(len(embeddings.embeddings), len(embeddings.embeddings)),
)
print("Scores:\n", score_matrix)
Scores:
[[1. 0.99999872 0.99999922 0.99999964 0.99999706]
[0.99999872 1. 0.99999658 0.99999903 0.9999926 ]
[0.99999922 0.99999658 1. 0.99999911 0.9999992 ]
[0.99999964 0.99999903 0.99999911 1. 0.99999672]
[0.99999706 0.9999926 0.9999992 0.99999672 1. ]]
PairwiseConsistency Options#
The PairwiseConsistency transform has several options:
indeterminacy: The indeterminacy class used to align each pair of embeddings
symmetric: If True, treat i→j and j→i as equivalent and compute each unordered pair only once (default: False, which computes both directions)
include_diagonal: If True, include the self-alignments i→i (default: True)
# Example with symmetric=True
symmetric_ensemble = ituna.ConsistencyEnsemble(
estimator=FastICA(n_components=5, max_iter=1000),
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=ituna.metrics.Permutation(),
        symmetric=True,  # Treat i->j and j->i as one comparison
include_diagonal=False, # Exclude self-alignments
),
random_states=3,
)
symmetric_ensemble.fit(X_ica)
emb = symmetric_ensemble.transform(X_ica)
pairs, scores = emb.scores
print(f"Number of pairwise comparisons: {len(pairs)}")
for (i, j), s in zip(pairs, scores):
print(f" Model {i} <-> Model {j}: {s:.4f}")
Number of pairwise comparisons: 3
Model 0 <-> Model 1: 1.0000
Model 0 <-> Model 2: 1.0000
Model 1 <-> Model 2: 1.0000
Custom Indeterminacy Classes#
You can also use any sklearn regressor as a custom indeterminacy class. The regressor is fitted to align embeddings from one model to another.
For example, to use Ridge regression:
from sklearn.linear_model import Ridge
# Use Ridge regression as indeterminacy
ridge_ensemble = ituna.ConsistencyEnsemble(
estimator=FastICA(n_components=5, max_iter=1000),
consistency_transform=ituna.metrics.PairwiseConsistency(
indeterminacy=Ridge(alpha=0.1), # Any sklearn regressor works
symmetric=False,
),
random_states=3,
)
ridge_ensemble.fit(X_ica)
print(f"Consistency score with Ridge: {ridge_ensemble.score(X_ica):.4f}")
Consistency score with Ridge: 1.0000
Summary#
Key takeaways:
ConsistencyEnsemble wraps any sklearn transformer to evaluate consistency
Choose the indeterminacy class based on your model's theoretical identifiability:
Permutation for ICA-like models
Linear for PCA-like models
Affine for models like CEBRA
Consistency scores close to 1.0 indicate reproducible representations
Use transform() to access aligned embeddings and detailed pairwise scores
Next, check out the Backends tutorial to learn about caching and distributed computation.