🐟iTuna#

Tune machine learning models for empirical identifiability and consistency

Why 🐟iTuna?#

Machine learning for scientific data analysis often suffers from an identifiability gap: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. As a result, re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.

Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need empirical validation to confirm they behave consistently in practice.

🐟iTuna closes this gap by providing a lightweight, model-agnostic framework to:

  1. Train multiple instances of a model with different random seeds

  2. Align their embeddings under the appropriate indeterminacy class

  3. Measure how consistent the learned representations are

Think of it as a unit test for reproducibility of learned embeddings.
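
For intuition, here is a minimal, hand-rolled sketch of these three steps for a permutation-identifiable model, matching the components of two FastICA fits by absolute correlation. This is only an illustration of the pattern that 🐟iTuna automates, not the library's actual alignment procedure:

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA

X = np.random.randn(1000, 64)

# 1. Train two instances with different random seeds
emb_a = FastICA(n_components=16, max_iter=500, random_state=0).fit_transform(X)
emb_b = FastICA(n_components=16, max_iter=500, random_state=1).fit_transform(X)

# 2. Align under the permutation indeterminacy class:
#    pair components by maximum absolute correlation (sign flips allowed)
cross_corr = np.corrcoef(emb_a.T, emb_b.T)[:16, 16:]
rows, cols = linear_sum_assignment(-np.abs(cross_corr))

# 3. Measure consistency of the matched components
print("Mean |correlation| of matched components:", np.abs(cross_corr[rows, cols]).mean())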

Features#

  • sklearn-compatible: Works with any transformer that implements fit and transform and follows standard sklearn conventions

  • Built-in indeterminacy classes (see the sketch after this list):

    • Identity - no transformation needed (model is already fully identifiable)

    • Permutation - handles sign flips and component reordering (e.g., FastICA)

    • Linear - linear transformation alignment (e.g., PCA)

    • Affine - linear transformation with intercept (e.g., CEBRA)

  • Consistency scoring: Quantifies how stable embeddings are across runs

  • Embedding alignment: Returns aligned embeddings for downstream analysis

  • Flexible backends: In-memory, disk caching, distributed execution, and DataJoint support
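
For illustration, the built-in classes above could be paired with matching estimators as in the hedged sketch below. It assumes that Identity, Linear, and Affine are exposed under ituna.metrics and take no required constructor arguments, mirroring metrics.Permutation() from the Quickstart below:

from sklearn.decomposition import FastICA, PCA

from ituna import ConsistencyEnsemble, metrics

# FastICA components are identifiable up to sign flips and reordering
ica_ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=5,
)

# PCA embeddings are compared up to a linear transformation
pca_ensemble = ConsistencyEnsemble(
    estimator=PCA(n_components=16),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Linear(),  # assumed constructor, see note above
    ),
    random_states=5,
)

# Analogously, metrics.Identity() would suit a fully identifiable model,
# and metrics.Affine() a model identifiable up to an affine map (e.g., CEBRA).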

Installation#

pip install git+https://github.com/dynamical-inference/ituna.git

Optional extras:

pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]"  # DataJoint backend for database-backed caching
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]"        # Development dependencies (pytest, etc.)

Quickstart#

import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, metrics

# Generate sample data
X = np.random.randn(1000, 64)

# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation
        symmetric=False,
        include_diagonal=True,
    ),
    random_states=5,  # Train 5 instances with different seeds
)

# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))

# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)

Documentation#

Full documentation is available at dynamical-inference.github.io/ituna.

Backends#

🐟iTuna supports different backends for caching and distributed computation:

import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, config, metrics

# Sample data, as in the Quickstart
X = np.random.randn(1000, 64)

ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=10,
)

# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)

# Distributed execution with multiple workers
with config.config_context(
    DEFAULT_BACKEND="disk_cache_distributed",
    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
    ensemble.fit(X)
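
As a sketch of what the disk cache buys you (continuing the snippet above): since the backend avoids re-fitting identical models, repeating a fit with the same configuration can reuse the stored results instead of retraining:

# Re-entering the cached backend with an identical configuration reuses stored fits
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)    # identical configuration: cached fits are reused, not retrained
    print("Consistency score:", ensemble.score(X))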

CLI Commands#

For large-scale experiments, use the command-line tools:

# Local distributed backend
ituna-fit-distributed --sweep-name <sweep-uuid> --cache-dir ./cache

# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name <sweep-uuid> --schema-name myschema

Development#

# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e ".[dev]"

# Run tests
pytest tests -v

# Setup pre-commit hooks
pre-commit install

For the full development guide (branching conventions, code style, building docs, and the release process), see CONTRIBUTING.md.

Citation#

If you use 🐟iTuna in your research, please cite:

@software{ituna,
  author = {Schmidt, Tobias and Schneider, Steffen},
  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
  url = {https://github.com/dynamical-inference/ituna},
  version = {0.1.0},
}

License#

🐟iTuna is released under the MIT License. If you reuse parts of the iTuna code in your own package, please copy the contents of the LICENSE file into a NOTICE file in your repository.