Data Fitting Overview - Ionworks Studio

iws.DataFit describes a parameter fit: which experiments to compare against, which parameters are free, and how to search. The schema is submitted as one element of a pipeline. For the theory (cost functions, identifiability, multi-start), see the Data Fitting Guide.

A minimal fit

import ionworks_schema as iws
from ionworks import Ionworks

# Known parameters (everything not fit)
known = iws.direct_entries.DirectEntry(
    parameters={"Ambient temperature [K]": 298.15},
)

# Objective: compare a current-driven SPMe simulation against measured voltage
obj_1C = iws.objectives.CurrentDriven(
    data_input="file:examples/data/chen_synthetic_1C/time_series.csv",
    options={"model": {"type": "SPMe"}},
)

# Free parameters
parameters = {
    "Negative particle diffusivity [m2.s-1]": iws.Parameter(
        "Negative particle diffusivity [m2.s-1]",
        initial_value=2e-14,
        bounds=(1e-14, 1e-13),
    ),
    "Positive particle diffusivity [m2.s-1]": iws.Parameter(
        "Positive particle diffusivity [m2.s-1]",
        initial_value=2e-15,
        bounds=(1e-15, 1e-14),
    ),
}

fit = iws.DataFit(
    objectives={"test_1C": obj_1C},
    parameters=parameters,
    cost=iws.costs.SSE(),
    optimizer=iws.optimizers.DifferentialEvolution(),
)

pipeline = iws.Pipeline({"known": known, "fit": fit})

client = Ionworks()
submission = client.pipeline.create(pipeline)
client.pipeline.wait_for_completion(submission.id)
result = client.pipeline.result(submission.id)
print(result.element_results["fit"])

Configuration mistakes inside a DataFit (bad parameter names, malformed objectives, …) surface as UserConfigurationError. The job classifier maps these to a Configuration error in Studio so they’re easy to distinguish from solver failures.

Multiple objectives

Pass multiple objectives to fit against several experiments simultaneously (e.g. discharge at different C-rates or temperatures):

fit = iws.DataFit(
    objectives={
        "1C": iws.objectives.CurrentDriven(
            data_input="file:.../1C.csv",
            options={"model": {"type": "SPMe"}},
        ),
        "0.5C": iws.objectives.CurrentDriven(
            data_input="file:.../0.5C.csv",
            options={"model": {"type": "SPMe"}},
        ),
    },
    parameters=parameters,
)

Each objective contributes to a single combined cost.
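The text above does not say how the per-objective costs are merged, so the following is only a minimal sketch under the assumption that SSE values are simply added across objectives. The helper names (sse, combined_sse) and the voltage samples are hypothetical and are not part of iws.

```python
from typing import Dict, Sequence, Tuple


def sse(measured: Sequence[float], simulated: Sequence[float]) -> float:
    """Sum of squared errors between two equally long series."""
    if len(measured) != len(simulated):
        raise ValueError("series must have the same length")
    return sum((m - s) ** 2 for m, s in zip(measured, simulated))


def combined_sse(
    pairs: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Assumed combination rule: add up the per-objective SSE values."""
    return sum(sse(measured, simulated) for measured, simulated in pairs.values())


# Hypothetical (measured, simulated) voltage samples [V] for two objectives.
pairs = {
    "1C": ([4.0, 3.5, 3.0], [4.0, 3.5, 3.5]),
    "0.5C": ([4.0, 3.5], [3.5, 3.5]),
}
total = combined_sse(pairs)
print(total)
```

Under this assumption an objective with more samples or larger residuals dominates the total, which is worth keeping in mind when the experiments differ a lot in length.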

Optimizers

iws.optimizers exposes the optimisers available to DataFit. Pick the one that fits your problem:

Schema	Best for
`iws.optimizers.ScipyMinimize(method="L-BFGS-B")`	Smooth problems, fast local optimisation
`iws.optimizers.ScipyLeastSquares()`	Residual-based least-squares; good with priors
`iws.optimizers.DifferentialEvolution()`	Global, no gradients required
`iws.optimizers.CMAES()`	Global, many local minima, well-tested defaults
`iws.optimizers.PSO()`	Global, parallelisable population search
`iws.optimizers.BayesianOptimization()`	Expensive evaluations, ≤ ~10 parameters, small budget
`iws.optimizers.TuRBO()`	Expensive evaluations run in parallel batches; higher-dimensional problems
`iws.optimizers.SOBER()`	Wide parallel batches using quadrature-style recombination

See Objective Functions for the cost-function options.

The surrogate optimisers (BayesianOptimization, TuRBO, SOBER) require the optional surrogate install extra, which adds torch, botorch, and gpytorch:

pip install "ionworkspipeline[surrogate]"

They are imported lazily, so installs that only use the population-based or SciPy optimisers do not pay this dependency cost.
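Before selecting a surrogate optimiser, it can help to check whether the extra's dependencies are present in the current environment. The sketch below uses only the standard library and the three package names listed above; the helper missing_packages is hypothetical and is not part of ionworkspipeline.

```python
import importlib.util
from typing import List, Sequence

# Packages the text above lists for the surrogate extra.
SURROGATE_PACKAGES = ("torch", "botorch", "gpytorch")


def missing_packages(names: Sequence[str] = SURROGATE_PACKAGES) -> List[str]:
    """Return the packages that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Surrogate optimisers unavailable; missing:", ", ".join(missing))
else:
    print("Surrogate dependencies found.")
```

Because find_spec only locates a top-level package rather than importing it, the check itself stays cheap, in the same spirit as the lazy imports described above.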

TuRBO for expensive parallel problems

When each evaluation is expensive and you have workers to spare, TuRBO proposes a batch of candidates per round and adapts a trust region around the current best point. Match the warm-up size to the parallel batch width so the first round fully uses the available workers:

import ionworks_schema as iws

fit = iws.DataFit(
    objectives=objectives,
    parameters=parameters,
    optimizer=iws.optimizers.TuRBO(
        max_iterations=12,
        population_size=64,
        algorithm_options={"noise_floor": "low", "n_initial": 64},
    ),
    parallel=True,
    num_workers=64,
)

Useful algorithm_options keys for surrogate optimisers include n_initial (warm-up sample count), noise_floor ("low", "standard", or a (lo, hi) interval), and — for TuRBO — trust-region controls (tr_length_init, tr_length_min, tr_length_max, tr_success_tolerance, tr_failure_tolerance, n_candidates).
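To give a feel for what the trust-region controls do, here is a minimal sketch of the expand/shrink rule from the general TuRBO algorithm: the region doubles after a run of consecutive improvements and halves after a run of consecutive failures. It is not the Ionworks implementation, the TrustRegion class is hypothetical, and the default values are illustrative rather than the library's defaults.

```python
from dataclasses import dataclass


@dataclass
class TrustRegion:
    length: float = 0.8           # role of tr_length_init (illustrative value)
    length_min: float = 0.5 ** 7  # role of tr_length_min (illustrative value)
    length_max: float = 1.6       # role of tr_length_max (illustrative value)
    success_tolerance: int = 3    # role of tr_success_tolerance
    failure_tolerance: int = 5    # role of tr_failure_tolerance
    successes: int = 0
    failures: int = 0

    def update(self, improved: bool) -> None:
        """Update the streak counters and resize the region if a streak completes."""
        if improved:
            self.successes += 1
            self.failures = 0
        else:
            self.failures += 1
            self.successes = 0
        if self.successes >= self.success_tolerance:
            self.length = min(2.0 * self.length, self.length_max)
            self.successes = 0
        elif self.failures >= self.failure_tolerance:
            self.length /= 2.0
            self.failures = 0

    @property
    def needs_restart(self) -> bool:
        """The region has collapsed below its minimum length."""
        return self.length < self.length_min


tr = TrustRegion()
for _ in range(3):          # three improving rounds in a row: expand
    tr.update(improved=True)
expanded = tr.length
for _ in range(5):          # five failing rounds in a row: shrink
    tr.update(improved=False)
shrunk = tr.length
print(expanded, shrunk)
```

In this sketch a larger tr_failure_tolerance keeps the region wide for longer, which favours exploration at the cost of slower local refinement.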

Strict option validation

algorithm_options is validated against the optimiser you chose. Unknown keys — including typos — are rejected at submission time rather than silently ignored, so misconfigured fits fail fast instead of running with default behaviour:

# Raises a validation error: "noise_flor" is not a recognised TuRBO option.
iws.optimizers.TuRBO(algorithm_options={"noise_flor": "low"})

# AskTellOptimizer is the low-level class behind CMAES(), PSO(), XNES(), etc.;
# the named optimisers above are thin wrappers that set `method` for you.
# Raises a validation error: BO options passed to a CMAES fit.
iws.optimizers.AskTellOptimizer(
    method="CMAES",
    algorithm_options=iws.optimizers.BayesianOptimizationOptions(n_initial=32),
)

The same check applies whether you pass a raw dict or one of the typed wrappers: CMAESOptions, PSOOptions, DEOptions, XNESOptions, BayesianOptimizationOptions, SOBEROptions, or TuRBOOptions. XNESOptions belongs to the XNES optimizer, which is available via iws.optimizers.XNES() or AskTellOptimizer(method="XNES"). Prefer the typed wrappers for editor autocomplete and inline documentation of each option. The only exception is CMAESOptions, which remains a passthrough to pycma’s own option surface.
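The idea of rejecting unknown keys can be illustrated with a few lines of standard-library Python. This is a sketch of the general technique only, not how iws performs its validation: check_option_keys is a hypothetical helper, and TURBO_KEYS contains just the option names mentioned on this page, so the real accepted set may be larger.

```python
import difflib
from typing import Iterable, Mapping

# Option names taken from the text above; the real accepted set may be larger.
TURBO_KEYS = {
    "n_initial", "noise_floor", "n_candidates",
    "tr_length_init", "tr_length_min", "tr_length_max",
    "tr_success_tolerance", "tr_failure_tolerance",
}


def check_option_keys(options: Mapping[str, object], allowed: Iterable[str]) -> None:
    """Raise ValueError listing every key that is not in `allowed`."""
    allowed_sorted = sorted(allowed)
    problems = []
    for key in options:
        if key not in allowed_sorted:
            close = difflib.get_close_matches(key, allowed_sorted, n=1)
            hint = f" (did you mean {close[0]!r}?)" if close else ""
            problems.append(f"unknown option {key!r}{hint}")
    if problems:
        raise ValueError("; ".join(problems))


check_option_keys({"n_initial": 64, "noise_floor": "low"}, TURBO_KEYS)  # passes

try:
    check_option_keys({"noise_flor": "low"}, TURBO_KEYS)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Collecting every problem before raising means a single failed submission reports all typos at once instead of one per attempt.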

No SciPy-style kwargs on native optimizers

Native ask/tell optimizers (CMAES, DifferentialEvolution, PSO, XNES, BayesianOptimization, TuRBO, SOBER, and the underlying AskTellOptimizer) also reject unknown top-level keyword arguments at construction. SciPy-style keys such as maxiter, popsize, seed, and tol are not accepted. They previously had no effect; they now fail validation immediately, so a misconfigured fit surfaces at construction rather than being silently ignored:

# Raises a validation error — "Extra inputs are not permitted" for each unknown
# key (maxiter, popsize, tol).
iws.optimizers.DifferentialEvolution(maxiter=10, popsize=5, tol=1e-6)

# Correct: use the documented ask/tell parameters (tol -> population_convergence_tol).
iws.optimizers.DifferentialEvolution(
    max_iterations=10,
    population_size=5,
    population_convergence_tol=1e-6,
)

# Algorithm-specific settings go in `algorithm_options`.
iws.optimizers.CMAES(algorithm_options={"seed": 42})

SciPy-style keys still belong on the SciPy passthrough optimizers (ScipyMinimize, ScipyLeastSquares, ScipyDifferentialEvolution), which forward them directly to the underlying SciPy call:

# OK: ScipyDifferentialEvolution forwards maxiter/popsize/seed/tol to scipy.optimize.
iws.optimizers.ScipyDifferentialEvolution(maxiter=10, popsize=5, seed=0)

In short: put iteration, population, and tolerance limits in the named ask/tell arguments (max_iterations, population_size, population_convergence_tol); put algorithm internals in algorithm_options; and keep SciPy keywords on the Scipy* optimizers.
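When migrating an older configuration, the renames above can be applied mechanically. The helper below is a hypothetical sketch, not an iws function. It uses only the three renames shown in the examples and assumes, based on the CMAES example, that seed belongs in algorithm_options; whether every native optimizer accepts a seed there is not confirmed by this page.

```python
from typing import Any, Dict, Tuple

# Renames shown in the examples above.
RENAMES = {
    "maxiter": "max_iterations",
    "popsize": "population_size",
    "tol": "population_convergence_tol",
}


def translate_scipy_kwargs(
    kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split SciPy-style kwargs into (named arguments, algorithm_options)."""
    named: Dict[str, Any] = {}
    algorithm_options: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in RENAMES:
            named[RENAMES[key]] = value
        elif key == "seed":
            # Assumption: seed is an algorithm-specific setting, as in the
            # CMAES example above.
            algorithm_options[key] = value
        else:
            raise KeyError(f"no known ask/tell equivalent for {key!r}")
    return named, algorithm_options


named, algorithm_options = translate_scipy_kwargs(
    {"maxiter": 10, "popsize": 5, "tol": 1e-6, "seed": 0}
)
print(named)
print(algorithm_options)
```

A rename is not always a like-for-like conversion: in scipy.optimize.differential_evolution, popsize is a multiplier on the number of parameters rather than an absolute population count, so the numeric value may need adjusting as well as the key.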

Multi-start

For problems with multiple local minima, run several optimisations from different starting points:

fit = iws.DataFit(
    objectives=objectives,
    parameters=parameters,
    multistarts=20,
)

The pipeline generates initial guesses (Latin Hypercube by default), runs them in parallel, and returns every result sorted by cost.
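For intuition about the default sampling, the sketch below draws Latin Hypercube starting points in plain Python: each parameter's range is cut into as many equal strata as there are starts, one point is drawn per stratum, and the strata are shuffled independently per parameter. It is an illustration of the technique, not the pipeline's own sampler. The latin_hypercube function is hypothetical, and it samples linearly within the bounds, which may differ from what the pipeline does for parameters spanning decades.

```python
import random
from typing import Dict, List, Tuple


def latin_hypercube(
    bounds: Dict[str, Tuple[float, float]], n_starts: int, seed: int = 0
) -> List[Dict[str, float]]:
    """Return n_starts points with one sample per stratum in every dimension."""
    rng = random.Random(seed)
    columns: Dict[str, List[float]] = {}
    for name, (lower, upper) in bounds.items():
        # One uniform draw inside each of n_starts equal-width strata of [0, 1).
        unit = [(i + rng.random()) / n_starts for i in range(n_starts)]
        rng.shuffle(unit)  # decouple the stratum order between parameters
        columns[name] = [lower + u * (upper - lower) for u in unit]
    return [{name: columns[name][i] for name in bounds} for i in range(n_starts)]


bounds = {
    "Negative particle diffusivity [m2.s-1]": (1e-14, 1e-13),
    "Positive particle diffusivity [m2.s-1]": (1e-15, 1e-14),
}
starts = latin_hypercube(bounds, n_starts=20, seed=42)
print(starts[0])
```

Compared with independent uniform draws, this guarantees that every parameter's range is covered evenly even when the number of starts is small.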

Runtime options

iws.DataFit accepts an options dict that tunes the optimisation loop without changing the schema. All keys are optional.

Key	Default	Description
`seed`	`None`	Random seed for reproducible multi-start initial guesses and stochastic optimisers.
`low_memory`	`False`	Drop log entries that don’t improve the best cost by ≥0.1%. Useful for long runs with many iterations.
`max_iterations`	`None`	Per-job iteration cap. Only applies when the model uses `convert_to_format == 'casadi'`.
`maxtime`	`None`	Per-job wall-time budget in seconds. With multi-start the total may exceed this since many jobs run.
`skip_objective_callbacks`	`False` locally / `True` on the cluster	Skip the per-objective callbacks that simulate the model at the initial guess and at the fitted parameters. Improves performance but leaves the initial/final fit results unpopulated.

fit = iws.DataFit(
    objectives=objectives,
    parameters=parameters,
    options={
        "seed": 42,
        "low_memory": True,
        "maxtime": 600,
        "skip_objective_callbacks": False,
    },
)

Pipelines submitted to the Ionworks cluster enable skip_objective_callbacks by default to reduce simulation cost. Set it explicitly to False in options if you need the initial- and final-fit simulation results returned with the run.
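The low_memory rule from the table can be pictured as a filter over the optimisation log. The sketch below is one possible reading of "improve the best cost by ≥0.1%": an entry is kept only if its cost is at least 0.1% below the last kept cost. The prune_log function and the log values are hypothetical, the sketch assumes positive costs, and the actual pruning logic may differ.

```python
from typing import Iterable, List, Optional, Tuple


def prune_log(
    entries: Iterable[Tuple[int, float]], min_rel_improvement: float = 1e-3
) -> List[Tuple[int, float]]:
    """Keep (iteration, cost) entries that beat the last kept cost by the given fraction."""
    kept: List[Tuple[int, float]] = []
    best: Optional[float] = None
    for iteration, cost in entries:
        # The first entry is always kept; later ones must improve enough.
        if best is None or cost <= best * (1.0 - min_rel_improvement):
            kept.append((iteration, cost))
            best = cost
    return kept


# Hypothetical log of (iteration, cost) pairs.
log = [(0, 100.0), (1, 99.95), (2, 99.0), (3, 99.0), (4, 50.0)]
pruned = prune_log(log)
print(pruned)
```

Here iteration 1 is dropped because 99.95 is less than 0.1% below 100.0, and iteration 3 is dropped because it does not improve on iteration 2.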

Retrieving results

client.pipeline.wait_for_completion(submission.id)
result = client.pipeline.result(submission.id)
print(result.element_results["fit"])

result.element_results["fit"] is a dict keyed by the data-fit’s outputs (best parameter values, final cost, and any logged trajectories). See packages/ionworks-api/examples/pipeline/datafit.py for an end-to-end example.

Data Fitting (theory)

Cost-function math, identifiability, multi-start strategy.

Objective Functions

Pick the right cost for your data shape.

Regularization

Stabilise fits with Gaussian priors.

Sensitivity Analysis

Quantify which parameters the fit actually constrains.
