iws.DataFit describes a parameter fit: which experiments to compare against, which parameters are free, and how to search. The schema is submitted as one element of a pipeline. For the theory (cost functions, identifiability, multi-start), see the Data Fitting Guide.
A minimal fit
Configuration mistakes inside a DataFit (bad parameter names, malformed objectives, …) surface as UserConfigurationError. The job classifier maps these to a Configuration error in Studio so they’re easy to distinguish from solver failures.
Multiple objectives
Pass multiple objectives to fit against several experiments simultaneously (e.g. discharge at different C-rates or temperatures).
Optimizers
iws.optimizers exposes the optimisers available to DataFit. Pick the one that fits your problem:
| Schema | Best for |
|---|---|
| iws.optimizers.ScipyMinimize(method="L-BFGS-B") | Smooth problems, fast local optimisation |
| iws.optimizers.ScipyLeastSquares() | Residual-based least-squares; good with priors |
| iws.optimizers.DifferentialEvolution() | Global, no gradients required |
| iws.optimizers.CMAES() | Global, many local minima, well-tested defaults |
| iws.optimizers.PSO() | Global, parallelisable population search |
| iws.optimizers.BayesianOptimization() | Expensive evaluations, ≤ ~10 parameters, small budget |
| iws.optimizers.TuRBO() | Expensive evaluations run in parallel batches; higher-dimensional problems |
| iws.optimizers.SOBER() | Wide parallel batches using quadrature-style recombination |
The surrogate optimisers (BayesianOptimization, TuRBO, SOBER) require the optional surrogate install extra, which adds torch, botorch, and gpytorch. They are imported lazily, so installs that only use the population-based or SciPy optimisers do not pay this dependency cost.
TuRBO for expensive parallel problems
When each evaluation is expensive and you have workers to spare, TuRBO proposes a batch of candidates per round and adapts a trust region around the current best point. Match the warm-up size to the parallel batch width so the first round fully uses the available workers.
algorithm_options keys for surrogate optimisers include n_initial (warm-up sample count), noise_floor ("low", "standard", or a (lo, hi) interval), and — for TuRBO — trust-region controls (tr_length_init, tr_length_min, tr_length_max, tr_success_tolerance, tr_failure_tolerance, n_candidates).
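As a rough illustration of matching the warm-up size to the batch width, the options can be assembled as a plain dict. The key names n_initial and noise_floor come from the list above; batch_width and the chosen values are hypothetical, and how the dict is handed to the optimiser is not shown here.

```python
# Hypothetical batch width: how many candidate evaluations can run at once.
batch_width = 8

# Key names are taken from the surrogate-optimiser option list above;
# the values are illustrative, not recommended defaults.
turbo_options = {
    "n_initial": batch_width,   # warm-up sample count matches the batch width
    "noise_floor": "standard",  # alternatives: "low" or a (lo, hi) interval
}
```

With n_initial equal to the batch width, the warm-up round produces exactly one candidate per available worker.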
Strict option validation
algorithm_options is validated against the optimiser you chose. Unknown keys — including typos — are rejected at submission time rather than silently ignored, so misconfigured fits fail fast instead of running with default behaviour:
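The fail-fast behaviour can be pictured with a small standalone sketch. This is not the library's implementation: validate_options and TURBO_KEYS are hypothetical names, the key set is only the TuRBO options listed earlier on this page, and a plain ValueError stands in for whatever error the real validation raises.

```python
import difflib

# Option names listed for TuRBO on this page; the real schema may differ.
TURBO_KEYS = {
    "n_initial", "noise_floor", "tr_length_init", "tr_length_min",
    "tr_length_max", "tr_success_tolerance", "tr_failure_tolerance",
    "n_candidates",
}


def validate_options(options, allowed):
    """Return a copy of `options`, or raise ValueError naming every unknown key."""
    unknown = sorted(set(options) - set(allowed))
    if not unknown:
        return dict(options)
    hints = []
    for key in unknown:
        # Suggest the closest allowed key so typos are easy to spot.
        close = difflib.get_close_matches(key, sorted(allowed), n=1)
        hint = f" (did you mean {close[0]!r}?)" if close else ""
        hints.append(f"{key!r}{hint}")
    raise ValueError("Unknown algorithm_options: " + ", ".join(hints))


checked = validate_options({"n_initial": 8}, TURBO_KEYS)  # accepted unchanged

try:
    validate_options({"n_inital": 8}, TURBO_KEYS)  # misspelt key is rejected
except ValueError as err:
    message = str(err)
```

Rejecting the whole dict, rather than dropping the unknown key, is what prevents a misspelt option from silently falling back to its default.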
Typed option wrappers are available for each optimiser: CMAESOptions, PSOOptions, DEOptions, XNESOptions, BayesianOptimizationOptions, SOBEROptions, and TuRBOOptions. XNESOptions belongs to the XNES optimizer, which is available via iws.optimizers.XNES() or AskTellOptimizer(method="XNES"). Prefer the typed wrappers for editor autocomplete and inline documentation of each option. The only exception to strict validation is CMAESOptions, which remains a passthrough to pycma’s own option surface.
No SciPy-style kwargs on native optimizers
Native ask/tell optimizers (CMAES, DifferentialEvolution, PSO, XNES, BayesianOptimization, TuRBO, SOBER, and the underlying AskTellOptimizer) also reject unknown top-level keyword arguments at construction. SciPy-style keys such as maxiter, popsize, seed, and tol are not accepted. They previously had no effect and now fail validation immediately, so a misconfigured fit surfaces at construction rather than being silently ignored.
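SciPy-style keywords are still accepted by the SciPy-backed optimizers (ScipyMinimize, ScipyLeastSquares, ScipyDifferentialEvolution), which forward them directly to the underlying SciPy call.
In short: on native optimizers, use the native top-level arguments (max_iterations, population_size, population_convergence_tol); put algorithm internals in algorithm_options; and keep SciPy keywords on the Scipy* optimizers.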
Multi-start
For problems with multiple local minima, run several optimisations from different starting points.
Runtime options
iws.DataFit accepts an options dict that tunes the optimisation loop without changing the schema. All keys are optional.
| Key | Default | Description |
|---|---|---|
| seed | None | Random seed for reproducible multi-start initial guesses and stochastic optimisers. |
| low_memory | False | Drop log entries that don’t improve the best cost by ≥0.1%. Useful for long runs with many iterations. |
| max_iterations | None | Per-job iteration cap. Only applies when the model uses convert_to_format == 'casadi'. |
| maxtime | None | Per-job wall-time budget in seconds. With multi-start the total may exceed this since many jobs run. |
| skip_objective_callbacks | False locally / True on the cluster | Skip the per-objective callbacks that simulate the model at the initial guess and at the fitted parameters. Improves performance but leaves the initial/final fit results unpopulated. |
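To make the low_memory threshold concrete, here is a minimal sketch of that kind of log pruning. It is an assumption-laden illustration rather than the library's code: prune_log is a hypothetical helper, the 0.1% is read as a relative improvement over the best cost kept so far, and costs are assumed to be positive.

```python
def prune_log(costs, min_rel_improvement=1e-3):
    """Keep (iteration, cost) pairs that beat the best kept cost by >= 0.1%.

    Assumes positive costs and a relative threshold; both are guesses about
    how the option is interpreted, not documented behaviour.
    """
    kept = []
    best = None
    for iteration, cost in enumerate(costs):
        # The first entry is always kept; later ones must clear the threshold.
        if best is None or cost <= best * (1.0 - min_rel_improvement):
            kept.append((iteration, cost))
            best = cost
    return kept


history = [100.0, 99.95, 99.0, 98.99, 50.0]
pruned = prune_log(history)
```

In this example the entries at iterations 1 and 3 are dropped because each improves on the best kept cost by less than 0.1%, which is how a long run with many marginal steps ends up with a much shorter log.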
Pipelines submitted to the Ionworks cluster enable skip_objective_callbacks by default to reduce simulation cost. Set it explicitly to False in options if you need the initial- and final-fit simulation results returned with the run.
Retrieving results
result.element_results["fit"] is a dict keyed by the data-fit’s outputs (best parameter values, final cost, and any logged trajectories). See packages/ionworks-api/examples/pipeline/datafit.py for an end-to-end example.
Data Fitting (theory)
Cost-function math, identifiability, multi-start strategy.
Objective Functions
Pick the right cost for your data shape.
Regularization
Stabilise fits with Gaussian priors.
Sensitivity Analysis
Quantify which parameters the fit actually constrains.