A DataFit has two coupled pieces:
  • Objectives (iws.objectives.*) — what experiments to compare model output against.
  • Cost (iws.costs.*) — how the per-point disagreements are aggregated into a single number.
For the math behind each cost, see the Objective Functions Guide.

Available cost functions

| Schema | Formula | When to use |
|---|---|---|
| iws.costs.SSE() | $\sum_i r_i^2$ | Default; works with every optimiser |
| iws.costs.MSE() | $\frac{1}{N}\sum_i r_i^2$ | Mean of squared residuals; normalised by the number of points $N$ |
| iws.costs.RMSE() | $\sqrt{\frac{1}{N}\sum_i r_i^2}$ | Same units as the data; scalar-only (won't work with residual-array optimisers) |
| iws.costs.MAE() | $\frac{1}{N}\sum_i \lvert r_i \rvert$ | Less sensitive to outliers than the squared-error costs |
| iws.costs.Max() | $\max_i \lvert r_i \rvert$ | Minimise the worst-case (largest absolute) residual |
| iws.costs.Wasserstein() | $\frac{1}{N}\sum_i \lvert \tilde y_{\text{model},i} - \tilde y_{\text{data},i} \rvert$ | Match distributions (sorted samples $\tilde y$) rather than point-wise time series. Set position_variable and weight_variable for weighted point-cloud mode |
For maximum-likelihood estimation (MLE), see iws.costs.GaussianLogLikelihood. It accepts per-variable noise standard deviations or can estimate them alongside the fitting parameters, and it produces a Gaussian negative log-likelihood suitable for Bayesian and MAP estimation.
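To make the table concrete, here is a plain-Python sketch of each aggregation applied to a residual vector. It illustrates the formulas only and is not the iws.costs implementation: the helper names are hypothetical, the sign convention r_i = model_i - data_i is an assumption, and the Gaussian negative log-likelihood assumes a single fixed noise standard deviation (GaussianLogLikelihood may normalise differently).

```python
import math

# Hypothetical helpers illustrating the cost formulas; not the iws.costs classes.


def residuals(model, data):
    """Point-wise residuals r_i = model_i - data_i (sign convention assumed)."""
    if len(model) != len(data):
        raise ValueError("element-wise costs need equal-length model and data")
    return [m - d for m, d in zip(model, data)]


def sse(r):
    return sum(x * x for x in r)


def mse(r):
    return sse(r) / len(r)


def rmse(r):
    return math.sqrt(mse(r))


def mae(r):
    return sum(abs(x) for x in r) / len(r)


def max_abs(r):
    return max(abs(x) for x in r)


def wasserstein_sorted(model, data):
    """Uniform-weight Wasserstein: mean absolute gap between sorted samples."""
    if len(model) != len(data):
        raise ValueError("this sketch assumes equal sample counts")
    pairs = zip(sorted(model), sorted(data))
    return sum(abs(a - b) for a, b in pairs) / len(model)


def gaussian_nll(r, sigma):
    """Gaussian negative log-likelihood for one fixed noise std sigma."""
    n = len(r)
    return 0.5 * n * math.log(2 * math.pi * sigma**2) + sse(r) / (2 * sigma**2)


r = residuals([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])  # [-0.5, 0.0, 1.0]
print(sse(r), mae(r), max_abs(r))  # 1.25 0.5 1.0

# Same samples in a different order: large point-wise error, zero distribution gap.
shuffled, reference = [3.0, 1.0, 2.0], [1.0, 2.0, 3.0]
print(mae(residuals(shuffled, reference)))  # 1.333...
print(wasserstein_sorted(shuffled, reference))  # 0.0
```

The last two lines show the practical difference: a point-wise cost penalises the reordering heavily, while the sorted-sample Wasserstein only compares the two distributions.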

Wiring a cost into a fit

import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "1C": iws.objectives.CurrentDriven(
            data_input="file:.../1C.csv",
            options={"model": {"type": "SPMe"}},
        ),
    },
    parameters={
        "Negative particle diffusivity [m2.s-1]": iws.Parameter(
            "Negative particle diffusivity [m2.s-1]",
            initial_value=2e-14,
            bounds=(1e-14, 1e-13),
        ),
    },
    cost=iws.costs.RMSE(),
)
If cost is omitted, the optimiser's default cost function is used (typically a least-squares form).

Wasserstein weighted point-cloud mode

By default iws.costs.Wasserstein() compares the model and data samples for each objective variable with uniform weights (sorted point-wise comparison). Set both position_variable and weight_variable to switch to weighted point-cloud mode: one variable supplies the positions, the other supplies the (sign-stripped, renormalised) weights, and a single Wasserstein-1 distance is computed per objective. Use this when you want to match a density by position rather than sample-by-sample values — for example, lining up dQ/dV peaks in voltage rather than penalising every dQ/dV residual.
import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "ocp": iws.objectives.MSMRFullCell(
            data_input="file:.../ocp.csv",
            options={
                "model": {"type": "MSMR"},
                "objective variables": [
                    "Differential capacity [Ah/V]",
                    "Voltage [V] (dQdU)",
                ],
            },
        ),
    },
    parameters={...},
    cost=iws.costs.Wasserstein(
        position_variable="Voltage [V] (dQdU)",
        weight_variable="Differential capacity [Ah/V]",
    ),
)
position_variable and weight_variable must be set together — providing only one raises a validation error. Weights are taken as absolute values and renormalised internally, so sign conventions on dQ/dV don’t matter. Residual-array output is not available in this mode.
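As a rough mental model of what point-cloud mode computes, the sketch below evaluates a Wasserstein-1 distance between two weighted 1-D point clouds as the area between their cumulative distributions. It assumes the preprocessing described above (absolute-value weights, renormalised to sum to one); the function name is hypothetical and this is not the iws.costs.Wasserstein implementation. Note that the two clouds may contain different numbers of points.

```python
def weighted_wasserstein_1d(pos_a, w_a, pos_b, w_b):
    """Wasserstein-1 distance between two weighted 1-D point clouds.

    Weights are sign-stripped with abs() and renormalised to sum to 1
    (assumed preprocessing). The result is the area between the two
    cumulative distribution functions (CDFs).
    """
    if len(pos_a) != len(w_a) or len(pos_b) != len(w_b):
        raise ValueError("each position needs exactly one weight")

    def normalise(weights):
        w = [abs(x) for x in weights]
        total = sum(w)
        if total == 0:
            raise ValueError("weights must not all be zero")
        return [x / total for x in w]

    # Each event shifts CDF_a - CDF_b up (cloud a) or down (cloud b).
    events = [(p, w) for p, w in zip(pos_a, normalise(w_a))]
    events += [(p, -w) for p, w in zip(pos_b, normalise(w_b))]
    events.sort(key=lambda event: event[0])

    distance, cdf_gap = 0.0, 0.0
    for (x, dw), (x_next, _) in zip(events, events[1:]):
        cdf_gap += dw
        distance += abs(cdf_gap) * (x_next - x)  # gap is constant on [x, x_next)
    return distance


# A dQ/dV-like peak (negative weights, to show sign stripping) versus the same
# shape shifted by 0.05 V: the distance equals the shift.
model_v, model_w = [3.60, 3.65, 3.70], [-1.0, -4.0, -1.0]
data_v, data_w = [3.65, 3.70, 3.75], [1.0, 4.0, 1.0]
print(weighted_wasserstein_1d(model_v, model_w, data_v, data_w))  # ~0.05

# Clouds of different sizes are fine.
print(weighted_wasserstein_1d([0.0, 2.0], [1.0, 1.0], [1.0], [1.0]))  # 1.0
```

Because the distance grows with how far mass has to move along the position axis, a peak that sits 0.05 V off costs 0.05 regardless of how sharp it is, which is the behaviour you want when lining up dQ/dV peaks.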

Scoping a cost with calculation_structure

By default every cost on a DataFit consumes every objective and every objective variable in the outputs. Set calculation_structure on a cost to scope it explicitly: a mapping from objective name to the list of variable names that cost should compute, or None to compute all of that objective’s variables (an empty list computes none). Objectives you leave out of the mapping are not dropped. Inside a DataFit each unscoped objective is bound to all of its variables — the same as mapping it to None — so scoping one objective (e.g. {"ocp": ["Voltage [V]"]} while a "cc" objective also exists) still computes "cc" in full. Use this when one cost should only see a subset of variables — most commonly when you pair a per-variable cost (e.g. SSE) with a weighted Wasserstein. The Wasserstein owns the dQ/dV variables (whose model and data sides may have different lengths by construction), and the SSE is scoped to skip them so the lengths never collide.
import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "ocp": iws.objectives.ElectrodeBalancing(
            data_input="file:.../ocp.csv",
            options={
                "objective variables": [
                    "Voltage [V]",
                    "Differential capacity [Ah/V] (model axis)",
                    "Voltage [V] (model axis)",
                ],
                "dQdU model axis": True,
            },
        ),
    },
    parameters={...},
    cost=[
        iws.costs.SSE(
            calculation_structure={"ocp": ["Voltage [V]"]},
        ),
        iws.costs.Wasserstein(
            position_variable="Voltage [V] (model axis)",
            weight_variable="Differential capacity [Ah/V] (model axis)",
            calculation_structure={
                "ocp": [
                    "Voltage [V] (model axis)",
                    "Differential capacity [Ah/V] (model axis)",
                ],
            },
        ),
    ],
)
calculation_structure replaces the deprecated objective_names field (a flat list of objective names with no per-variable control). Specifying both on the same cost raises a validation error.
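The scoping rules above can be summarised as a small resolution step. The sketch below is a hypothetical helper, not part of ionworks_schema: it maps each objective to the variables one cost would compute, treating a missing key like None (all variables) and an empty list as "compute none". Rejecting unknown objective or variable names is an assumption added for illustration.

```python
def resolve_calculation_structure(structure, objectives):
    """Map each objective to the variables one cost computes.

    structure:  None (unscoped cost) or {objective: [variables] or None}.
    objectives: {objective: [all of that objective's variables]}.
    """
    structure = structure or {}
    unknown = set(structure) - set(objectives)
    if unknown:  # assumed validation, for illustration
        raise ValueError(f"unknown objectives in calculation_structure: {sorted(unknown)}")

    resolved = {}
    for name, all_vars in objectives.items():
        requested = structure.get(name)  # a missing key behaves like None
        if requested is None:
            resolved[name] = list(all_vars)  # unscoped or None: every variable
        else:
            missing = set(requested) - set(all_vars)
            if missing:
                raise ValueError(f"{name!r} has no variables {sorted(missing)}")
            resolved[name] = list(requested)  # an empty list computes nothing
    return resolved


def check_cost_fields(calculation_structure=None, objective_names=None):
    """Setting both fields on the same cost is a validation error."""
    if calculation_structure is not None and objective_names is not None:
        raise ValueError("use calculation_structure or objective_names, not both")


objectives = {
    "ocp": [
        "Voltage [V]",
        "Voltage [V] (model axis)",
        "Differential capacity [Ah/V] (model axis)",
    ],
    "cc": ["Voltage [V]"],
}
# Scoping "ocp" leaves "cc" computed in full.
print(resolve_calculation_structure({"ocp": ["Voltage [V]"]}, objectives))
# {'ocp': ['Voltage [V]'], 'cc': ['Voltage [V]']}
```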

Length-mismatch warning

Element-wise costs (SSE, MSE, RMSE, MAE, Max) combine the model and data arrays point-by-point, so a variable whose model and data sides have different lengths almost never gives a meaningful score. At fit setup, DataFit checks the shapes of every variable each cost is configured to score and emits a UserWarning for each mismatch — for example:
UserWarning: variable 'Voltage [V] (model axis)' of objective 'ocp' has mismatched
model/data shapes ((512,) vs (128,)). An element-wise cost will combine them
point-by-point, which is almost never intended. Scope the cost with an explicit
`calculation_structure` so each variable is compared against a matching-length
counterpart.
The check runs once at fit setup (not on every objective evaluation), so it has no impact on fit performance. When you see this warning, scope the cost with calculation_structure so it only sees variables whose model and data lengths match — and route any model-axis variables to a Wasserstein cost (or another distribution metric) instead. Distribution costs like Wasserstein are skipped by the check, since unequal-length sample sets are expected there.
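A minimal sketch of the setup-time check described above, assuming each cost's scope has already been resolved to {objective: [variables]} and the outputs are available as (model, data) sequences. The function and argument names are hypothetical; only the warning category and the shape wording mirror the message shown above.

```python
import warnings

ELEMENT_WISE = {"SSE", "MSE", "RMSE", "MAE", "Max"}


def check_lengths(cost_name, scoped, outputs):
    """Emit one UserWarning per scoped variable whose model/data lengths differ.

    scoped:  {objective: [variables]} for this cost.
    outputs: {objective: {variable: (model_values, data_values)}}.
    """
    if cost_name not in ELEMENT_WISE:
        return  # distribution costs (e.g. Wasserstein) expect unequal lengths
    for objective, variables in scoped.items():
        for var in variables:
            model, data = outputs[objective][var]
            if len(model) != len(data):
                warnings.warn(
                    f"variable {var!r} of objective {objective!r} has mismatched "
                    f"model/data shapes (({len(model)},) vs ({len(data)},))",
                    UserWarning,
                    stacklevel=2,
                )


outputs = {
    "ocp": {
        "Voltage [V]": ([0.0] * 128, [0.0] * 128),
        "Voltage [V] (model axis)": ([0.0] * 512, [0.0] * 128),
    }
}
# An unscoped SSE sees the model-axis variable and warns once.
check_lengths("SSE", {"ocp": list(outputs["ocp"])}, outputs)
```

Scoping the SSE to {"ocp": ["Voltage [V]"]} in this sketch removes the warning, which is the same fix the warning text recommends.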

Aligning dQ/dV peaks on the model voltage axis

iws.objectives.ElectrodeBalancing can emit dQ/dV on the model's own full-window voltage axis in addition to (or instead of) the data voltage grid. Set "dQdU model axis": True in options and add the two model-axis variables, "Differential capacity [Ah/V] (model axis)" and "Voltage [V] (model axis)", to objective variables. Use this when you want a weighted cost (typically Wasserstein in point-cloud mode) to compare dQ/dV by position, aligning peaks in voltage rather than scoring residual by residual on the data grid. Because the model and data sides have different lengths by construction, only a weighted cost should consume these variables; pair it with a sibling per-variable cost scoped via calculation_structure (see above) so the remaining variables are still fitted point-by-point. The existing data-axis variables ("Differential capacity [Ah/V]" plus the masked siblings "Voltage [V] (dQdU)" and "Capacity [A.h] (dQdU)") remain available, and both axes can be requested side by side.
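To see why the two axes end up with different lengths, the sketch below differentiates one hypothetical Q(V) curve on a 512-point "model" grid and a 128-point "data" grid spanning a narrower window. The logistic curve, the window limits and the finite-difference scheme are all assumptions for illustration; ElectrodeBalancing's own dQ/dV computation is not described here.

```python
import math


def capacity(v, v0=3.7, width=0.02):
    """Hypothetical normalised Q(V) with a single plateau centred at v0."""
    return 1.0 / (1.0 + math.exp(-(v - v0) / width))


def dqdv_on_grid(v_min, v_max, n):
    """Finite-difference dQ/dV at the midpoints of an n-point voltage grid."""
    v = [v_min + (v_max - v_min) * i / (n - 1) for i in range(n)]
    q = [capacity(x) for x in v]
    mids = [(a + b) / 2 for a, b in zip(v, v[1:])]
    dqdv = [(qb - qa) / (vb - va) for qa, qb, va, vb in zip(q, q[1:], v, v[1:])]
    return mids, dqdv


def peak_voltage(v, dqdv):
    """Voltage at which dQ/dV is largest."""
    return v[max(range(len(dqdv)), key=dqdv.__getitem__)]


model_v, model_dqdv = dqdv_on_grid(3.0, 4.2, 512)  # full model window
data_v, data_dqdv = dqdv_on_grid(3.3, 4.0, 128)  # narrower measured window

print(len(model_v), len(data_v))  # 511 127
print(peak_voltage(model_v, model_dqdv), peak_voltage(data_v, data_dqdv))
```

Both curves peak close to 3.7 V, yet the arrays cannot be subtracted element by element; a weighted, position-based cost compares them without needing matching lengths.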

Available objectives

| Schema | Use for |
|---|---|
| iws.objectives.CurrentDriven(data_input=..., options={...}) | Time-series voltage vs. current loads (drive cycles, custom loads) |
| iws.objectives.Pulse(data_input=..., options={...}) | Pulse experiments (GITT, HPPC, ICI) with optional feature-extraction variants |
| iws.objectives.OCPHalfCell(electrode=..., data_input=...) | Half-cell OCP curves |
| iws.objectives.MSMRHalfCell(...) | Fit MSMR parameters to half-cell data |
| iws.objectives.MSMRFullCell(...) | Fit MSMR parameters to full-cell data. Supports "Differential voltage [V/Ah]" and "Differential capacity [Ah/V]" as objective variables |
| iws.objectives.ElectrodeBalancing(...) | Stoichiometry windows from full-cell discharge |
| iws.objectives.EIS(...) | Electrochemical impedance spectra |
| iws.objectives.Resistance(...) | DC resistance extracted from pulse data |
| iws.objectives.CalendarAgeing(...) / iws.objectives.CycleAgeing(...) | Ageing curves |
Combine several by passing a dict[str, objective] to DataFit.objectives.

Specifying data_input

Every objective’s data_input (and any other data field on a calculation or interpolant) accepts the same set of forms:
  • A reference string: "db:<id>" to reference an uploaded measurement (the form used when you submit a fit to Ionworks). "file:..." and "folder:..." resolve against the local filesystem and only work when you run the pipeline locally with ionworkspipeline.
  • An ionworksdata.DataLoader (local or fetched with DataLoader.from_db(...)).
  • A bare pandas or polars DataFrame of pre-loaded columns.
import ionworks_schema as iws
import pandas as pd

df = pd.DataFrame(
    {
        "Time [s]": [...],
        "Voltage [V]": [...],
        "Current [A]": [...],
    }
)

obj = iws.objectives.CurrentDriven(
    data_input=df,
    options={"model": {"type": "SPMe"}},
)
When a bare DataFrame is passed, it is auto-wrapped on serialization to match the parser’s expected {"data": <columns>} shape — so data_input=df and data_input={"data": df} behave the same. String paths and already-wrapped dicts are left untouched.
Inline DataFrames are capped at 1,000 rows per call. For larger datasets, upload as a measurement and reference it by ID instead. See inline time series size limit.
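The wrapping rule can be pictured as a small normalisation step. The sketch below is hypothetical: it duck-types a DataFrame by its columns attribute and length, enforces the documented 1,000-row cap at wrap time (where the real check happens is an assumption), and passes anything else, such as a DataLoader, through unchanged (also an assumption). A tiny stand-in class keeps it runnable without pandas; a real pandas DataFrame exposes the same two attributes.

```python
INLINE_ROW_LIMIT = 1_000  # documented cap on inline DataFrame rows


def wrap_data_input(value):
    """Hypothetical mirror of the data_input auto-wrapping rule."""
    if isinstance(value, (str, dict)):
        return value  # reference strings and already-wrapped dicts: untouched
    if hasattr(value, "columns"):  # duck-typed pandas / polars DataFrame check
        if len(value) > INLINE_ROW_LIMIT:
            raise ValueError(
                f"inline DataFrame has {len(value)} rows (limit {INLINE_ROW_LIMIT}); "
                "upload it as a measurement and reference it by ID instead"
            )
        return {"data": value}
    return value  # anything else (e.g. a DataLoader) passes through


class TinyFrame:
    """Stand-in exposing the two attributes the sketch relies on."""

    columns = ["Time [s]", "Voltage [V]", "Current [A]"]

    def __init__(self, n_rows):
        self.n_rows = n_rows

    def __len__(self):
        return self.n_rows


frame = TinyFrame(10)
print(wrap_data_input(frame) == {"data": frame})  # True
print(wrap_data_input("db:<id>"))  # db:<id>
```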

Generating a CycleAgeing experiment from data

iws.objectives.CycleAgeing normally requires an explicit pybamm.Experiment describing the cycling protocol. When the protocol is already encoded in the cycler step information attached to your data, set experiment="from data" to skip rebuilding it by hand. The experiment is generated lazily, when the fit starts, by calling DataLoader.generate_experiment() on the loaded step table. Use this when:
  • The fitted data carries its own step information (a local ionworksdata.DataLoader, or one fetched with DataLoader.from_db(...)).
  • You want the simulated protocol to track the measurement protocol exactly — including any per-step current, voltage limits, or durations recorded by the cycler.
import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "ageing": iws.objectives.CycleAgeing(
            data_input="db:<measurement-id>",
            options={
                "model": {"type": "SPM", "options": {"SEI": "ec reaction limited"}},
                "experiment": "from data",
                "objective variables": ["LLI [%]"],
            },
        ),
    },
    parameters={...},
)
If the data you are fitting against (for example, a per-cycle summary table) is a different object from the measurement that defines the protocol, pass a separate DataLoader as experiment instead — the steps come from that loader, while the residuals are still computed against data_input:
import ionworksdata as iwdata
import ionworks_schema as iws

protocol = iwdata.DataLoader.from_db("<protocol-measurement-id>")

fit = iws.DataFit(
    objectives={
        "ageing": iws.objectives.CycleAgeing(
            data_input="db:<summary-measurement-id>",
            options={
                "model": {"type": "SPM"},
                "experiment": protocol,
                "objective variables": ["LLI [%]"],
            },
        ),
    },
    parameters={...},
)
experiment="from data" requires data_input to resolve to a DataLoader (or a dict whose "data" entry is a DataLoader) that carries step information. When you pass a separate DataLoader as experiment, that loader must carry the step information instead. Either way, configurations missing steps fail fast at objective construction with a clear error, before any simulation runs.
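The fail-fast rule can be sketched as choosing which loader supplies the protocol and checking that it carries steps. Everything here is hypothetical: StepLoader stands in for ionworksdata.DataLoader, its steps attribute is an assumed name for the attached step table, and data_input is assumed to be already resolved (a "db:<id>" reference would first become a loader).

```python
class StepLoader:
    """Hypothetical DataLoader stand-in; steps is None when no step table is attached."""

    def __init__(self, steps=None):
        self.steps = steps


def resolve_protocol_source(experiment, data_input):
    """Return the loader whose steps define the protocol, or raise before simulating."""
    if isinstance(experiment, str) and experiment == "from data":
        # data_input may be a loader or a dict whose "data" entry is a loader.
        loader = data_input.get("data") if isinstance(data_input, dict) else data_input
        source = "data_input"
    elif isinstance(experiment, StepLoader):
        loader, source = experiment, "experiment"  # separate protocol loader
    else:
        return None  # an explicit experiment object: nothing to resolve
    if not isinstance(loader, StepLoader) or loader.steps is None:
        raise ValueError(f"{source} must be a DataLoader carrying step information")
    return loader


measured = StepLoader(steps=[{"step": 1}])
print(resolve_protocol_source("from data", {"data": measured}) is measured)  # True
```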

Tuning the auto-built solver

Simulation-backed objectives (CurrentDriven, Pulse, CalendarAgeing, CycleAgeing, MSMRFullCell, …) build an IonworksSolver for you when no explicit solver is provided. Pass solver_kwargs inside simulation_kwargs to override individual pieces of that default without restating the rest:
  • Nested options are merged over the default IDAKLU options. For example, {"options": {"compile": True}} flips on model compilation but keeps every other tuned option.
  • Other top-level keys (atol, rtol, on_extrapolation, …) override the corresponding default solver kwargs.
solver_kwargs is ignored (with a warning) when an explicit solver is supplied — configure those on the solver instance directly. It is also ignored when the model’s default solver isn’t IDAKLU-based.
import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "1C": iws.objectives.CurrentDriven(
            data_input="file:.../1C.csv",
            options={
                "model": {"type": "SPMe"},
                "simulation_kwargs": {
                    "solver_kwargs": {
                        "options": {"compile": True},
                        "atol": 1e-8,
                    },
                },
            },
        ),
    },
    parameters={...},
)
Enabling compile ahead of time ({"options": {"compile": True}}) trades a one-off compilation cost for faster repeated evaluations — useful when the same objective is solved many times during a fit or sweep.
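The merge rules for solver_kwargs can be sketched as a shallow merge of the nested options over a deep copy of the defaults, with every other top-level key overriding directly. The default values and function name below are hypothetical placeholders, not the real IonworksSolver defaults. The sketch warns when an explicit solver is supplied, as documented, and silently skips non-IDAKLU models; whether that second case also warns is an assumption.

```python
import copy
import warnings

# Placeholder defaults for illustration only; the real defaults differ.
DEFAULT_SOLVER_KWARGS = {
    "atol": 1e-6,
    "rtol": 1e-6,
    "options": {"compile": False, "num_threads": 1},
}


def build_solver_kwargs(solver_kwargs=None, explicit_solver=None, idaklu_based=True):
    """Return merged kwargs for the auto-built solver, or None when they are ignored."""
    if explicit_solver is not None:
        if solver_kwargs:
            warnings.warn(
                "solver_kwargs ignored: configure the explicit solver directly",
                UserWarning,
                stacklevel=2,
            )
        return None
    if not idaklu_based:
        return None  # ignored for non-IDAKLU default solvers (warning behaviour assumed)
    merged = copy.deepcopy(DEFAULT_SOLVER_KWARGS)  # never mutate the defaults
    for key, value in (solver_kwargs or {}).items():
        if key == "options":
            merged["options"].update(value)  # nested options merge over defaults
        else:
            merged[key] = value  # other top-level keys override
    return merged


print(build_solver_kwargs({"options": {"compile": True}, "atol": 1e-8}))
```

Merging rather than replacing the options dict is what lets a one-key override such as {"compile": True} keep every other tuned option in place.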

Forwarding kwargs to the runtime solve

simulation_kwargs also accepts solve_kwargs, a dict forwarded to the runtime sim.solve(...) call on every objective evaluation. Use it for arguments that belong on the solve itself rather than the solver — for example starting_solution to warm-start from a previous solution, or any other pybamm.Simulation.solve argument.
  • solve_kwargs is applied regardless of whether the objective auto-built the solver or you supplied an explicit solver. It is the recommended way to pass solve-time arguments that work with any solver.
  • solver_kwargs (above) tunes the auto-built solver at construction time; solve_kwargs configures each solve call. The two are independent and can be combined.
  • Keys the objective controls directly — inputs, initial_soc, t_eval, t_interp, frequencies — are reserved and raise a ValueError if passed via solve_kwargs.
  • For CycleAgeing, save_at_cycles is derived automatically from the metrics; any value passed via solve_kwargs is ignored with a warning so that the cycles required by the metrics are preserved.
import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "pulse": iws.objectives.Pulse(
            data_input="file:.../pulse.csv",
            options={
                "model": {"type": "SPMe"},
                "simulation_kwargs": {
                    # Tunes the auto-built solver (construction time):
                    "solver_kwargs": {"options": {"compile": True}},
                    # Forwarded to every sim.solve(...) call (runtime).
                    # prior_solution is a pybamm.Solution you obtained from an
                    # earlier sim.solve(...) — substitute your own:
                    "solve_kwargs": {"starting_solution": prior_solution},
                },
            },
        ),
    },
    parameters={...},
)
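The solve_kwargs rules above can be sketched as a validation step run before each solve. The function name and the required_cycles argument are hypothetical; the reserved keys, the ValueError and the CycleAgeing save_at_cycles warning follow the behaviour described in the list above.

```python
import warnings

RESERVED_SOLVE_KEYS = {"inputs", "initial_soc", "t_eval", "t_interp", "frequencies"}


def prepare_solve_kwargs(solve_kwargs, objective_type, required_cycles=None):
    """Validate user solve_kwargs and apply objective-controlled overrides."""
    kwargs = dict(solve_kwargs or {})
    clash = set(kwargs) & RESERVED_SOLVE_KEYS
    if clash:
        raise ValueError(f"reserved keys cannot be passed via solve_kwargs: {sorted(clash)}")
    if objective_type == "CycleAgeing":
        if "save_at_cycles" in kwargs:
            warnings.warn(
                "save_at_cycles is derived from the metrics; ignoring the value "
                "passed via solve_kwargs",
                UserWarning,
                stacklevel=2,
            )
        kwargs["save_at_cycles"] = required_cycles  # cycles the metrics need
    return kwargs


# starting_solution is passed through untouched (a string stands in for a pybamm.Solution).
print(prepare_solve_kwargs({"starting_solution": "prior"}, "Pulse"))
```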

CycleAgeing: automatic store_first_last for first/last-only metrics

CycleAgeing lets you supply metrics: a mapping from each objective variable to a .by_cycle() metric that pulls the value of interest out of the simulation. Defaults are provided for "LLI [%]", "LAM_ne [%]", and "LAM_pe [%]", all of which read a single per-step sample. When every metric in that mapping reads only the first or last sample of a step (the defaults, or any First/Last .by_cycle() metric), CycleAgeing defaults solver_kwargs["store_first_last"] to True. The solver then stores only the endpoints of each step, which uses far less memory on long cycling solves and gives identical results for these metrics. The flag is only auto-set when it is safe to do so:
  • Metrics that read interior points (e.g. Mean(...).by_cycle()) leave the default off so no samples are dropped.
  • Composed metrics (arithmetic of First/Last) are conservatively left alone.
  • An explicit store_first_last in solver_kwargs is always respected.
  • Supplying your own solver skips solver-kwargs injection entirely (as elsewhere).
import ionworks_schema as iws

fit = iws.DataFit(
    objectives={
        "ageing": iws.objectives.CycleAgeing(
            data_input="file:.../ageing.csv",
            options={
                "model": {"type": "SPM"},
                "experiment": "from data",
                "objective variables": ["LLI [%]", "LAM_ne [%]"],
                # Defaults already read first/last only, so store_first_last
                # is enabled automatically. Override explicitly when needed:
                # "simulation_kwargs": {
                #     "solver_kwargs": {"store_first_last": False},
                # },
            },
        ),
    },
    parameters={...},
)
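The decision rules in the list above can be condensed into one function. This is a hypothetical sketch: each metric is represented by a kind tag ("First", "Last", "Mean", "Composed") instead of a real metric object, and which tag the built-in defaults use is an assumption ("Last" below).

```python
FIRST_LAST = {"First", "Last"}


def default_store_first_last(metric_kinds, solver_kwargs=None, explicit_solver=False):
    """Return the solver_kwargs CycleAgeing would use, or None if injection is skipped.

    metric_kinds: {objective variable: kind tag of its .by_cycle() metric}.
    """
    if explicit_solver:
        return None  # your own solver: no solver-kwargs injection at all
    kwargs = dict(solver_kwargs or {})
    if "store_first_last" in kwargs:
        return kwargs  # an explicit choice is always respected
    # Interior-point ("Mean") and composed metrics keep the flag off.
    if metric_kinds and all(kind in FIRST_LAST for kind in metric_kinds.values()):
        kwargs["store_first_last"] = True
    return kwargs


print(default_store_first_last({"LLI [%]": "Last", "LAM_ne [%]": "Last"}))
# {'store_first_last': True}
```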
SSE is the safest default: it has both a residual-array form and a scalar form, so it is compatible with every optimiser. Use MSE or RMSE when you want a reported value that does not grow with the number of data points; RMSE is additionally in the same units as the data.
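A small sketch of the two forms of SSE: a residual-array optimiser receives the vector and forms the sum of squares itself, while a scalar optimiser receives the sum directly. The function names are hypothetical. For a fixed number of points N, RMSE is a strictly increasing function of SSE, so both rank candidate parameter sets the same way and share a minimiser; they differ only in the number reported.

```python
import math


def sse_residual_array(model, data):
    """Residual-array form: the vector a least-squares optimiser squares and sums."""
    return [m - d for m, d in zip(model, data)]


def sse_scalar(model, data):
    """Scalar form: the sum of squared residuals as one number."""
    return sum(r * r for r in sse_residual_array(model, data))


def rmse_scalar(model, data):
    """RMSE as a single number (scalar-only, per the cost table)."""
    return math.sqrt(sse_scalar(model, data) / len(data))


data = [1.0, 2.0, 3.0]
candidate_a = [1.1, 2.1, 2.9]
candidate_b = [1.5, 2.5, 3.5]

# Same N, so SSE and RMSE order the candidates identically.
for cand in (candidate_a, candidate_b):
    print(sse_scalar(cand, data), rmse_scalar(cand, data))
```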
