DataFit has two coupled pieces:
- Objectives (iws.objectives.*): what experiments to compare model output against.
- Cost (iws.costs.*): how the per-point disagreements are aggregated into a single number.
Available cost functions
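| Schema | Formula | When to use |
|---|---|---|
| iws.costs.SSE() | $\sum_i r_i^2$ | Default; works with every optimiser |
| iws.costs.MSE() | $\frac{1}{N}\sum_i r_i^2$ | Scale-aware mean of squared residuals |
| iws.costs.RMSE() | $\sqrt{\frac{1}{N}\sum_i r_i^2}$ | Interpretable units; scalar-only (won’t work with residual-array optimisers) |
| iws.costs.MAE() | $\frac{1}{N}\sum_i \lvert r_i \rvert$ | Robust to outliers |
| iws.costs.Max() | $\max_i \lvert r_i \rvert$ | Minimise the worst-case (largest absolute) residual |
| iws.costs.Wasserstein() | $W_1$ between model and data samples | Match distributions (sorted samples) rather than point-wise time series. Set position_variable and weight_variable for weighted point-cloud mode |

Here $r_i$ is the residual (model minus data) at point $i$ of an objective variable, and $N$ is the number of points.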
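To make the formulas concrete, here is a minimal pure-Python sketch of the element-wise costs and of the equal-size sorted Wasserstein comparison for a single objective variable. The function names are hypothetical and this is not the iws.costs implementation; in particular, how DataFit combines several variables and objectives into one number is not modelled.

```python
import math

def residuals(model, data):
    """Point-wise residuals r_i = model_i - data_i for one variable."""
    # This sketch refuses mismatched lengths instead of silently truncating.
    if len(model) != len(data):
        raise ValueError("element-wise costs need equal-length model and data arrays")
    return [m - d for m, d in zip(model, data)]

def sse(model, data):
    return sum(r * r for r in residuals(model, data))

def mse(model, data):
    return sse(model, data) / len(data)

def rmse(model, data):
    return math.sqrt(mse(model, data))

def mae(model, data):
    return sum(abs(r) for r in residuals(model, data)) / len(data)

def max_abs(model, data):
    return max(abs(r) for r in residuals(model, data))

def sorted_wasserstein(model, data):
    """W1 for two equal-size, uniformly weighted sample sets:
    the mean absolute difference between the sorted samples."""
    if len(model) != len(data):
        raise ValueError("this simplified form assumes equal sample counts")
    return sum(abs(a - b) for a, b in zip(sorted(model), sorted(data))) / len(data)

# Same values in opposite order: every point-wise cost is large,
# but the two sample distributions are identical.
model = [1.0, 2.0, 3.0, 4.0]
data = [4.0, 3.0, 2.0, 1.0]
for cost in (sse, mse, rmse, mae, max_abs, sorted_wasserstein):
    print(cost.__name__, cost(model, data))
```

The example shows the design difference between the two families: the point-wise costs penalise the reversed ordering heavily, while the distribution comparison only asks whether the same values occur.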
iws.costs.GaussianLogLikelihood is also available for probabilistic fitting. It accepts per-variable noise standard deviations or can estimate them alongside the fitting parameters, and produces a Gaussian negative log-likelihood suitable for Bayesian and maximum a posteriori (MAP) estimation.
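The sketch below shows one standard form of a Gaussian negative log-likelihood for i.i.d. residuals, summed over variables that each carry their own noise standard deviation σ. It also illustrates why the noise can be estimated alongside the parameters: for fixed model output the NLL is minimised at σ² = SSE/N. The function names are hypothetical, and the constant terms and noise parameterisation used by GaussianLogLikelihood itself may differ.

```python
import math

def gaussian_nll(model, data, sigma):
    """NLL of the data under i.i.d. Gaussian noise N(0, sigma**2) around the model."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(model) != len(data):
        raise ValueError("model and data must have the same length")
    n = len(data)
    sse = sum((m - d) ** 2 for m, d in zip(model, data))
    # (N/2) ln(2*pi*sigma^2) + SSE / (2*sigma^2)
    return 0.5 * n * math.log(2.0 * math.pi * sigma**2) + 0.5 * sse / sigma**2

def total_nll(model_vars, data_vars, sigmas):
    """Sum the per-variable NLLs; `sigmas` maps each variable to its noise std."""
    return sum(gaussian_nll(model_vars[v], data_vars[v], s) for v, s in sigmas.items())

def sigma_mle(model, data):
    """Noise std that minimises gaussian_nll for fixed model output: sqrt(SSE / N)."""
    return math.sqrt(sum((m - d) ** 2 for m, d in zip(model, data)) / len(data))

model = {"Voltage [V]": [3.00, 3.10, 3.20]}
data = {"Voltage [V]": [3.05, 3.05, 3.25]}
s_hat = sigma_mle(model["Voltage [V]"], data["Voltage [V]"])
print(s_hat, total_nll(model, data, {"Voltage [V]": s_hat}))
```

The log σ term is what keeps an estimated noise level from growing without bound: a larger σ shrinks the squared-residual term but raises the normalisation term.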
Wiring a cost into a fit
If cost is omitted, the optimiser’s default cost function is used (typically a least-squares form).
Wasserstein weighted point-cloud mode
By default, iws.costs.Wasserstein() compares the model and data samples for each objective variable with uniform weights (sorted point-wise comparison). Set both position_variable and weight_variable to switch to weighted point-cloud mode: one variable supplies the positions, the other supplies the (sign-stripped, renormalised) weights, and a single Wasserstein-1 distance is computed per objective.
Use this when you want to match a density by position rather than sample-by-sample values — for example, lining up dQ/dV peaks in voltage rather than penalising every dQ/dV residual.
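For intuition, here is a minimal pure-Python sketch of a Wasserstein-1 distance between two weighted 1-D point clouds, with the weights sign-stripped and renormalised as described above. It integrates the absolute difference between the two cumulative weight distributions, so a peak shifted in voltage costs roughly its weight times the shift, and the two clouds need not have the same number of points. The function name is hypothetical and this is not the library's implementation; scipy.stats.wasserstein_distance performs the same 1-D computation via its u_weights and v_weights arguments.

```python
def weighted_wasserstein_1d(pos_a, w_a, pos_b, w_b):
    """Wasserstein-1 distance between two weighted 1-D point clouds."""

    def normalise(pos, w):
        if len(pos) != len(w):
            raise ValueError("each position needs exactly one weight")
        w = [abs(x) for x in w]  # strip signs (e.g. negative dQ/dV)
        total = sum(w)
        if total == 0:
            raise ValueError("weights must not all be zero")
        return sorted(zip(pos, (x / total for x in w)))  # renormalise to sum 1

    a = normalise(pos_a, w_a)
    b = normalise(pos_b, w_b)
    grid = sorted({p for p, _ in a} | {p for p, _ in b})

    distance = 0.0
    cdf_a = cdf_b = 0.0
    ia = ib = 0
    for left, right in zip(grid, grid[1:]):
        # Accumulate each cloud's mass located at or before `left`.
        while ia < len(a) and a[ia][0] <= left:
            cdf_a += a[ia][1]
            ia += 1
        while ib < len(b) and b[ib][0] <= left:
            cdf_b += b[ib][1]
            ib += 1
        # Both CDFs are constant on [left, right).
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance

# Model peaks at 3.40 V and 3.70 V (negative weights); data peaks at 3.45 V and
# 3.70 V plus a zero-weight point, so the two sides have different lengths.
d = weighted_wasserstein_1d([3.40, 3.70], [-1.0, -3.0], [3.45, 3.70, 3.90], [1.0, 3.0, 0.0])
print(d)
```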
position_variable and weight_variable must be set together; providing only one raises a validation error. Weights are taken as absolute values and renormalised internally, so sign conventions on dQ/dV don’t matter. Residual-array output is not available in this mode.
Scoping a cost with calculation_structure
By default every cost on a DataFit consumes every objective and every objective variable in the outputs. Set calculation_structure on a cost to scope it explicitly: a mapping from objective name to the list of variable names that cost should compute, or None to compute all of that objective’s variables (an empty list computes none).
Objectives you leave out of the mapping are not dropped. Inside a DataFit, each unscoped objective is bound to all of its variables (the same as mapping it to None), so scoping one objective (e.g. {"ocp": ["Voltage [V]"]} while a "cc" objective also exists) still computes "cc" in full.
Use this when one cost should only see a subset of variables — most commonly when you pair a per-variable cost (e.g. SSE) with a weighted Wasserstein. The Wasserstein owns the dQ/dV variables (whose model and data sides may have different lengths by construction), and the SSE is scoped to skip them so the lengths never collide.
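The scoping rules can be sketched in a few lines of plain Python. resolve_scope is a hypothetical helper, not part of iws: given every objective's variables and one cost's calculation_structure, it returns the variables that cost would score, treating an omitted objective the same as an explicit None.

```python
def resolve_scope(objective_variables, calculation_structure=None):
    """Return {objective: [variables]} that a single cost would score.

    objective_variables: every objective's variables, keyed by objective name.
    calculation_structure: the cost's optional scoping mapping.
    """
    scope = {}
    for name, variables in objective_variables.items():
        requested = None if calculation_structure is None else calculation_structure.get(name)
        # Unscoped objectives and explicit None both mean "all variables";
        # an explicit list (possibly empty) is used as given.
        scope[name] = list(variables) if requested is None else list(requested)
    return scope

objective_variables = {
    "balancing": [
        "Voltage [V]",
        "Differential capacity [Ah/V] (model axis)",
        "Voltage [V] (model axis)",
    ],
    "cc": ["Voltage [V]"],
}

# An SSE scoped away from the model-axis dQ/dV variables, which a weighted
# Wasserstein would own; in a fit this mapping is the cost's calculation_structure.
sse_scope = resolve_scope(objective_variables, {"balancing": ["Voltage [V]"]})
print(sse_scope)  # "cc" is still scored in full
```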
calculation_structure replaces the deprecated objective_names field (a flat list of objective names with no per-variable control). Specifying both on the same cost raises a validation error.
Length-mismatch warning
Element-wise costs (SSE, MSE, RMSE, MAE, Max) combine the model and data arrays point by point, so a variable whose model and data sides have different lengths almost never gives a meaningful score. At fit setup, DataFit checks the shapes of every variable each cost is configured to score and emits a UserWarning for each mismatch.
To resolve a warning, scope the element-wise cost with calculation_structure so it only sees variables whose model and data lengths match, and route any model-axis variables to a Wasserstein cost (or another distribution metric) instead. Distribution costs like Wasserstein are skipped by the check, since unequal-length sample sets are expected there.
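A sketch of such a setup-time check might look like the following. check_lengths, its arguments, and the warning text are hypothetical; the sketch only inspects the variables a cost is scoped to and skips distribution costs, as described above.

```python
import warnings

# Costs that compare model and data point by point.
ELEMENT_WISE_COSTS = {"SSE", "MSE", "RMSE", "MAE", "Max"}

def check_lengths(cost_name, scope, model_outputs, data_outputs):
    """Emit one UserWarning per scoped variable whose model/data lengths differ.

    scope: {objective: [variables]} this cost is configured to score.
    model_outputs, data_outputs: {objective: {variable: sequence}}.
    Returns the number of mismatches found.
    """
    if cost_name not in ELEMENT_WISE_COSTS:
        return 0  # distribution costs (e.g. Wasserstein) expect unequal lengths
    mismatches = 0
    for objective, variables in scope.items():
        for variable in variables:
            n_model = len(model_outputs[objective][variable])
            n_data = len(data_outputs[objective][variable])
            if n_model != n_data:
                mismatches += 1
                warnings.warn(
                    f"{cost_name}: {objective!r}/{variable!r} has {n_model} model "
                    f"points but {n_data} data points",
                    UserWarning,
                    stacklevel=2,
                )
    return mismatches
```

Scoping the SSE to the matching variables, as recommended above, makes the warning disappear without touching the Wasserstein cost.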
Aligning dQ/dV peaks on the model voltage axis
iws.objectives.ElectrodeBalancing can emit dQ/dV on the model’s own full-window voltage axis in addition to (or instead of) the data voltage grid. Set "dQdU model axis": True in options and add the two model-axis variables, "Differential capacity [Ah/V] (model axis)" and "Voltage [V] (model axis)", to the objective variables.
Use this when you want a weighted cost (typically Wasserstein in point-cloud mode) to position-shift, i.e. to align dQ/dV peaks in voltage rather than comparing residual by residual on the data grid. The model and data sides have different lengths by construction, so only a weighted cost should consume them; pair that cost with a sibling per-variable cost scoped via calculation_structure (see above) to keep the rest of the fit honest.
The existing data-axis variables ("Differential capacity [Ah/V]" plus the masked siblings "Voltage [V] (dQdU)" / "Capacity [A.h] (dQdU)") remain available — both axes can be requested side by side.
Available objectives
| Schema | Use for |
|---|---|
| iws.objectives.CurrentDriven(data_input=..., options={...}) | Time-series voltage vs. current loads (drive cycles, custom loads) |
| iws.objectives.Pulse(data_input=..., options={...}) | Pulse experiments (GITT, HPPC, ICI), with optional feature-extraction variants |
| iws.objectives.OCPHalfCell(electrode=..., data_input=...) | Half-cell OCP curves |
| iws.objectives.MSMRHalfCell(...) | Fit MSMR parameters to half-cell data |
| iws.objectives.MSMRFullCell(...) | Fit MSMR parameters to full-cell data. Supports "Differential voltage [V/Ah]" and "Differential capacity [Ah/V]" as objective variables |
| iws.objectives.ElectrodeBalancing(...) | Stoichiometry windows from full-cell discharge |
| iws.objectives.EIS(...) | Electrochemical impedance spectra |
| iws.objectives.Resistance(...) | DC resistance extracted from pulse data |
| iws.objectives.CalendarAgeing(...) / iws.objectives.CycleAgeing(...) | Ageing curves |
Pass the objectives as a dict[str, objective] to DataFit.objectives.
Specifying data_input
Every objective’s data_input (and any other data field on a calculation or interpolant) accepts the same set of forms:
- A reference string: `"db:<id>"` to reference an uploaded measurement (the form used when you submit a fit to Ionworks). `"file:..."` and `"folder:..."` resolve against the local filesystem and only work when you run the pipeline locally with ionworkspipeline.
- An ionworksdata.DataLoader (local or fetched with DataLoader.from_db(...)).
- A bare pandas or polars DataFrame of pre-loaded columns.
When a bare DataFrame is passed, it is auto-wrapped on serialization to match the parser’s expected `{"data": <columns>}` shape, so data_input=df and data_input={"data": df} behave the same. String paths and already-wrapped dicts are left untouched.
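The wrapping rule amounts to the following sketch. wrap_data_input is a hypothetical helper; the DataFrame classes are passed in as frame_types so the sketch needs neither pandas nor polars, and passing other inputs such as a DataLoader through unchanged is an assumption of the sketch.

```python
def wrap_data_input(value, frame_types):
    """Return data_input in the {"data": ...} shape the parser expects.

    frame_types: the DataFrame classes to wrap, e.g. (pandas.DataFrame,
    polars.DataFrame); passed in so this sketch needs neither library.
    """
    if isinstance(value, (str, dict)):
        # Reference strings ("db:...", "file:...", "folder:...") and
        # already-wrapped dicts are returned untouched.
        return value
    if isinstance(value, frame_types):
        return {"data": value}
    # Other inputs (e.g. a DataLoader) pass through unchanged here (assumption).
    return value

class FrameStandIn:
    """Stand-in for a pandas/polars DataFrame in this example."""

df = FrameStandIn()
print(wrap_data_input(df, (FrameStandIn,)) == wrap_data_input({"data": df}, (FrameStandIn,)))
```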
Inline DataFrames are capped at 1,000 rows per call. For larger datasets, upload as a measurement and reference it by ID instead. See inline time series size limit.
Generating a CycleAgeing experiment from data
iws.objectives.CycleAgeing normally requires an explicit pybamm.Experiment describing the cycling protocol. When the protocol is already encoded in the cycler step information attached to your data, set experiment="from data" to skip rebuilding it by hand. The experiment is generated lazily, when the fit starts, by calling DataLoader.generate_experiment() on the loaded step table.
Use this when:
- The fitted data carries its own step information (a local ionworksdata.DataLoader, or one fetched with DataLoader.from_db(...)).
- You want the simulated protocol to track the measurement protocol exactly, including any per-step current, voltage limits, or durations recorded by the cycler.
You can also pass a DataLoader as experiment instead. The steps then come from that loader, while the residuals are still computed against data_input.
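The rules for where the steps come from, and the fail-fast check on missing steps, can be sketched as follows. LoaderStandIn and resolve_step_source are hypothetical: the stand-in only records whether step information is attached, and resolving a "db:..." reference string to a real DataLoader is not modelled.

```python
class LoaderStandIn:
    """Hypothetical stand-in for ionworksdata.DataLoader; `steps` is None
    when no step information is attached."""

    def __init__(self, steps=None):
        self.steps = steps

def resolve_step_source(data_input, experiment):
    """Return the loader whose steps define the experiment, or None when an
    explicit experiment object is used. Raises before any simulation if steps are missing."""
    if experiment == "from data":
        # data_input may be a loader or a {"data": loader} dict.
        loader = data_input.get("data") if isinstance(data_input, dict) else data_input
        where = "data_input"
    elif isinstance(experiment, LoaderStandIn):
        loader, where = experiment, "experiment"
    else:
        return None  # e.g. an explicit pybamm.Experiment: nothing to generate
    if not isinstance(loader, LoaderStandIn) or loader.steps is None:
        raise ValueError(f"{where} must be a DataLoader carrying step information")
    return loader
```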
experiment="from data" requires data_input to resolve to a DataLoader (or a dict whose "data" entry is a DataLoader) that carries step information. When you pass a separate DataLoader as experiment, that loader must carry the step information instead. Either way, configurations missing steps fail fast at objective construction with a clear error, before any simulation runs.
Tuning the auto-built solver
Simulation-backed objectives (CurrentDriven, Pulse, CalendarAgeing, CycleAgeing, MSMRFullCell, …) build an IonworksSolver for you when no explicit solver is provided. Pass solver_kwargs inside simulation_kwargs to override individual pieces of that default without restating the rest:
- Nested options are merged over the default IDAKLU options. For example, {"options": {"compile": True}} turns on model compilation but keeps every other tuned option.
- Other top-level keys (atol, rtol, on_extrapolation, …) override the corresponding default solver kwargs.
solver_kwargs is ignored (with a warning) when an explicit solver is supplied; configure those settings on the solver instance directly. It is also ignored when the model’s default solver isn’t IDAKLU-based.
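The merge behaviour for solver_kwargs can be sketched with plain dictionaries. merge_solver_kwargs is a hypothetical helper, and the default values below are placeholders, not the solver's real tuned defaults.

```python
import copy

# Placeholder defaults standing in for the auto-built solver's settings.
DEFAULT_SOLVER_KWARGS = {
    "atol": 1e-6,
    "rtol": 1e-6,
    "options": {"compile": False, "num_threads": 1},
}

def merge_solver_kwargs(defaults, overrides):
    """Merge user solver_kwargs over the defaults.

    Nested "options" are merged key by key; every other top-level key
    replaces the default outright. The defaults are not mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key == "options":
            merged.setdefault("options", {}).update(value)
        else:
            merged[key] = value
    return merged

# Mirrors simulation_kwargs={"solver_kwargs": {"options": {"compile": True}, "atol": 1e-8}}
merged = merge_solver_kwargs(
    DEFAULT_SOLVER_KWARGS, {"options": {"compile": True}, "atol": 1e-8}
)
print(merged)
```

Merging options key by key is what lets a single override such as compile leave every other option at its tuned value.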
Forwarding kwargs to the runtime solve
simulation_kwargs also accepts solve_kwargs, a dict forwarded to the runtime sim.solve(...) call on every objective evaluation. Use it for arguments that belong on the solve itself rather than the solver — for example starting_solution to warm-start from a previous solution, or any other pybamm.Simulation.solve argument.
- solve_kwargs is applied regardless of whether the objective auto-built the solver or you supplied an explicit solver. It is the recommended way to pass solve-time arguments that work with any solver.
- solver_kwargs (above) tunes the auto-built solver at construction time; solve_kwargs configures each solve call. The two are independent and can be combined.
- Keys the objective controls directly (inputs, initial_soc, t_eval, t_interp, frequencies) are reserved and raise a ValueError if passed via solve_kwargs.
- For CycleAgeing, save_at_cycles is derived automatically from the metrics; any value passed via solve_kwargs is ignored with a warning so that the cycles required by the metrics are preserved.
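The validation rules for solve_kwargs can be sketched as a small filter. check_solve_kwargs and its warning text are hypothetical; the reserved keys are the ones listed above.

```python
import warnings

# Keys the objective sets itself on every solve.
RESERVED_SOLVE_KEYS = {"inputs", "initial_soc", "t_eval", "t_interp", "frequencies"}

def check_solve_kwargs(solve_kwargs, objective_type):
    """Validate solve_kwargs and return the dict that would be forwarded to sim.solve."""
    clashes = RESERVED_SOLVE_KEYS.intersection(solve_kwargs)
    if clashes:
        raise ValueError(f"reserved keys cannot go in solve_kwargs: {sorted(clashes)}")
    forwarded = dict(solve_kwargs)
    if objective_type == "CycleAgeing" and "save_at_cycles" in forwarded:
        # The metrics decide which cycles must be saved.
        warnings.warn("save_at_cycles is derived from the metrics; ignoring it", UserWarning)
        del forwarded["save_at_cycles"]
    return forwarded
```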
CycleAgeing: automatic store_first_last for first/last-only metrics
CycleAgeing lets you supply metrics: a mapping from each objective variable to a .by_cycle() metric that pulls the value of interest out of the simulation. Defaults are provided for "LLI [%]", "LAM_ne [%]", and "LAM_pe [%]", all of which read a single per-step sample.
When every metric in that mapping reads only the first or last sample of a step (the defaults, or any First/Last .by_cycle() metric), CycleAgeing defaults solver_kwargs["store_first_last"] to True. The solver then stores only the endpoints of each step, which uses far less memory for long cycling solves and produces identical results for these metrics.
The flag is only auto-set when it is safe to do so:
- Metrics that read interior points (e.g. Mean(...).by_cycle()) leave the default off so no samples are dropped.
- Composed metrics (arithmetic of First/Last) are conservatively left alone.
- An explicit store_first_last in solver_kwargs is always respected.
- Supplying your own solver skips solver-kwargs injection entirely (as elsewhere).
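The defaulting rules above can be sketched as a single decision function. cycle_ageing_solver_kwargs and the string labels are hypothetical: each metric is reduced to a label ("First", "Last", or something else such as "Mean" or "Composed"), and which endpoint the built-in defaults read is not specified here, so the labels in the test are illustrative.

```python
# Metric labels treated as endpoint-only in this sketch.
ENDPOINT_METRICS = {"First", "Last"}

def cycle_ageing_solver_kwargs(metric_kinds, solver_kwargs=None, explicit_solver=False):
    """Return the solver kwargs after applying the store_first_last rules.

    metric_kinds: {objective variable: label}; anything other than "First"
    or "Last" (e.g. "Mean", "Composed") may need interior points.
    """
    solver_kwargs = dict(solver_kwargs or {})
    if explicit_solver:
        return solver_kwargs  # your own solver: nothing is injected
    if "store_first_last" in solver_kwargs:
        return solver_kwargs  # an explicit choice is always respected
    if metric_kinds and all(kind in ENDPOINT_METRICS for kind in metric_kinds.values()):
        solver_kwargs["store_first_last"] = True
    return solver_kwargs
```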
- Objective Functions (theory): residual vs. canonical form, MLE interpretation.
- Data Fitting overview: putting objectives, parameters, and optimisers together.