
Regularization techniques improve the stability and generalization of parameter estimation by adding penalty terms to the objective function. When fitting battery models, regularization helps address common challenges: noisy data, correlated parameters, and limited experimental conditions that leave some parameters poorly constrained.

Why Regularization?

Standard least-squares fitting minimizes the error between model predictions and data. However, this can lead to problems:
  • Overfitting: The optimizer finds parameter values that match noise in the training data, leading to poor predictions on new data
  • Ill-conditioning: When parameters are correlated (e.g., electrode thickness and diffusivity both affect time constants), small data perturbations cause large parameter swings
  • Non-identifiability: Some parameters may not be uniquely determined by the available data
Regularization addresses these issues by penalizing extreme parameter values, effectively encoding prior knowledge that parameters should stay within reasonable ranges.

Ridge Regression

Ridge regression adds an L2 penalty (sum of squared parameter values) to the least-squares objective. This shrinks parameter estimates toward zero, reducing variance at the cost of introducing some bias.

Problem Formulation

The ridge regression objective is:

$$
x_{\text{RR}}^* = \arg\min_x \sum_i^N r_i(x)^2 + \lambda \sum_j^M x_j^2
$$

with residuals:

$$
r(x) = Ax - b
$$

where:
  • x ∈ ℝ^M – vector of parameters to estimate
  • A ∈ ℝ^{N×M} – design matrix (model predictions as a function of parameters)
  • b ∈ ℝ^N – observed data
  • λ ∈ [0, +∞) – regularization strength
The first term measures data fidelity (how well the model fits the data), while the second term penalizes large parameter values. The hyperparameter λ controls the tradeoff: larger λ means stronger regularization and more shrinkage toward zero.
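For a linear model, this objective has a closed-form solution. The following NumPy sketch (illustrative only, not part of ionworkspipeline) shows the solution and how regularization stabilizes an ill-conditioned fit with correlated columns:

```python
import numpy as np

def ridge_fit(A, b, lam):
    """Closed-form ridge solution: x* = (A^T A + lam*I)^{-1} A^T b."""
    M = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(M), A.T @ b)

# Toy problem with two nearly collinear columns (ill-conditioned)
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))
A[:, 1] = A[:, 0] + 1e-3 * rng.normal(size=50)  # near-duplicate column
b = A @ np.array([1.0, 1.0]) + 0.01 * rng.normal(size=50)

x_ols = ridge_fit(A, b, lam=0.0)    # unregularized: sensitive to noise
x_ridge = ridge_fit(A, b, lam=1.0)  # shrunk toward zero, more stable
```

Because the ridge minimizer's norm is non-increasing in λ, the regularized estimate is never larger (in the L2 sense) than the unregularized one.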

Normalization Requirement

For the L2 penalty to treat all parameters equally, both the residuals and parameters must be on comparable scales. This is typically achieved by Z-scoring (standardizing to zero mean and unit variance):

$$
\hat{A} = \frac{A - \text{mean}(A, \text{axis}=0)}{\text{std}(A, \text{axis}=0)}
$$

$$
\hat{b} = \frac{b - \text{mean}(b)}{\text{std}(b)}
$$

Without normalization, parameters with larger natural scales would be penalized more heavily, distorting the regularization.
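A minimal Z-scoring helper in NumPy (a generic sketch, not an ionworkspipeline API):

```python
import numpy as np

def z_score(X, axis=0):
    """Standardize to zero mean and unit variance along the given axis."""
    return (X - X.mean(axis=axis)) / X.std(axis=axis)

# Columns with very different natural scales...
A = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0]])
A_hat = z_score(A)  # ...are now directly comparable
```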

Hyperparameter Optimization

The regularization strength λ is a hyperparameter that must be chosen carefully. Too little regularization leaves the model prone to overfitting; too much regularization forces parameters away from their data-driven values, introducing bias. The goal is to find the λ that best balances these competing effects.

Bias-Variance Tradeoff

Regularization introduces a fundamental tradeoff between bias and variance:
  • Bias: Regularization shrinks parameters toward the prior, pulling estimates away from the “true” values. This is the cost of regularization.
  • Variance: Without regularization, estimates are highly sensitive to noise in the training data. Regularization reduces this sensitivity.
The optimal λ minimizes the total error (bias² + variance) on unseen data:

(Figure: bias-variance tradeoff)

| λ value | Training error | Validation error | Issue |
| --- | --- | --- | --- |
| Too small (λ → 0) | Low | High | Overfitting |
| Too large | High | High | Underfitting |
| Optimal (λ*) | Moderate | Low | Best generalization |

Optimization Procedure

1. Fit on training data: for a fixed value of λ, determine x*_RR using the training data.
2. Evaluate validation error: compute the prediction error on the validation set.
3. Repeat for multiple λ values: iterate steps 1-2 for several λ values.
4. Select optimal λ: choose λ* that minimizes validation error, then refit on the combined training and validation data.
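For the linear ridge case, this procedure can be sketched in a few lines of NumPy (an illustration of the workflow, not an ionworkspipeline API):

```python
import numpy as np

def ridge_fit(A, b, lam):
    """Closed-form ridge solution for a linear model."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

rng = np.random.default_rng(1)
A = rng.normal(size=(80, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.5 * rng.normal(size=80)

# Split into training and validation sets
A_train, b_train = A[:60], b[:60]
A_val, b_val = A[60:], b[60:]

# Steps 1-3: fit on training data for each candidate lambda,
# then evaluate the validation error
lambdas = np.logspace(-3, 3, 13)
val_errors = []
for lam in lambdas:
    x = ridge_fit(A_train, b_train, lam)
    val_errors.append(np.mean((A_val @ x - b_val) ** 2))

# Step 4: pick the lambda with the lowest validation error,
# then refit on the combined training + validation data
lam_star = float(lambdas[int(np.argmin(val_errors))])
x_star = ridge_fit(A, b, lam_star)
```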

Maximum A Posteriori (MAP) Estimation

While ridge regression shrinks parameters toward zero, we often have better prior knowledge, such as literature values or physical constraints. MAP estimation with Gaussian priors generalizes ridge regression by shrinking parameters toward specified prior means rather than zero. From a Bayesian perspective, MAP estimation finds the parameter values that maximize the posterior probability given the data. With Gaussian priors and Gaussian measurement noise, this is equivalent to minimizing:

$$
x_{\text{MAP}}^* = \arg\min_x \sum_i^N \left(\frac{\hat{y}_i(x) - y_i}{\sigma_{y,i}}\right)^2 + \sum_j^M \left(\frac{x_j - \mu_j}{\sigma_{x,j}}\right)^2
$$

where:
  • ŷ_i(x) – model prediction at data point i
  • y_i – observed data at point i
  • σ_{y,i} – measurement uncertainty (standard deviation)
  • μ_j – prior mean for parameter j (e.g., literature value)
  • σ_{x,j} – prior uncertainty for parameter j
The first term is the normalized data misfit (chi-squared statistic). The second term penalizes deviations from prior expectations, weighted by prior uncertainty. Parameters with tight priors (small σ_{x,j}) are constrained more strongly.

Connection to Ridge Regression

MAP estimation is mathematically equivalent to ridge regression when parameters are centered at the prior mean and scaled by the prior standard deviation. Adding a regularization hyperparameter λ gives:

$$
x_{\text{MAP,RR}}^* = \arg\min_x \sum_i^N \left(\frac{\hat{y}_i(x) - y_i}{\sigma_{y,i}}\right)^2 + \lambda \sum_j^M \left(\frac{x_j - \mu_j}{\sigma_{x,j}}\right)^2
$$

When λ = 1, this is standard MAP estimation. When λ < 1, the data is weighted more heavily relative to the priors. When λ > 1, the priors dominate.
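Because both terms are sums of squares, a standard least-squares solver can minimize the MAP objective directly: stack the whitened data residuals with the whitened prior residuals. A SciPy sketch with a hypothetical two-parameter exponential model (all names and values illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

# Toy nonlinear model: y = x0 * exp(-x1 * t)
t = np.linspace(0, 5, 40)
x_true = np.array([2.0, 0.7])
rng = np.random.default_rng(2)
sigma_y = 0.05  # measurement uncertainty
y = x_true[0] * np.exp(-x_true[1] * t) + sigma_y * rng.normal(size=t.size)

mu = np.array([1.8, 0.8])        # prior means (e.g., literature values)
sigma_x = np.array([0.5, 0.3])   # prior standard deviations

def residuals(x):
    # Stacking whitened data residuals with whitened prior residuals
    # makes the sum of squares equal the MAP objective above.
    r_data = (x[0] * np.exp(-x[1] * t) - y) / sigma_y
    r_prior = (x - mu) / sigma_x
    return np.concatenate([r_data, r_prior])

x_map = least_squares(residuals, x0=mu).x
```

With 40 well-resolved data points, the data term dominates and the MAP estimate lands close to the true parameters while the prior keeps the fit stable.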

Efficient Nonlinear Regularization

For linear models, ridge regression has an analytic solution. Nonlinear models (like battery electrochemical models) require iterative optimization, and finding the optimal λ through cross-validation would require repeated refitting, which is computationally expensive. An efficient alternative leverages two key assumptions:
  1. All parameters have priors: Every parameter has a specified prior distribution, eliminating identifiability issues where multiple parameter combinations give equivalent fits.
  2. Local quadratic approximation: Near the optimum x*_MAP, the objective function is approximately quadratic. This is valid when optimization has converged to a well-defined minimum.
Under these assumptions, the Hessian at the optimum characterizes the local curvature, and the optimal λ* can be determined efficiently from a single optimization run plus validation error evaluation, without repeatedly refitting the full model.
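To illustrate the idea (this is a generic sketch under the quadratic approximation, not the ionworkspipeline implementation): once a single fit supplies the Gauss-Newton Hessian of the data term, the regularized optimum for any λ reduces to a linear solve, so the λ sweep needs no further model evaluations. All values below are illustrative stand-ins.

```python
import numpy as np

# Pieces assumed available from one converged fit:
J = np.array([[1.0, 0.2],
              [0.1, 0.9],
              [0.5, 0.5]])            # Jacobian of whitened data residuals at the optimum
x_fit = np.array([1.1, 0.6])          # data-only (unregularized) minimizer
mu = np.array([1.0, 0.8])             # prior means
sigma_x = np.array([0.5, 0.4])        # prior standard deviations

H_data = J.T @ J                      # Gauss-Newton Hessian of the data term
H_prior = np.diag(1.0 / sigma_x**2)   # Hessian of the Gaussian prior term

def x_of_lambda(lam):
    """Minimizer of the local quadratic model:
    (x - x_fit)^T H_data (x - x_fit) + lam * (x - mu)^T H_prior (x - mu)."""
    return np.linalg.solve(H_data + lam * H_prior,
                           H_data @ x_fit + lam * H_prior @ mu)

# Sweep lambda with cheap linear algebra only -- no model refits
lambdas = np.logspace(-2, 2, 9)
candidates = [x_of_lambda(lam) for lam in lambdas]
```

At λ = 0 the sweep recovers the data-only fit, and as λ grows the solution moves toward the prior mean, matching the limiting behavior of the full objective.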

Practical Usage

To use regularization in ionworkspipeline, attach Gaussian priors to your parameters. The prior mean represents your best estimate before seeing data, and the prior standard deviation encodes your uncertainty.
import ionworkspipeline as iwp

# Define parameters with priors
parameters = {
    "Positive particle diffusivity [m2.s-1]": iwp.Parameter(
        "D_pos",
        initial_value=1e-14,
        bounds=(1e-16, 1e-12),
        prior=iwp.priors.Gaussian(mean=1e-14, std=5e-15)
    ),
    "Negative particle diffusivity [m2.s-1]": iwp.Parameter(
        "D_neg",
        initial_value=3e-14,
        bounds=(1e-16, 1e-12),
        prior=iwp.priors.Gaussian(mean=3e-14, std=1e-14)
    ),
}

# Run with regularization (`objective` and `fixed_parameters` are assumed
# to be defined elsewhere in your workflow)
datafit = iwp.DataFit(
    objectives=objective,
    parameters=parameters,
    optimizer=iwp.optimizers.ScipyLeastSquares(),
)
result = datafit.run(fixed_parameters)

Choosing Priors

Good priors come from:
  • Literature values: Published measurements for similar materials
  • Physical constraints: Known bounds from theory (e.g., diffusivity must be positive)
  • Previous experiments: Results from related cells or conditions
  • Order-of-magnitude estimates: Even rough estimates help stabilize fitting
The prior standard deviation should reflect genuine uncertainty. A narrow prior (small σ) strongly constrains the parameter; a wide prior (large σ) allows the data to dominate.
When uncertain about prior strength, start with wide priors (large σ) and tighten them only if fitting becomes unstable. Overly tight priors can prevent the optimizer from finding good solutions.
The regularization options and usage examples here are not exhaustive. See the API reference for full details on priors, constraints, and penalties.