scico.optimize¶
Optimization algorithms.
Modules
ADMM solver and auxiliary classes. |
|
PGM solvers and auxiliary classes. |
Classes
|
Basic Alternating Direction Method of Multipliers (ADMM) algorithm. |
|
Accelerated proximal gradient method (APGM) algorithm. |
|
Linearized alternating direction method of multipliers algorithm. |
|
Non-linear proximal alternating direction method of multipliers. |
|
Base class for optimizer classes. |
|
Primal–dual hybrid gradient (PDHG) algorithm. |
|
Proximal gradient method (PGM) algorithm. |
|
Proximal alternating direction method of multipliers. |
|
Base class for proximal ADMM solvers. |
- class scico.optimize.ADMM(f, g_list, C_list, rho_list, alpha=1.0, x0=None, subproblem_solver=None, **kwargs)¶
Bases:
OptimizerBasic Alternating Direction Method of Multipliers (ADMM) algorithm.
Solve an optimization problem of the form
\[\argmin_{\mb{x}} \; f(\mb{x}) + \sum_{i=1}^N g_i(C_i \mb{x}) \;,\]where \(f\) and the \(g_i\) are instances of
Functional, and the \(C_i\) areLinearOperator.The optimization problem is solved by introducing the splitting \(\mb{z}_i = C_i \mb{x}\) and solving
\[\argmin_{\mb{x}, \mb{z}_i} \; f(\mb{x}) + \sum_{i=1}^N g_i(\mb{z}_i) \; \text{such that}\; C_i \mb{x} = \mb{z}_i \;,\]via an ADMM algorithm [26] [24] [12] consisting of the iterations (see
step)\[\begin{split}\begin{aligned} \mb{x}^{(k+1)} &= \argmin_{\mb{x}} \; f(\mb{x}) + \sum_i \frac{\rho_i}{2} \norm{\mb{z}^{(k)}_i - \mb{u}^{(k)}_i - C_i \mb{x}}_2^2 \\ \mb{z}_i^{(k+1)} &= \argmin_{\mb{z}_i} \; g_i(\mb{z}_i) + \frac{\rho_i}{2} \norm{\mb{z}_i - \mb{u}^{(k)}_i - C_i \mb{x}^{(k+1)}}_2^2 \\ \mb{u}_i^{(k+1)} &= \mb{u}_i^{(k)} + C_i \mb{x}^{(k+1)} - \mb{z}^{(k+1)}_i \; . \end{aligned}\end{split}\]- Attributes:
f (
Functional) – Functional \(f\) (usually aLoss)g_list (list of
Functional) – List of \(g_i\) functionals. Must be same length asC_listandrho_list.C_list (list of
LinearOperator) – List of \(C_i\) operators.rho_list (list of scalars) – List of \(\rho_i\) penalty parameters. Must be same length as
C_listandg_list.alpha (float) – Relaxation parameter.
u_list (list of array-like) – List of scaled Lagrange multipliers \(\mb{u}_i\) at current iteration.
x (array-like) – Solution.
subproblem_solver (
SubproblemSolver) – Solver for \(\mb{x}\)-update step.z_list (list of array-like) – List of auxiliary variables \(\mb{z}_i\) at current iteration.
z_list_old (list of array-like) – List of auxiliary variables \(\mb{z}_i\) at previous iteration.
Initialize an
ADMMobject.- Parameters:
f (
Functional) – Functional \(f\) (usually a loss function).g_list (
List[Functional]) – List of \(g_i\) functionals. Must be same length asC_listandrho_list.C_list (
List[LinearOperator]) – List of \(C_i\) operators.rho_list (
List[float]) – List of \(\rho_i\) penalty parameters. Must be same length asC_listandg_list.alpha (
float) – Relaxation parameter. No relaxation for default 1.0.x0 (
Union[Array,BlockArray,None]) – Initial value for \(\mb{x}\). IfNone, defaults to an array of zeros.subproblem_solver (
Optional[SubproblemSolver]) – Solver for \(\mb{x}\)-update step. Defaults toNone, which implies use of an instance ofGenericSubproblemSolver.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- minimizer()[source]¶
Return the current estimate of the functional mimimizer.
- Return type:
Union[Array,BlockArray]- Returns:
Current estimate of the functional mimimizer.
- norm_dual_residual()[source]¶
Compute the \(\ell_2\) norm of the dual residual.
Compute the \(\ell_2\) norm of the dual residual
\[\left\| \sum_{i=1}^N \rho_i C_i^T \left( \mb{z}^{(k)}_i - \mb{z}^{(k-1)}_i \right) \right\|_2 \;.\]- Return type:
- Returns:
Norm of dual residual.
- norm_primal_residual(x=None)[source]¶
Compute the \(\ell_2\) norm of the primal residual.
Compute the \(\ell_2\) norm of the primal residual
\[\left( \sum_{i=1}^N \rho_i \left\| C_i \mb{x} - \mb{z}_i^{(k)} \right\|_2^2\right)^{1/2} \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate primal residual. IfNone, the primal residual is evaluated at the current iterateself.x.- Return type:
- Returns:
Norm of primal residual.
- objective(x=None, z_list=None)[source]¶
Evaluate the objective function.
Evaluate the objective function
\[f(\mb{x}) + \sum_{i=1}^N g_i(\mb{z}_i) \;.\]Note that this form is cheaper to compute, but may have very poor accuracy compared with the “true” objective function
\[f(\mb{x}) + \sum_{i=1}^N g_i(C_i \mb{x}) \;.\]when the primal residual is large.
- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.x.z_list (
Optional[List[Union[Array,BlockArray]]]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.z_list.
- Return type:
- Returns:
Value of the objective function.
- step()[source]¶
Perform a single ADMM iteration.
The primary variable \(\mb{x}\) is updated by solving the the optimization problem
\[\mb{x}^{(k+1)} = \argmin_{\mb{x}} \; f(\mb{x}) + \sum_i \frac{\rho_i}{2} \norm{\mb{z}^{(k)}_i - \mb{u}^{(k)}_i - C_i \mb{x}}_2^2 \;.\]Update auxiliary variables \(\mb{z}_i\) and scaled Lagrange multipliers \(\mb{u}_i\). The auxiliary variables are updated according to
\[\begin{split}\begin{aligned} \mb{z}_i^{(k+1)} &= \argmin_{\mb{z}_i} \; g_i(\mb{z}_i) + \frac{\rho_i}{2} \norm{\mb{z}_i - \mb{u}^{(k)}_i - C_i \mb{x}^{(k+1)}}_2^2 \\ &= \mathrm{prox}_{g_i}(C_i \mb{x} + \mb{u}_i, 1 / \rho_i) \;, \end{aligned}\end{split}\]and the scaled Lagrange multipliers are updated according to
\[\mb{u}_i^{(k+1)} = \mb{u}_i^{(k)} + C_i \mb{x}^{(k+1)} - \mb{z}^{(k+1)}_i \;.\]
- u_init(x0)[source]¶
Initialize scaled Lagrange multipliers \(\mb{u}_i\).
Initialized to
\[\mb{u}_i = \mb{0} \;.\]Note that the parameter x0 is unused, but is provided for potential use in an overridden method.
- Parameters:
x0 (
Union[Array,BlockArray]) – Initial value of \(\mb{x}\).- Return type:
List[Union[Array,BlockArray]]
- z_init(x0)[source]¶
Initialize auxiliary variables \(\mb{z}_i\).
Initialized to
\[\mb{z}_i = C_i \mb{x}^{(0)} \;.\]z_listandz_list_oldare initialized to the same value.- Parameters:
x0 (
Union[Array,BlockArray]) – Initial value of \(\mb{x}\).- Return type:
Tuple[List[Union[Array,BlockArray]],List[Union[Array,BlockArray]]]
- class scico.optimize.Optimizer(**kwargs)¶
Bases:
objectBase class for optimizer classes.
- Attributes:
itnum (int) – Optimizer iteration counter.
maxiter (int) – Maximum number of optimizer outer-loop iterations.
timer (
Timer) – Iteration timer.
Initialize common attributes of
Optimizerobjects.- Parameters:
**kwargs (
Any) –Optional parameter dict. Valid keys are:
- iter0:
Initial value of iteration counter. Default value is 0.
- maxiter:
Maximum iterations on call to
solve. Default value is 100.- nanstop:
If
True, stop iterations if aNaNorInfvalue is encountered in a solver working variable. Default value isFalse.- itstat_options:
A dict of named parameters to be passed to the
diagnostics.IterationStatsinitializer. The dict may also include an additional key “itstat_func” with the corresponding value being a function with two parameters, an integer and anOptimizerobject, responsible for constructing a tuple ready for insertion into thediagnostics.IterationStatsobject. IfNone, default values are used for the dict entries, otherwise the default dict is updated with the dict specified by this parameter.
- history(transpose=False)[source]¶
Retrieve record of algorithm iterations.
Retrieve record of algorithm iterations.
- Parameters:
transpose (
bool) – Flag indicating whether results should be returned in “transposed” form, i.e. as a namedtuple of lists rather than a list of namedtuples.- Return type:
Union[List[NamedTuple],Tuple[List]]- Returns:
Record of all iterations.
- load_state(path)[source]¶
Load optimizer state from a file.
Restore optimizer state from a file.
- Parameters:
path (
str) – Filename of state file saved usingsave_state.
- minimizer()[source]¶
Return the current estimate of the functional mimimizer.
- Return type:
Union[Array,BlockArray]- Returns:
Current estimate of the functional mimimizer.
- save_state(path)[source]¶
Save optimizer state to a file.
Save optimizer state to a file.
- Parameters:
path (
str) – Filename of file to which state should be saved.
- class scico.optimize.LinearizedADMM(f, g, C, mu, nu, x0=None, **kwargs)¶
Bases:
OptimizerLinearized alternating direction method of multipliers algorithm.
Solve an optimization problem of the form
\[\argmin_{\mb{x}} \; f(\mb{x}) + g(C \mb{x}) \;,\]where \(f\) and \(g\) are instances of
Functional, (in most cases \(f\) will, more specifically be an an instance ofLoss), and \(C\) is an instance ofLinearOperator.The optimization problem is solved by introducing the splitting \(\mb{z} = C \mb{x}\) and solving
\[\argmin_{\mb{x}, \mb{z}} \; f(\mb{x}) + g(\mb{z}) \; \text{such that}\; C \mb{x} = \mb{z} \;,\]via a linearized ADMM algorithm [61] [47] (Sec. 4.4.2) consisting of the iterations (see
step)\[\begin{split}\begin{aligned} \mb{x}^{(k+1)} &= \mathrm{prox}_{\mu f} \left( \mb{x}^{(k)} - (\mu / \nu) C^T \left(C \mb{x}^{(k)} - \mb{z}^{(k)} + \mb{u}^{(k)} \right) \right) \\ \mb{z}^{(k+1)} &= \mathrm{prox}_{\nu g} \left(C \mb{x}^{(k+1)} + \mb{u}^{(k)} \right) \\ \mb{u}^{(k+1)} &= \mb{u}^{(k)} + C \mb{x}^{(k+1)} - \mb{z}^{(k+1)} \;. \end{aligned}\end{split}\]Parameters \(\mu\) and \(\nu\) are required to satisfy
\[0 < \mu < \nu \| C \|_2^{-2} \;.\]- Attributes:
f (
Functional) – Functional \(f\) (usually aLoss).g (
Functional) – Functional \(g\).C (
LinearOperator) – \(C\) operator.mu (scalar) – First algorithm parameter.
nu (scalar) – Second algorithm parameter.
u (array-like) – Scaled Lagrange multipliers \(\mb{u}\) at current iteration.
x (array-like) – Solution variable.
z (array-like) – Auxiliary variables \(\mb{z}\) at current iteration.
z_old (array-like) – Auxiliary variables \(\mb{z}\) at previous iteration.
Initialize a
LinearizedADMMobject.- Parameters:
f (
Functional) – Functional \(f\) (usually a loss function).g (
Functional) – Functional \(g\).C (
LinearOperator) – Operator \(C\).mu (
float) – First algorithm parameter.nu (
float) – Second algorithm parameter.x0 (
Union[Array,BlockArray,None]) – Starting point for \(\mb{x}\). IfNone, defaults to an array of zeros.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- minimizer()[source]¶
Return the current estimate of the functional mimimizer.
- Return type:
Union[Array,BlockArray]- Returns:
Current estimate of the functional mimimizer.
- norm_dual_residual()[source]¶
Compute the \(\ell_2\) norm of the dual residual.
Compute the \(\ell_2\) norm of the dual residual
\[\norm{\mb{z}^{(k)} - \mb{z}^{(k-1)}}_2 \;.\]- Return type:
- Returns:
Current norm of dual residual.
- norm_primal_residual(x=None)[source]¶
Compute the \(\ell_2\) norm of the primal residual.
Compute the \(\ell_2\) norm of the primal residual
\[\norm{C \mb{x} - \mb{z}}_2 \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate primal residual. IfNone, the primal residual is evaluated at the current iterateself.x.- Return type:
- Returns:
Norm of primal residual.
- objective(x=None, z=None)[source]¶
Evaluate the objective function.
Evaluate the objective function
\[f(\mb{x}) + g(\mb{z}) \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.x.z (
Union[Array,BlockArray,None]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.z.
- Return type:
- Returns:
scalar – Value of the objective function.
- step()[source]¶
Perform a single linearized ADMM iteration.
The primary variable \(\mb{x}\) is updated by computing
\[\mb{x}^{(k+1)} = \mathrm{prox}_{\mu f} \left( \mb{x}^{(k)} - (\mu / \nu) A^T \left(A \mb{x}^{(k)} - \mb{z}^{(k)} + \mb{u}^{(k)} \right) \right) \;.\]The auxiliary variable is updated according to
\[\mb{z}^{(k+1)} = \mathrm{prox}_{\nu g} \left(A \mb{x}^{(k+1)} + \mb{u}^{(k)} \right) \;,\]and the scaled Lagrange multiplier is updated according to
\[\mb{u}^{(k+1)} = \mb{u}^{(k)} + C \mb{x}^{(k+1)} - \mb{z}^{(k+1)} \;.\]
- u_init(x0)[source]¶
Initialize scaled Lagrange multiplier \(\mb{u}\).
Initialized to
\[\mb{u} = \mb{0} \;.\]Note that the parameter x0 is unused, but is provided for potential use in an overridden method.
- Parameters:
x0 (
Union[Array,BlockArray]) – Starting point for \(\mb{x}\).- Return type:
Union[Array,BlockArray]
- z_init(x0)[source]¶
Initialize auxiliary variable \(\mb{z}\).
Initialized to
\[\mb{z} = C \mb{x}^{(0)} \;.\]zandz_oldare initialized to the same value.- Parameters:
x0 (
Union[Array,BlockArray]) – Starting point for \(\mb{x}\).- Return type:
Tuple[Union[Array,BlockArray],Union[Array,BlockArray]]
- class scico.optimize.PGM(f, g, L0, x0, step_size=None, **kwargs)¶
Bases:
OptimizerProximal gradient method (PGM) algorithm.
Minimize a functional of the form \(f(\mb{x}) + g(\mb{x})\), where \(f\) and the \(g\) are instances of
Functional. Functional \(f\) should be differentiable and have a Lipschitz continuous derivative, and functional \(g\) should have a proximal operator defined.The step size \(\alpha\) of the algorithm is defined in terms of its reciprocal \(L\), i.e. \(\alpha = 1 / L\). The initial value for this parameter, L0, is required to satisfy
\[L_0 \geq K(\nabla f) \;,\]where \(K(\nabla f)\) denotes the Lipschitz constant of the gradient of \(f\). When f is an instance of
SquaredL2Losswith aLinearOperatorA,\[K(\nabla f) = \lambda_{ \mathrm{max} }( A^H A ) = \| A \|_2^2 \;,\]where \(\lambda_{\mathrm{max}}(B)\) denotes the largest eigenvalue of \(B\).
The evolution of the step size is controlled by auxiliary class
PGMStepSizeand derived classes. The defaultPGMStepSizesimply sets \(L = L_0\), while the derived classes implement a variety of adaptive strategies.- Parameters:
f (
Union[Loss,Functional]) – Instance ofLossorFunctionalwith defined grad method.g (
Functional) – Instance ofFunctionalwith defined prox method.L0 (
float) – Initial estimate of Lipschitz constant of gradient of f.x0 (
Union[Array,BlockArray]) – Starting point for \(\mb{x}\).step_size (
Optional[PGMStepSize]) – Instance of an auxiliary class of typePGMStepSizedetermining the evolution of the algorithm step size.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- f_quad_approx(x, y, L)[source]¶
Evaluate the quadratic approximation to function \(f\).
Evaluate the quadratic approximation to function \(f\), corresponding to \(\hat{f}_{L}(\mb{x}, \mb{y}) = f(\mb{y}) + \nabla f(\mb{y})^H (\mb{x} - \mb{y}) + \frac{L}{2} \left\|\mb{x} - \mb{y}\right\|_2^2\).
- Return type:
- minimizer()[source]¶
Return the current estimate of the functional mimimizer.
- Return type:
Union[Array,BlockArray]- Returns:
Current estimate of the functional mimimizer.
- norm_residual()[source]¶
Return the fixed point residual.
Return the fixed point residual (see Sec. 4.3 of [38]).
- Return type:
- x_step(v, L)[source]¶
Compute update for variable x.
- Return type:
Union[Array,BlockArray]
- class scico.optimize.AcceleratedPGM(f, g, L0, x0, step_size=None, **kwargs)¶
Bases:
PGMAccelerated proximal gradient method (APGM) algorithm.
Minimize a function of the form \(f(\mb{x}) + g(\mb{x})\), where \(f\) and the \(g\) are instances of
Functional. The accelerated form of PGM is also known as FISTA [8].See
PGMfor more detailed documentation.- Parameters:
f (
Union[Loss,Functional]) – Instance ofLossorFunctionalwith defined grad method.g (
Functional) – Instance ofFunctionalwith defined prox method.L0 (
float) – Initial estimate of Lipschitz constant of gradient of f.x0 (
Union[Array,BlockArray]) – Starting point for \(\mb{x}\).step_size (
Optional[PGMStepSize]) – Instance of an auxiliary class of typePGMStepSizedetermining the evolution of the algorithm step size.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- class scico.optimize.PDHG(f, g, C, tau, sigma, alpha=1.0, x0=None, z0=None, **kwargs)¶
Bases:
OptimizerPrimal–dual hybrid gradient (PDHG) algorithm.
Primal–dual hybrid gradient (PDHG) is a family of algorithms [22] that includes the Chambolle-Pock primal-dual algorithm [14]. The form implemented here is a minor variant [48] of the original Chambolle-Pock algorithm.
Solve an optimization problem of the form
\[\argmin_{\mb{x}} \; f(\mb{x}) + g(C \mb{x}) \;,\]where \(f\) and \(g\) are instances of
Functional, (in most cases \(f\) will, more specifically be an an instance ofLoss), and \(C\) is an instance ofOperatororLinearOperator.When C is a
LinearOperator, the algorithm iterations are\[\begin{split}\begin{aligned} \mb{x}^{(k+1)} &= \mathrm{prox}_{\tau f} \left( \mb{x}^{(k)} - \tau C^T \mb{z}^{(k)} \right) \\ \mb{z}^{(k+1)} &= \mathrm{prox}_{\sigma g^*} \left( \mb{z}^{(k)} + \sigma C((1 + \alpha) \mb{x}^{(k+1)} - \alpha \mb{x}^{(k)} \right) \;, \end{aligned}\end{split}\]where \(g^*\) denotes the convex conjugate of \(g\). Parameters \(\tau > 0\) and \(\sigma > 0\) are also required to satisfy
\[\tau \sigma < \| C \|_2^{-2} \;,\]and it is required that \(\alpha \in [0, 1]\).
When C is a non-linear
Operator, a non-linear PDHG variant [55] is used, with the same iterations except for \(\mb{x}\) update\[\mb{x}^{(k+1)} = \mathrm{prox}_{\tau f} \left( \mb{x}^{(k)} - \tau [J_x C(\mb{x}^{(k)})]^T \mb{z}^{(k)} \right) \;.\]- Attributes:
f (
Functional) – Functional \(f\) (usually aLoss).g (
Functional) – Functional \(g\).C (
Operator) – \(C\) operator.tau (scalar) – First algorithm parameter.
sigma (scalar) – Second algorithm parameter.
alpha (scalar) – Relaxation parameter.
x (array-like) – Primal variable \(\mb{x}\) at current iteration.
x_old (array-like) – Primal variable \(\mb{x}\) at previous iteration.
z (array-like) – Dual variable \(\mb{z}\) at current iteration.
z_old (array-like) – Dual variable \(\mb{z}\) at previous iteration.
Initialize a
PDHGobject.- Parameters:
f (
Functional) – Functional \(f\) (usually a loss function).g (
Functional) – Functional \(g\).C (
Operator) – Operator \(C\).tau (
float) – First algorithm parameter.sigma (
float) – Second algorithm parameter.alpha (
float) – Relaxation parameter.x0 (
Union[Array,BlockArray,None]) – Starting point for \(\mb{x}\). IfNone, defaults to an array of zeros.z0 (
Union[Array,BlockArray,None]) – Starting point for \(\mb{z}\). IfNone, defaults to an array of zeros.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- static estimate_parameters(C, x=None, ratio=1.0, factor=1.01, maxiter=100, key=None)[source]¶
Estimate tau and sigma parameters of
PDHG.Find values of the tau and sigma parameters of
PDHGthat respect the constraint\[\tau \sigma < \| C \|_2^{-2} \quad \text{or} \quad \tau \sigma < \| J_x C(\mb{x}) \|_2^{-2} \;,\]depending on whether \(C\) is a
LinearOperatoror not.- Parameters:
C (
Operator) – Operator \(C\).x (
Union[Array,BlockArray,None]) – Value of \(\mb{x}\) at which to evaluate the Jacobian of \(C\) (when it is not aLinearOperator). IfNone, defaults to an array of zeros.ratio (
float) – Desired ratio between return \(\tau\) and \(\sigma\) values (\(\sigma = \mathrm{ratio} \tau\)).factor (
Optional[float]) – Safety factor with which to multiply \(\| C \|_2^{-2}\) to ensure strict inequality compliance. IfNone, the value is set to 1.0.maxiter (
int) – Maximum number of power iterations to use in operator norm estimation (seeoperator_norm). Default: 100.key (
Optional[Array]) – Jax PRNG key to use in operator norm estimation (seeoperator_norm). Defaults toNone, in which case a new key is created.
- Returns:
A tuple (tau, sigma) representing the estimated parameter values.
- minimizer()[source]¶
Return the current estimate of the functional mimimizer.
- Return type:
Union[Array,BlockArray]- Returns:
Current estimate of the functional mimimizer.
- norm_dual_residual()[source]¶
Compute the \(\ell_2\) norm of the dual residual.
Compute the \(\ell_2\) norm of the dual residual
\[\sigma^{-1} \norm{\mb{z}^{(k)} - \mb{z}^{(k-1)}}_2 \;.\]- Return type:
- Returns:
Current norm of dual residual.
- norm_primal_residual()[source]¶
Compute the \(\ell_2\) norm of the primal residual.
Compute the \(\ell_2\) norm of the primal residual
\[\tau^{-1} \norm{\mb{x}^{(k)} - \mb{x}^{(k-1)}}_2 \;.\]- Return type:
- Returns:
Current norm of primal residual.
- objective(x=None)[source]¶
Evaluate the objective function.
Evaluate the objective function
\[f(\mb{x}) + g(C \mb{x}) \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.x- Return type:
- Returns:
scalar – Value of the objective function.
- class scico.optimize.ProximalADMM(f, g, A, rho, mu, nu, B=None, c=None, x0=None, z0=None, u0=None, fast_dual_residual=True, **kwargs)¶
Bases:
ProximalADMMBaseProximal alternating direction method of multipliers.
Solve an optimization problem of the form
\[\argmin_{\mb{x}} \; f(\mb{x}) + g(\mb{z}) \; \text{such that}\; A \mb{x} + B \mb{z} = \mb{c} \;,\]where \(f\) and \(g\) are instances of
Functional, (in most cases \(f\) will, more specifically be an instance ofLoss), and \(A\) and \(B\) are instances ofLinearOperator.The optimization problem is solved via a variant of the proximal ADMM algorithm [19], consisting of the iterations (see
step)\[\begin{split}\begin{aligned} \mb{x}^{(k+1)} &= \mathrm{prox}_{\rho^{-1} \mu^{-1} f} \left( \mb{x}^{(k)} - \mu^{-1} A^T \left(2 \mb{u}^{(k)} - \mb{u}^{(k-1)} \right) \right) \\ \mb{z}^{(k+1)} &= \mathrm{prox}_{\rho^{-1} \nu^{-1} g} \left( \mb{z}^{(k)} - \nu^{-1} B^T \left( A \mb{x}^{(k+1)} + B \mb{z}^{(k)} - \mb{c} + \mb{u}^{(k)} \right) \right) \\ \mb{u}^{(k+1)} &= \mb{u}^{(k)} + A \mb{x}^{(k+1)} + B \mb{z}^{(k+1)} - \mb{c} \;. \end{aligned}\end{split}\]Parameters \(\mu\) and \(\nu\) are required to satisfy
\[\mu > \norm{ A }_2^2 \quad \text{and} \quad \nu > \norm{ B }_2^2 \;.\]- Attributes:
A (
LinearOperator) – \(A\) linear operator.B (
LinearOperator) – \(B\) linear operator.c (array-like) – constant \(\mb{c}\).
Initialize a
ProximalADMMobject.- Parameters:
f (
Functional) – Functional \(f\) (usually a loss function).g (
Functional) – Functional \(g\).A (
LinearOperator) – Linear operator \(A\).rho (
float) – Penalty parameter.mu (
float) – First algorithm parameter.nu (
float) – Second algorithm parameter.B (
Optional[LinearOperator]) – Linear operator \(B\) (ifNone, \(B = -I\) where \(I\) is the identity operator).c (
Union[float,Array,BlockArray,None]) – Constant \(\mb{c}\). IfNone, defaults to zero.x0 (
Union[Array,BlockArray,None]) – Starting value for \(\mb{x}\). IfNone, defaults to an array of zeros.z0 (
Union[Array,BlockArray,None]) – Starting value for \(\mb{z}\). IfNone, defaults to an array of zeros.u0 (
Union[Array,BlockArray,None]) – Starting value for \(\mb{u}\). IfNone, defaults to an array of zeros.fast_dual_residual (
bool) – Flag indicating whether to use fast approximation to the dual residual, or a slower but more accurate calculation.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- static estimate_parameters(A, B=None, factor=1.01, maxiter=100, key=None)[source]¶
Estimate mu and nu parameters of
ProximalADMM.Find values of the mu and nu parameters of
ProximalADMMthat respect the constraints\[\mu > \norm{ A }_2^2 \quad \text{and} \quad \nu > \norm{ B }_2^2 \;.\]- Parameters:
A (
LinearOperator) – Linear operator \(A\).B (
Optional[LinearOperator]) – Linear operator \(B\) (ifNone, \(B = -I\) where \(I\) is the identity operator).factor (
Optional[float]) – Safety factor with which to multiply estimated operator norms to ensure strict inequality compliance. IfNone, return the estimated squared operator norms.maxiter (
int) – Maximum number of power iterations to use in operator norm estimation (seeoperator_norm). Default: 100.key (
Optional[Array]) – Jax PRNG key to use in operator norm estimation (seeoperator_norm). Defaults toNone, in which case a new key is created.
- Return type:
- Returns:
A tuple (mu, nu) representing the estimated parameter values or corresponding squared operator norm values, depending on the value of the factor parameter.
- norm_dual_residual()[source]¶
Compute the \(\ell_2\) norm of the dual residual.
Compute the \(\ell_2\) norm of the dual residual. If the flag requesting a fast approximate calculation is set, it is computed as
\[\norm{\mb{z}^{(k+1)} - \mb{z}^{(k)}}_2 \;,\]otherwise it is computed as
\[\norm{A^T B ( \mb{z}^{(k+1)} - \mb{z}^{(k)} ) }_2 \;.\]- Return type:
- Returns:
Current norm of dual residual.
- norm_primal_residual(x=None, z=None)[source]¶
Compute the \(\ell_2\) norm of the primal residual.
Compute the \(\ell_2\) norm of the primal residual
\[\norm{A \mb{x} + B \mb{z} - \mb{c}}_2 \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate primal residual. IfNone, the primal residual is evaluated at the current iterateself.x.z (
Union[Array,BlockArray,None]) – Point at which to evaluate primal residual. IfNone, the primal residual is evaluated at the current iterateself.z.
- Return type:
- Returns:
Norm of primal residual.
- class scico.optimize.NonLinearPADMM(f, g, H, rho, mu, nu, x0=None, z0=None, u0=None, fast_dual_residual=True, **kwargs)¶
Bases:
ProximalADMMBaseNon-linear proximal alternating direction method of multipliers.
Solve an optimization problem of the form
\[\argmin_{\mb{x}} \; f(\mb{x}) + g(\mb{z}) \; \text{such that}\; H(\mb{x}, \mb{z}) = 0 \;,\]where \(f\) and \(g\) are instances of
Functional, (in most cases \(f\) will, more specifically be an instance ofLoss), and \(H\) is a function.The optimization problem is solved via a variant of the proximal ADMM algorithm for problems with a non-linear operator constraint [11], consisting of the iterations (see
step)\[\begin{split}\begin{aligned} A^{(k)} &= J_{\mb{x}} H(\mb{x}^{(k)}, \mb{z}^{(k)}) \\ \mb{x}^{(k+1)} &= \mathrm{prox}_{\rho^{-1} \mu^{-1} f} \left( \mb{x}^{(k)} - \mu^{-1} (A^{(k)})^T \left(2 \mb{u}^{(k)} - \mb{u}^{(k-1)} \right) \right) \\ B^{(k)} &= J_{\mb{z}} H(\mb{x}^{(k+1)}, \mb{z}^{(k)}) \\ \mb{z}^{(k+1)} &= \mathrm{prox}_{\rho^{-1} \nu^{-1} g} \left( \mb{z}^{(k)} - \nu^{-1} (B^{(k)})^T \left( H(\mb{x}^{(k+1)}, \mb{z}^{(k)}) + \mb{u}^{(k)} \right) \right) \\ \mb{u}^{(k+1)} &= \mb{u}^{(k)} + H(\mb{x}^{(k+1)}, \mb{z}^{(k+1)}) \;. \end{aligned}\end{split}\]Parameters \(\mu\) and \(\nu\) are required to satisfy
\[\mu > \norm{ A^{(k)} }_2^2 \quad \text{and} \quad \nu > \norm{ B^{(k)} }_2^2\]for all \(A^{(k)}\) and \(B^{(k)}\).
- Attributes:
H (
Function) – \(H\) function.
Initialize a
NonLinearPADMMobject.- Parameters:
f (
Functional) – Functional \(f\) (usually a loss function).g (
Functional) – Functional \(g\).H (
Function) – Function \(H\).rho (
float) – Penalty parameter.mu (
float) – First algorithm parameter.nu (
float) – Second algorithm parameter.x0 (
Union[Array,BlockArray,None]) – Starting value for \(\mb{x}\). IfNone, defaults to an array of zeros.z0 (
Union[Array,BlockArray,None]) – Starting value for \(\mb{z}\). IfNone, defaults to an array of zeros.u0 (
Union[Array,BlockArray,None]) – Starting value for \(\mb{u}\). IfNone, defaults to an array of zeros.fast_dual_residual (
bool) – Flag indicating whether to use fast approximation to the dual residual, or a slower but more accurate calculation.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- static estimate_parameters(H, x=None, z=None, factor=1.01, maxiter=100, key=None)[source]¶
Estimate mu and nu parameters of
NonLinearPADMM.Find values of the mu and nu parameters of
NonLinearPADMMthat respect the constraints\[\mu > \norm{ J_x H(\mb{x}, \mb{z}) }_2^2 \quad \text{and} \quad \nu > \norm{ J_z H(\mb{x}, \mb{z}) }_2^2 \;.\]- Parameters:
H (
Function) – Constraint function \(H\).x (
Union[Array,BlockArray,None]) – Value of \(\mb{x}\) at which to evaluate the Jacobian. IfNone, defaults to an array of zeros.z (
Union[Array,BlockArray,None]) – Value of \(\mb{z}\) at which to evaluate the Jacobian. IfNone, defaults to an array of zeros.factor (
Optional[float]) – Safety factor with which to multiply estimated operator norms to ensure strict inequality compliance. IfNone, return the estimated squared operator norms.maxiter (
int) – Maximum number of power iterations to use in operator norm estimation (seeoperator_norm). Default: 100.key (
Optional[Array]) – Jax PRNG key to use in operator norm estimation (seeoperator_norm). Defaults toNone, in which case a new key is created.
- Return type:
- Returns:
A tuple (mu, nu) representing the estimated parameter values or corresponding squared operator norm values, depending on the value of the factor parameter.
- norm_dual_residual()[source]¶
Compute the \(\ell_2\) norm of the dual residual.
Compute the \(\ell_2\) norm of the dual residual. If the flag requesting a fast approximate calculation is set, it is computed as
\[\norm{\mb{z}^{(k+1)} - \mb{z}^{(k)}}_2 \;,\]otherwise it is computed as
\[\norm{A^T B ( \mb{z}^{(k+1)} - \mb{z}^{(k)} ) }_2 \;,\]where
\[\begin{split}A &= J_{\mb{x}} H(\mb{x}^{(k+1)}, \mb{z}^{(k+1)}) \\ B &= J_{\mb{z}} H(\mb{x}^{(k+1)}, \mb{z}^{(k+1)}) \;.\end{split}\]- Return type:
- Returns:
Current norm of dual residual.
- norm_primal_residual(x=None, z=None)[source]¶
Compute the \(\ell_2\) norm of the primal residual.
Compute the \(\ell_2\) norm of the primal residual
\[\norm{H(\mb{x}, \mb{z})}_2 \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate primal residual. IfNone, the primal residual is evaluated at the current iterateself.x.z (
Union[Array,BlockArray,None]) – Point at which to evaluate primal residual. IfNone, the primal residual is evaluated at the current iterateself.z.
- Return type:
- Returns:
Norm of primal residual.
- class scico.optimize.ProximalADMMBase(f, g, rho, mu, nu, xshape, zshape, ushape, xdtype, zdtype, udtype, x0=None, z0=None, u0=None, fast_dual_residual=True, **kwargs)¶
Bases:
OptimizerBase class for proximal ADMM solvers.
- Attributes:
f (
Functional) – Functional \(f\) (usually aLoss).g (
Functional) – Functional \(g\).rho (scalar) – Penalty parameter.
mu (scalar) – First algorithm parameter.
nu (scalar) – Second algorithm parameter.
x (array-like) – Solution variable.
z (array-like) – Auxiliary variables \(\mb{z}\) at current iteration.
z_old (array-like) – Auxiliary variables \(\mb{z}\) at previous iteration.
u (array-like) – Scaled Lagrange multipliers \(\mb{u}\) at current iteration.
u_old (array-like) – Scaled Lagrange multipliers \(\mb{u}\) at previous iteration.
Initialize a
ProximalADMMBaseobject.- Parameters:
f (
Functional) – Functional \(f\) (usually a loss function).g (
Functional) – Functional \(g\).rho (
float) – Penalty parameter.mu (
float) – First algorithm parameter.nu (
float) – Second algorithm parameter.xshape (
Union[Tuple[int,...],Tuple[Tuple[int,...],...]]) – Shape of variable \(\mb{x}\).zshape (
Union[Tuple[int,...],Tuple[Tuple[int,...],...]]) – Shape of variable \(\mb{z}\).ushape (
Union[Tuple[int,...],Tuple[Tuple[int,...],...]]) – Shape of variable \(\mb{u}\).xdtype (
DType) – Dtype of variable \(\mb{x}\).zdtype (
DType) – Dtype of variable \(\mb{z}\).udtype (
DType) – Dtype of variable \(\mb{u}\).x0 (
Union[Array,BlockArray,None]) – Initial value for \(\mb{x}\). IfNone, defaults to an array of zeros.z0 (
Union[Array,BlockArray,None]) – Initial value for \(\mb{z}\). IfNone, defaults to an array of zeros.u0 (
Union[Array,BlockArray,None]) – Initial value for \(\mb{u}\). IfNone, defaults to an array of zeros.fast_dual_residual (
bool) – Flag indicating whether to use fast approximation to the dual residual, or a slower but more accurate calculation.**kwargs – Additional optional parameters handled by initializer of base class
Optimizer.
- minimizer()[source]¶
Return the current estimate of the functional mimimizer.
- Return type:
Union[Array,BlockArray]- Returns:
Current estimate of the functional mimimizer.
- objective(x=None, z=None)[source]¶
Evaluate the objective function.
Evaluate the objective function
\[f(\mb{x}) + g(\mb{z}) \;.\]- Parameters:
x (
Union[Array,BlockArray,None]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.x.z (
Union[Array,BlockArray,None]) – Point at which to evaluate objective function. IfNone, the objective is evaluated at the current iterateself.z.
- Return type:
- Returns:
scalar – Current value of the objective function.