scico.solver#
Solver and optimization algorithms.
This module provides a number of functions for solving linear systems and
optimization problems, some of which are used as subproblem solvers
within the iterations of the proximal algorithms in the
scico.optimize
subpackage.
This module also provides scico interface wrappers for functions
from scipy.optimize
since jax directly implements only a very
limited subset of these functions (there is limited, experimental support
for L-BFGS-B), but only CG
and BFGS are fully supported. These wrappers are required because the
functions in scipy.optimize
only support on 1D, real valued, numpy
arrays. These limitations are addressed by:
Enabling the use of multi-dimensional arrays by flattening and reshaping within the wrapper.
Enabling the use of jax arrays by automatically converting to and from numpy arrays.
Enabling the use of complex arrays by splitting them into real and imaginary parts.
The wrapper also JIT compiles the function and gradient evaluations.
These wrapper functions have a number of advantages and disadvantages
with respect to those in jax.scipy.optimize
:
This module provides many more algorithms than
jax.scipy.optimize
.The functions in this module tend to be faster for small-scale problems (presumably due to some overhead in the jax functions).
The functions in this module are slower for large problems due to the frequent host-device copies corresponding to conversion between numpy arrays and jax arrays.
The solvers in this module can’t be JIT compiled, and gradients cannot be taken through them.
In the future, these wrapper functions may be replaced with a dependency on JAXopt.
Functions
|
Vectorised root finding via bisection method. |
|
Conjugate Gradient solver. |
|
Vectorised scalar minimization via golden section method. |
|
Least squares solver. |
|
Minimization of scalar function of one or more variables. |
|
Minimization of scalar function of one variable. |
Classes
|
Solver for a linear system involving a sum of convolutions. |
|
Solver for linear system involving a symmetric product. |
- scico.solver.minimize(func, x0, args=(), method='L-BFGS-B', hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)[source]#
Minimization of scalar function of one or more variables.
Wrapper around
scipy.optimize.minimize
. This function differs fromscipy.optimize.minimize
in three ways: :rtype:OptimizeResult
The jac options of
scipy.optimize.minimize
are not supported. The gradient is calculated using jax.grad.Functions mapping from N-dimensional arrays -> float are supported.
Functions mapping from complex arrays -> float are supported.
For more detail, including descriptions of the optimization methods and custom minimizers, refer to the original docs for
scipy.optimize.minimize
.
- scico.solver.minimize_scalar(func, bracket=None, bounds=None, args=(), method='brent', tol=None, options=None)[source]#
Minimization of scalar function of one variable.
Wrapper around
scipy.optimize.minimize_scalar
.For more detail, including descriptions of the optimization methods and custom minimizers, refer to the original docstring for
scipy.optimize.minimize_scalar
.- Return type:
- scico.solver.cg(A, b, x0=None, *, tol=1e-05, atol=0.0, maxiter=1000, info=True, M=None)[source]#
Conjugate Gradient solver.
Solve the linear system \(A\mb{x} = \mb{b}\), where \(A\) is positive definite, via the conjugate gradient method.
- Parameters:
A (
Callable
) – Callable implementing linear operator \(A\), which should be positive definite.b (
Array
) – Input array \(\mb{b}\).x0 (
Optional
[Array
]) – Initial solution. If A is aLinearOperator
, this parameter need not be specified, and defaults to a zero array. Otherwise, it is required.tol (
float
) – Relative residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).atol (
float
) – Absolute residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).maxiter (
int
) – Maximum iterations. Default: 1000.info (
bool
) – IfTrue
return a tuple consting of the solution array and a dictionary containing diagnostic information, otherwise just return the solution.M (
Optional
[Callable
]) – Preconditioner for A. The preconditioner should approximate the inverse of A. The default,None
, uses no preconditioner.
- Return type:
- Returns:
tuple –
A tuple (x, info) containing:
x : Solution array.
info: Dictionary containing diagnostic information.
- scico.solver.lstsq(A, b, x0=None, tol=1e-05, atol=0.0, maxiter=1000, info=False, M=None)[source]#
Least squares solver.
Solve the least squares problem
\[\argmin_{\mb{x}} \; (1/2) \norm{ A \mb{x} - \mb{b}) }_2^2 \;,\]where \(A\) is a linear operator and \(\mb{b}\) is a vector. The problem is solved using
cg
.- Parameters:
A (
Callable
) – Callable implementing linear operator \(A\).b (
Array
) – Input array \(\mb{b}\).x0 (
Optional
[Array
]) – Initial solution. If A is aLinearOperator
, this parameter need not be specified, and defaults to a zero array. Otherwise, it is required.tol (
float
) – Relative residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).atol (
float
) – Absolute residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).maxiter (
int
) – Maximum iterations. Default: 1000.info (
bool
) – IfTrue
return a tuple consting of the solution array and a dictionary containing diagnostic information, otherwise just return the solution.M (
Optional
[Callable
]) – Preconditioner for A. The preconditioner should approximate the inverse of A. The default,None
, uses no preconditioner.
- Return type:
- Returns:
tuple –
A tuple (x, info) containing:
x : Solution array.
info: Dictionary containing diagnostic information.
- scico.solver.bisect(f, a, b, args=(), xtol=1e-07, ftol=1e-07, maxiter=100, full_output=False, range_check=True)[source]#
Vectorised root finding via bisection method.
Vectorised root finding via bisection method, supporting simultaneous finding of multiple roots on a function defined over a multi-dimensional array. When the function is array-valued, each of these values is treated as the independent application of a scalar function. The initial interval [a, b] must bracket the root for all scalar functions.
The interface is similar to that of
scipy.optimize.bisect
, which is much faster when f is a scalar function and a and b are scalars.- Parameters:
f (
Callable
) – Function returning a float or an array of floats.a (
Array
) – Lower bound of interval on which to apply bisection.b (
Array
) – Upper bound of interval on which to apply bisection.args (
Tuple
) – Additional arguments for function f.xtol (
float
) – Stopping tolerance based on maximum bisection interval length over array.ftol (
float
) – Stopping tolerance based on maximum absolute function value over array.maxiter (
int
) – Maximum number of algorithm iterations.full_output (
bool
) – IfFalse
, return just the root, otherwise return a tuple (x, info) where x is the root and info is a dict containing algorithm status information.range_check (
bool
) – IfTrue
, check to ensure that the initial [a, b] range brackets the root of f.
- Return type:
- Returns:
tuple –
A tuple (x, info) containing:
x : Root array.
info: Dictionary containing diagnostic information.
- scico.solver.golden(f, a, b, c=None, args=(), xtol=1e-07, maxiter=100, full_output=False)[source]#
Vectorised scalar minimization via golden section method.
Vectorised scalar minimization via golden section method, supporting simultaneous minimization of a function defined over a multi-dimensional array. When the function is array-valued, each of these values is treated as the independent application of a scalar function. The minimizer must lie within the interval (a, b) for all scalar functions, and, if specified c must be within that interval.
The interface is more similar to that of
bisect
than that ofscipy.optimize.golden
which is much faster when f is a scalar function and a, b, and c are scalars.- Parameters:
f (
Callable
) – Function returning a float or an array of floats.a (
Array
) – Lower bound of interval on which to search.b (
Array
) – Upper bound of interval on which to search.c (
Optional
[Array
]) – Initial value for first search point interior to bounding interval (a, b)args (
Tuple
) – Additional arguments for function f.xtol (
float
) – Stopping tolerance based on maximum search interval length over array.maxiter (
int
) – Maximum number of algorithm iterations.full_output (
bool
) – IfFalse
, return just the minizer, otherwise return a tuple (x, info) where x is the minimizer and info is a dict containing algorithm status information.
- Return type:
- Returns:
tuple –
A tuple (x, info) containing:
x : Minimizer array.
info: Dictionary containing diagnostic information.
- class scico.solver.MatrixATADSolver(A, D, W=None, cho_factor=False, lower=False, check_finite=True)[source]#
Bases:
object
Solver for linear system involving a symmetric product.
Solve a linear system of the form
\[(A^T W A + D) \mb{x} = \mb{b}\]or
\[(A^T W A + D) X = B \;,\]where \(A \in \mbb{R}^{M \times N}\), \(W \in \mbb{R}^{M \times M}\) and \(D \in \mbb{R}^{N \times N}\). \(A\) must be an instance of
MatrixOperator
or an array; \(D\) must be an instance ofMatrixOperator
,Diagonal
, or an array, and \(W\), if specified, must be an instance ofDiagonal
or an array.The solution is computed by factorization of matrix \(A^T W A + D\) and solution via Gaussian elimination. If \(D\) is diagonal and \(N < M\) (i.e. \(A W A^T\) is smaller than \(A^T W A\)), then \(A W A^T + D\) is factorized and the original problem is solved via the Woodbury matrix identity
\[(E + U C V)^{-1} = E^{-1} - E^{-1} U (C^{-1} + V E^{-1} U)^{-1} V E^{-1} \;.\]Setting
\[\begin{split}E &= D \\ U &= A^T \\ C &= W \\ V &= A\end{split}\]we have
\[(D + A^T W A)^{-1} = D^{-1} - D^{-1} A^T (W^{-1} + A D^{-1} A^T)^{-1} A D^{-1}\]which can be simplified to
\[(D + A^T W A)^{-1} = D^{-1} (I - A^T G^{-1} A D^{-1})\]by defining \(G = W^{-1} + A D^{-1} A^T\). We therefore have that
\[\mb{x} = (D + A^T W A)^{-1} \mb{b} = D^{-1} (I - A^T G^{-1} A D^{-1}) \mb{b} \;.\]If we have a Cholesky factorization of \(G\), e.g. \(G = L L^T\), we can define
\[\mb{w} = G^{-1} A D^{-1} \mb{b}\]so that
\[\begin{split}G \mb{w} &= A D^{-1} \mb{b} \\ L L^T \mb{w} &= A D^{-1} \mb{b} \;.\end{split}\]The Cholesky factorization can be exploited by solving for \(\mb{z}\) in
\[L \mb{z} = A D^{-1} \mb{b}\]and then for \(\mb{w}\) in
\[L^T \mb{w} = \mb{z} \;,\]so that
\[\mb{x} = D^{-1} \mb{b} - D^{-1} A^T \mb{w} \;.\](Functions
cho_solve
andlu_solve
allow direct solution for \(\mb{w}\) without the two-step procedure described here.) A Cholesky factorization should only be used when \(G\) is positive-definite (e.g. \(D\) is diagonal and positive); if not, an LU factorization should be used.Complex-valued problems are also supported, in which case the transpose \(\cdot^T\) in the equations above should be taken to represent the conjugate transpose.
To solve problems directly involving a matrix of the form \(A W A^T + D\), initialize with
A.T
(orA.T.conj()
for complex problems) instead ofA
.- Parameters:
A (
Union
[MatrixOperator
,Array
]) – Matrix \(A\).D (
Union
[MatrixOperator
,Diagonal
,Array
]) – Matrix \(D\). If a 2D array orMatrixOperator
, specifies the 2D matrix \(D\). If 1D array orDiagonal
, specifies the diagonal elements of \(D\).W (
Union
[Diagonal
,Array
,None
]) – Matrix \(W\). Specifies the diagonal elements of \(W\). Defaults to an array with unit entries.cho_factor (
bool
) – Flag indicating whether to use Cholesky (True
) or LU (False
) factorization.lower (
bool
) – Flag indicating whether lower (True
) or upper (False
) triangular factorization should be computed. Only relevant to Cholesky factorization.check_finite (
bool
) – Flag indicating whether the input array should be checked forInf
andNaN
values.
- class scico.solver.ConvATADSolver(A, D)[source]#
Bases:
object
Solver for a linear system involving a sum of convolutions.
Solve a linear system of the form
\[(A^H A + D) \mb{x} = \mb{b}\]where \(A\) is a block-row operator with circulant blocks, i.e. it can be written as
\[A = \left( \begin{array}{cccc} A_1 & A_2 & \ldots & A_{K} \end{array} \right) \;,\]where all of the \(A_k\) are circular convolution operators, and \(D\) is a circular convolution operator. This problem is most easily solved in the DFT transform domain, where the circular convolutions become diagonal operators. Denoting the frequency-domain versions of variables with a circumflex (e.g. \(\hat{\mb{x}}\) is the frequency-domain version of \(\mb{x}\)), the the problem can be written as
\[(\hat{A}^H \hat{A} + \hat{D}) \hat{\mb{x}} = \hat{\mb{b}} \;,\]where
\[\hat{A} = \left( \begin{array}{cccc} \hat{A}_1 & \hat{A}_2 & \ldots & \hat{A}_{K} \end{array} \right) \;,\]and \(\hat{D}\) and all the \(\hat{A}_k\) are diagonal operators.
This linear equation is computational expensive to solve because the left hand side includes the term \(\hat{A}^H \hat{A}\), which corresponds to the outer product of \(\hat{A}^H\) and \(\hat{A}\). A computationally efficient solution is possible, however, by exploiting the Woodbury matrix identity [53]
\[(B + U C V)^{-1} = B^{-1} - B^{-1} U (C^{-1} + V B^{-1} U)^{-1} V B^{-1} \;.\]Setting
\[\begin{split}B &= \hat{D} \\ U &= \hat{A}^H \\ C &= I \\ V &= \hat{A}\end{split}\]we have
\[(\hat{D} + \hat{A}^H \hat{A})^{-1} = \hat{D}^{-1} - \hat{D}^{-1} \hat{A}^H (I + \hat{A} \hat{D}^{-1} \hat{A}^H)^{-1} \hat{A} \hat{D}^{-1}\]which can be simplified to
\[(\hat{D} + \hat{A}^H \hat{A})^{-1} = \hat{D}^{-1} (I - \hat{A}^H \hat{E}^{-1} \hat{A} \hat{D}^{-1})\]by defining \(\hat{E} = I + \hat{A} \hat{D}^{-1} \hat{A}^H\). The right hand side is much cheaper to compute because the only matrix inversions involve \(\hat{D}\), which is diagonal, and \(\hat{E}\), which is a weighted inner product of \(\hat{A}^H\) and \(\hat{A}\).
- Parameters:
A (
ComposedLinearOperator
) – Operator \(A\).D (
CircularConvolve
) – Operator \(D\).