scico.solver#

Solver and optimization algorithms.

This module provides a number of functions for solving linear systems and optimization problems, some of which are used as subproblem solvers within the iterations of the proximal algorithms in the scico.optimize subpackage.

This module also provides scico interface wrappers for functions from scipy.optimize, since jax directly implements only a very limited subset of these functions (there is limited, experimental support for L-BFGS-B, but only CG and BFGS are fully supported). These wrappers are required because the functions in scipy.optimize only support 1D, real-valued numpy arrays. These limitations are addressed by:

  • Enabling the use of multi-dimensional arrays by flattening and reshaping within the wrapper.

  • Enabling the use of jax arrays by automatically converting to and from numpy arrays.

  • Enabling the use of complex arrays by splitting them into real and imaginary parts.

The wrapper also JIT compiles the function and gradient evaluations.
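As an illustrative sketch (not the actual scico internals), the conversions described above amount to presenting a 1D, real-valued numpy view of a complex, multi-dimensional jax array to scipy.optimize:

```python
import numpy as np
import jax.numpy as jnp

def to_real_1d(x):
    """Flatten a complex jax array into a 1D, real numpy array."""
    x = np.asarray(x)  # device-to-host copy (jax -> numpy)
    return np.concatenate([x.real.ravel(), x.imag.ravel()])

def from_real_1d(v, shape):
    """Invert to_real_1d, recovering a complex jax array of the given shape."""
    n = v.size // 2
    return jnp.asarray(v[:n] + 1j * v[n:]).reshape(shape)
```

The scipy.optimize routine then operates on the flat, real vector, while the user-supplied function is evaluated on the reconstructed jax array.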

These wrapper functions have a number of advantages and disadvantages with respect to those in jax.scipy.optimize:

  • This module provides many more algorithms than jax.scipy.optimize.

  • The functions in this module tend to be faster for small-scale problems (presumably due to some overhead in the jax functions).

  • The functions in this module are slower for large problems due to the frequent host-device copies corresponding to conversion between numpy arrays and jax arrays.

  • The solvers in this module can’t be JIT compiled, and gradients cannot be taken through them.

In the future, these wrapper functions may be replaced with a dependency on JAXopt.

Functions

bisect(f, a, b[, args, xtol, ftol, maxiter, ...])

Vectorised root finding via bisection method.

cg(A, b[, x0, tol, atol, maxiter, info, M])

Conjugate Gradient solver.

golden(f, a, b[, c, args, xtol, maxiter, ...])

Vectorised scalar minimization via golden section method.

lstsq(A, b[, x0, tol, atol, maxiter, info, M])

Least squares solver.

minimize(func, x0[, args, method, hess, ...])

Minimization of scalar function of one or more variables.

minimize_scalar(func[, bracket, bounds, ...])

Minimization of scalar function of one variable.

Classes

ConvATADSolver(A, D)

Solver for a linear system involving a sum of convolutions.

MatrixATADSolver(A, D[, W, cho_factor, ...])

Solver for linear system involving a symmetric product.

scico.solver.minimize(func, x0, args=(), method='L-BFGS-B', hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)[source]#

Minimization of scalar function of one or more variables.

Wrapper around scipy.optimize.minimize. This function differs from scipy.optimize.minimize in three ways:

  • The jac option of scipy.optimize.minimize is not supported. The gradient is calculated using jax.grad.

  • Functions mapping from N-dimensional arrays -> float are supported.

  • Functions mapping from complex arrays -> float are supported.

For more detail, including descriptions of the optimization methods and custom minimizers, refer to the original docstring for scipy.optimize.minimize.

Return type:

OptimizeResult
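A minimal usage sketch, minimizing a function of a two-dimensional jax array (which scipy.optimize.minimize cannot handle directly):

```python
import jax.numpy as jnp
from scico.solver import minimize

# Simple quadratic objective defined on a 2D jax array; the wrapper
# handles flattening and numpy/jax conversion internally.
def f(x):
    return jnp.sum(jnp.abs(x - 1.0) ** 2)

res = minimize(f, jnp.zeros((3, 4)), method="L-BFGS-B")
print(res.x.shape)  # (3, 4): the solution retains the shape of x0
```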

scico.solver.minimize_scalar(func, bracket=None, bounds=None, args=(), method='brent', tol=None, options=None)[source]#

Minimization of scalar function of one variable.

Wrapper around scipy.optimize.minimize_scalar.

For more detail, including descriptions of the optimization methods and custom minimizers, refer to the original docstring for scipy.optimize.minimize_scalar.

Return type:

OptimizeResult
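A minimal usage sketch:

```python
from scico.solver import minimize_scalar

# Minimize (x - 2)^2 over a scalar variable using the default Brent method.
res = minimize_scalar(lambda x: (x - 2.0) ** 2)
print(res.x)  # approximately 2.0
```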

scico.solver.cg(A, b, x0=None, *, tol=1e-05, atol=0.0, maxiter=1000, info=True, M=None)[source]#

Conjugate Gradient solver.

Solve the linear system \(A\mb{x} = \mb{b}\), where \(A\) is positive definite, via the conjugate gradient method.

Parameters:
  • A (Callable) – Callable implementing linear operator \(A\), which should be positive definite.

  • b (Array) – Input array \(\mb{b}\).

  • x0 (Optional[Array]) – Initial solution. If A is a LinearOperator, this parameter need not be specified, and defaults to a zero array. Otherwise, it is required.

  • tol (float) – Relative residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).

  • atol (float) – Absolute residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).

  • maxiter (int) – Maximum iterations. Default: 1000.

  • info (bool) – If True, return a tuple consisting of the solution array and a dictionary containing diagnostic information; otherwise just return the solution.

  • M (Optional[Callable]) – Preconditioner for A. The preconditioner should approximate the inverse of A. The default, None, uses no preconditioner.

Return type:

Tuple[Array, dict]

Returns:

tuple

A tuple (x, info) containing:

  • x : Solution array.

  • info: Dictionary containing diagnostic information.
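A minimal usage sketch, with \(A\) supplied as a plain callable, so that x0 must be given explicitly:

```python
import jax.numpy as jnp
from scico.solver import cg

# Solve M x = b for a small symmetric positive definite matrix M.
M = jnp.array([[4.0, 1.0], [1.0, 3.0]])
b = jnp.array([1.0, 2.0])
x, info = cg(lambda v: M @ v, b, x0=jnp.zeros(2), tol=1e-6)
print(x)  # approximately [0.0909, 0.6364]
```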

scico.solver.lstsq(A, b, x0=None, tol=1e-05, atol=0.0, maxiter=1000, info=False, M=None)[source]#

Least squares solver.

Solve the least squares problem

\[\argmin_{\mb{x}} \; (1/2) \norm{ A \mb{x} - \mb{b} }_2^2 \;,\]

where \(A\) is a linear operator and \(\mb{b}\) is a vector. The problem is solved using cg.

Parameters:
  • A (Callable) – Callable implementing linear operator \(A\).

  • b (Array) – Input array \(\mb{b}\).

  • x0 (Optional[Array]) – Initial solution. If A is a LinearOperator, this parameter need not be specified, and defaults to a zero array. Otherwise, it is required.

  • tol (float) – Relative residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).

  • atol (float) – Absolute residual stopping tolerance. Convergence occurs when norm(residual) <= max(tol * norm(b), atol).

  • maxiter (int) – Maximum iterations. Default: 1000.

  • info (bool) – If True, return a tuple consisting of the solution array and a dictionary containing diagnostic information; otherwise just return the solution.

  • M (Optional[Callable]) – Preconditioner for A. The preconditioner should approximate the inverse of A. The default, None, uses no preconditioner.

Return type:

Tuple[Array, dict]

Returns:

tuple

A tuple (x, info) containing:

  • x : Solution array.

  • info: Dictionary containing diagnostic information.
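A minimal sketch, assuming \(A\) is wrapped as a scico.linop.MatrixOperator (so that it is a LinearOperator and x0 may be omitted):

```python
import jax.numpy as jnp
from scico.linop import MatrixOperator
from scico.solver import lstsq

# Least squares solution of an overdetermined system A x ≈ b.
A = MatrixOperator(jnp.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]))
b = jnp.array([1.0, 2.0, 3.0])
x = lstsq(A, b)  # info=False (the default) returns just the solution
```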

scico.solver.bisect(f, a, b, args=(), xtol=1e-07, ftol=1e-07, maxiter=100, full_output=False, range_check=True)[source]#

Vectorised root finding via bisection method.

Vectorised root finding via bisection method, supporting simultaneous finding of multiple roots of a function defined over a multi-dimensional array. When the function is array-valued, each of these values is treated as the independent application of a scalar function. The initial interval [a, b] must bracket the root for all scalar functions.

The interface is similar to that of scipy.optimize.bisect, which is much faster when f is a scalar function and a and b are scalars.

Parameters:
  • f (Callable) – Function returning a float or an array of floats.

  • a (Array) – Lower bound of interval on which to apply bisection.

  • b (Array) – Upper bound of interval on which to apply bisection.

  • args (Tuple) – Additional arguments for function f.

  • xtol (float) – Stopping tolerance based on maximum bisection interval length over array.

  • ftol (float) – Stopping tolerance based on maximum absolute function value over array.

  • maxiter (int) – Maximum number of algorithm iterations.

  • full_output (bool) – If False, return just the root, otherwise return a tuple (x, info) where x is the root and info is a dict containing algorithm status information.

  • range_check (bool) – If True, check to ensure that the initial [a, b] range brackets the root of f.

Return type:

Union[Array, Tuple[Array, dict]]

Returns:

tuple

A tuple (x, info) containing:

  • x : Root array.

  • info: Dictionary containing diagnostic information.
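A minimal usage sketch, finding the positive roots of \(x^2 - c\) for several values of \(c\) simultaneously:

```python
import jax.numpy as jnp
from scico.solver import bisect

c = jnp.array([1.0, 2.0, 4.0])
f = lambda x: x**2 - c                      # roots at sqrt(c)
x = bisect(f, jnp.zeros(3), jnp.full((3,), 3.0))
print(x)  # approximately [1.0, 1.4142, 2.0]
```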

scico.solver.golden(f, a, b, c=None, args=(), xtol=1e-07, maxiter=100, full_output=False)[source]#

Vectorised scalar minimization via golden section method.

Vectorised scalar minimization via golden section method, supporting simultaneous minimization of a function defined over a multi-dimensional array. When the function is array-valued, each of these values is treated as the independent application of a scalar function. The minimizer must lie within the interval (a, b) for all scalar functions and, if specified, c must be within that interval.

The interface is more similar to that of bisect than to that of scipy.optimize.golden, which is much faster when f is a scalar function and a, b, and c are scalars.

Parameters:
  • f (Callable) – Function returning a float or an array of floats.

  • a (Array) – Lower bound of interval on which to search.

  • b (Array) – Upper bound of interval on which to search.

  • c (Optional[Array]) – Initial value for first search point interior to bounding interval (a, b).

  • args (Tuple) – Additional arguments for function f.

  • xtol (float) – Stopping tolerance based on maximum search interval length over array.

  • maxiter (int) – Maximum number of algorithm iterations.

  • full_output (bool) – If False, return just the minimizer, otherwise return a tuple (x, info) where x is the minimizer and info is a dict containing algorithm status information.

Return type:

Union[Array, Tuple[Array, dict]]

Returns:

tuple

A tuple (x, info) containing:

  • x : Minimizer array.

  • info: Dictionary containing diagnostic information.
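A minimal usage sketch, minimizing \((x - t)^2\) element-wise for an array of targets \(t\):

```python
import jax.numpy as jnp
from scico.solver import golden

t = jnp.array([0.5, 1.5, 2.5])
f = lambda x: (x - t) ** 2                  # minimizers at x = t
x = golden(f, jnp.zeros(3), jnp.full((3,), 3.0))
print(x)  # approximately [0.5, 1.5, 2.5]
```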

class scico.solver.MatrixATADSolver(A, D, W=None, cho_factor=False, lower=False, check_finite=True)[source]#

Bases: object

Solver for linear system involving a symmetric product.

Solve a linear system of the form

\[(A^T W A + D) \mb{x} = \mb{b}\]

or

\[(A^T W A + D) X = B \;,\]

where \(A \in \mbb{R}^{M \times N}\), \(W \in \mbb{R}^{M \times M}\) and \(D \in \mbb{R}^{N \times N}\). \(A\) must be an instance of MatrixOperator or an array; \(D\) must be an instance of MatrixOperator, Diagonal, or an array, and \(W\), if specified, must be an instance of Diagonal or an array.

The solution is computed by factorization of matrix \(A^T W A + D\) and solution via Gaussian elimination. If \(D\) is diagonal and \(M < N\) (i.e. \(A W A^T\) is smaller than \(A^T W A\)), then the smaller matrix \(W^{-1} + A D^{-1} A^T\) (denoted \(G\) below) is factorized and the original problem is solved via the Woodbury matrix identity

\[(E + U C V)^{-1} = E^{-1} - E^{-1} U (C^{-1} + V E^{-1} U)^{-1} V E^{-1} \;.\]

Setting

\[\begin{split}E &= D \\ U &= A^T \\ C &= W \\ V &= A\end{split}\]

we have

\[(D + A^T W A)^{-1} = D^{-1} - D^{-1} A^T (W^{-1} + A D^{-1} A^T)^{-1} A D^{-1}\]

which can be simplified to

\[(D + A^T W A)^{-1} = D^{-1} (I - A^T G^{-1} A D^{-1})\]

by defining \(G = W^{-1} + A D^{-1} A^T\). We therefore have that

\[\mb{x} = (D + A^T W A)^{-1} \mb{b} = D^{-1} (I - A^T G^{-1} A D^{-1}) \mb{b} \;.\]

If we have a Cholesky factorization of \(G\), e.g. \(G = L L^T\), we can define

\[\mb{w} = G^{-1} A D^{-1} \mb{b}\]

so that

\[\begin{split}G \mb{w} &= A D^{-1} \mb{b} \\ L L^T \mb{w} &= A D^{-1} \mb{b} \;.\end{split}\]

The Cholesky factorization can be exploited by solving for \(\mb{z}\) in

\[L \mb{z} = A D^{-1} \mb{b}\]

and then for \(\mb{w}\) in

\[L^T \mb{w} = \mb{z} \;,\]

so that

\[\mb{x} = D^{-1} \mb{b} - D^{-1} A^T \mb{w} \;.\]

(Functions cho_solve and lu_solve allow direct solution for \(\mb{w}\) without the two-step procedure described here.) A Cholesky factorization should only be used when \(G\) is positive-definite (e.g. \(D\) is diagonal and positive); if not, an LU factorization should be used.

Complex-valued problems are also supported, in which case the transpose \(\cdot^T\) in the equations above should be taken to represent the conjugate transpose.

To solve problems directly involving a matrix of the form \(A W A^T + D\), initialize with A.T (or A.T.conj() for complex problems) instead of A.

Parameters:
  • A (Union[MatrixOperator, Array]) – Matrix \(A\).

  • D (Union[MatrixOperator, Diagonal, Array]) – Matrix \(D\). If a 2D array or MatrixOperator, specifies the 2D matrix \(D\). If 1D array or Diagonal, specifies the diagonal elements of \(D\).

  • W (Union[Diagonal, Array, None]) – Matrix \(W\). Specifies the diagonal elements of \(W\). Defaults to an array with unit entries.

  • cho_factor (bool) – Flag indicating whether to use Cholesky (True) or LU (False) factorization.

  • lower (bool) – Flag indicating whether lower (True) or upper (False) triangular factorization should be computed. Only relevant to Cholesky factorization.

  • check_finite (bool) – Flag indicating whether the input array should be checked for Inf and NaN values.
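A minimal usage sketch, solving \((A^T A + D) \mb{x} = \mb{b}\) with a diagonal \(D\) (and \(W\) defaulting to the identity):

```python
import jax.numpy as jnp
from scico.solver import MatrixATADSolver

A = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # M = 3, N = 2
d = jnp.array([2.0, 3.0])            # 1D array: diagonal elements of D
solver = MatrixATADSolver(A, d, cho_factor=True)     # D positive, so Cholesky is safe
b = jnp.ones(2)
x = solver.solve(b)
print(solver.accuracy(x, b))         # relative residual of the solution
```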

solve(b, check_finite=None)[source]#

Solve the linear system.

Solve the linear system with right hand side \(\mb{b}\) (b is a vector) or \(B\) (b is a 2d array).

Parameters:
  • b (Array) – Vector \(\mathbf{b}\) or matrix \(B\).

  • check_finite (Optional[bool]) – Flag indicating whether the input array should be checked for Inf and NaN values. If None, use the value selected on initialization.

Return type:

Array

Returns:

Solution to the linear system.

accuracy(x, b)[source]#

Compute solution relative residual.

Parameters:
  • x (Array) – Array \(\mathbf{x}\) (solution).

  • b (Array) – Array \(\mathbf{b}\) (right hand side of linear system).

Return type:

float

Returns:

Relative residual of solution.

class scico.solver.ConvATADSolver(A, D)[source]#

Bases: object

Solver for a linear system involving a sum of convolutions.

Solve a linear system of the form

\[(A^H A + D) \mb{x} = \mb{b}\]

where \(A\) is a block-row operator with circulant blocks, i.e. it can be written as

\[A = \left( \begin{array}{cccc} A_1 & A_2 & \ldots & A_{K} \end{array} \right) \;,\]

where all of the \(A_k\) are circular convolution operators, and \(D\) is a circular convolution operator. This problem is most easily solved in the DFT transform domain, where the circular convolutions become diagonal operators. Denoting the frequency-domain versions of variables with a circumflex (e.g. \(\hat{\mb{x}}\) is the frequency-domain version of \(\mb{x}\)), the problem can be written as

\[(\hat{A}^H \hat{A} + \hat{D}) \hat{\mb{x}} = \hat{\mb{b}} \;,\]

where

\[\hat{A} = \left( \begin{array}{cccc} \hat{A}_1 & \hat{A}_2 & \ldots & \hat{A}_{K} \end{array} \right) \;,\]

and \(\hat{D}\) and all the \(\hat{A}_k\) are diagonal operators.

This linear equation is computationally expensive to solve because the left hand side includes the term \(\hat{A}^H \hat{A}\), which corresponds to the outer product of \(\hat{A}^H\) and \(\hat{A}\). A computationally efficient solution is possible, however, by exploiting the Woodbury matrix identity [53]

\[(B + U C V)^{-1} = B^{-1} - B^{-1} U (C^{-1} + V B^{-1} U)^{-1} V B^{-1} \;.\]

Setting

\[\begin{split}B &= \hat{D} \\ U &= \hat{A}^H \\ C &= I \\ V &= \hat{A}\end{split}\]

we have

\[(\hat{D} + \hat{A}^H \hat{A})^{-1} = \hat{D}^{-1} - \hat{D}^{-1} \hat{A}^H (I + \hat{A} \hat{D}^{-1} \hat{A}^H)^{-1} \hat{A} \hat{D}^{-1}\]

which can be simplified to

\[(\hat{D} + \hat{A}^H \hat{A})^{-1} = \hat{D}^{-1} (I - \hat{A}^H \hat{E}^{-1} \hat{A} \hat{D}^{-1})\]

by defining \(\hat{E} = I + \hat{A} \hat{D}^{-1} \hat{A}^H\). The right hand side is much cheaper to compute because the only matrix inversions involve \(\hat{D}\), which is diagonal, and \(\hat{E}\), which is a weighted inner product of \(\hat{A}^H\) and \(\hat{A}\).
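The identity can be checked numerically with dense matrices (a standalone verification, written without the DFT-domain hats, not part of the solver API):

```python
import jax.numpy as jnp

# Verify (D + A^H A)^{-1} = D^{-1} (I - A^H E^{-1} A D^{-1}) with
# E = I + A D^{-1} A^H, for a small complex A and diagonal D.
A = jnp.arange(15.0).reshape((3, 5)) / 10.0 + 1j * jnp.ones((3, 5))
d = jnp.linspace(1.0, 2.0, 5)                     # diagonal of D
Dinv = jnp.diag(1.0 / d)
E = jnp.eye(3) + A @ Dinv @ A.conj().T
lhs = jnp.linalg.inv(jnp.diag(d) + A.conj().T @ A)
rhs = Dinv @ (jnp.eye(5) - A.conj().T @ jnp.linalg.inv(E) @ A @ Dinv)
print(jnp.allclose(lhs, rhs, atol=1e-4))          # True
```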

Parameters:
  • A – Operator \(A\), a block-row operator with circulant blocks.

  • D – Operator \(D\), a circular convolution operator.

solve(b)[source]#

Solve the linear system.

Solve the linear system with right hand side \(\mb{b}\).

Parameters:

b (Array) – Array \(\mathbf{b}\).

Return type:

Array

Returns:

Solution to the linear system.

accuracy(x, b)[source]#

Compute solution relative residual.

Parameters:
  • x (Array) – Array \(\mathbf{x}\) (solution).

  • b (Array) – Array \(\mathbf{b}\) (right hand side of linear system).

Return type:

float

Returns:

Relative residual of solution.