Descent: optimization tools in Python

descent is a Python library for performing optimization, providing first-order gradient-based algorithms and proximal algorithms.

Installation

Basic

The easiest way to install is with pip:

$ pip install descent

You can also install from source by grabbing the code from GitHub:

$ git clone https://github.com/nirum/descent.git
$ cd descent
$ pip install -r requirements.txt
$ python setup.py install

Dependencies

Descent works on Python 3.4 and 3.5. It has only been tested on CPython (not yet on PyPy).

In addition, descent requires the following packages:

  • numpy
  • toolz
  • multipledispatch
  • future

And the following are optional:

  • scipy

Development

Please submit any issues to the GitHub issue tracker.

To contribute to descent, you will also need to install sphinx and numpydoc for building the documentation and nose for running the tests. We adhere to the NumPy/SciPy documentation standards.

Quickstart

Overview

This document is a work in progress; for now, the sections below walk through some example code.

Gradient-based algorithms

Each of the gradient-based algorithms has the following interface. Given a function f_df that computes the objective and gradient of the function you want to minimize:

>>> import descent
>>> import matplotlib.pyplot as plt
>>> opt = descent.GradientDescent(theta_init, f_df, 'sgd', {'lr': learning_rate})
>>> opt.run(maxiter=1000)
>>> plt.plot(opt.theta)
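
For reference, here is a minimal sketch of what such an f_df might look like. The convention of returning an (objective, gradient) pair is an assumption based on the bundled objectives and on descent.utils.wrap, which splits exactly such a function into its two parts:

>>> import numpy as np
>>> def f_df(theta):
...     # simple quadratic: objective 0.5 * ||theta||^2, gradient theta
...     return 0.5 * np.sum(theta ** 2), theta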

Proximal algorithms

Example code for ADMM, for solving a linear system with a sparsity penalty:

>>> opt = descent.Consensus(theta_init)
>>> opt.add('linsys', A, b)
>>> opt.add('sparse', 0.1)
>>> opt.run()
>>> plt.plot(opt.theta)
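
To make the snippet above self-contained, here is one way to construct the inputs. The data are synthetic and purely illustrative; the variable names simply match the snippet:

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> A = rng.randn(100, 50)                                     # measurement matrix
>>> x_true = np.zeros(50)
>>> x_true[rng.choice(50, 5, replace=False)] = rng.randn(5)    # sparse ground truth
>>> b = A.dot(x_true)                                          # observations
>>> theta_init = np.zeros(50)                                  # initial parameter estimate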

Storage

After calling run, the history of objective values is stored on the optimizer object:

>>> opt.run(maxiter=1000)
>>> plt.plot(opt.storage['objective'])

Utilities

Some other features that might be of interest (sketched briefly below):

  • memoization (see: descent.utils.wrap)
  • flattening and reshaping structured parameters (see: descent.utils.destruct and descent.utils.restruct)
  • gradient checking (see: descent.check_grad)
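
A brief sketch of gradient checking and parameter flattening follows. The 2-D Rosenbrock objective is one of the bundled examples; passing a dict to destruct is an assumption based on the parameter forms ("list, dict, or array_like") documented for descent.utils.wrap:

>>> import numpy as np
>>> from descent.objectives import rosenbrock
>>> from descent.utils import check_grad, destruct, restruct
>>> check_grad(rosenbrock, np.array([1.5, -0.5]))     # prints a table comparing gradients
>>> params = {'w': np.zeros((2, 3)), 'b': np.zeros(2)}
>>> flat = destruct(params)                           # flatten to a 1-D numpy array
>>> roundtrip = restruct(flat, params)                # reshape back to the original form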

Tutorial

A tutorial consisting of Jupyter notebooks demonstrating the features of descent is available at: github.com/nirum/descent-tutorial.

Proximal Operators

The proximal operators module (descent.proxops) contains functions to compute the following proximal operators:

  • Proximal operator of the nuclear norm (nucnorm)
  • Proximal operator of the l1 norm (sparse)
  • Proximal operator corresponding to solving a linear system of equations (linsys)
  • Proximal operator of the l2 norm, a squared error penalty (squared_error)
  • Proximal operator for minimizing an arbitrary smooth function given an oracle that computes the function value and gradient (lbfgs)
  • Proximal operator for the l2 penalty of the discrete difference operator, to encourage smoothness (smooth)
  • Projection onto the non-negative orthant (nonneg)
  • Projection onto the semidefinite cone (semidefinite_cone)
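
For reference, the proximal operator of a function f maps a point v to the minimizer of f plus a quadratic penalty that keeps the result close to v. With penalty parameter \rho > 0, a standard definition is

    \mathrm{prox}_{f,\rho}(v) = \arg\min_x \; f(x) + \tfrac{\rho}{2} \lVert x - v \rVert_2^2

(the exact scaling convention descent uses, \rho/2 versus 1/(2\rho), is not specified here and should be treated as an assumption). The projections in the list above (nonneg, semidefinite_cone) are the special case where f is the indicator function of the corresponding set.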

API Reference

Main optimization loops

class descent.main.Consensus(tau=(10.0, 2.0, 2.0), tol=(1e-06, 0.001))[source]

Bases: descent.main.Optimizer

add(operator, *args)[source]

Adds a proximal operator to the list of operators

minimize(x0, display=None, maxiter=inf)[source]

descent.main.gradient_optimizer(coro)[source]

Turns a coroutine into a gradient-based optimizer.

Gradient algorithms

First-order gradient descent algorithms

descent.algorithms.sgd

alias of GradientOptimizer

descent.algorithms.nag

alias of GradientOptimizer

descent.algorithms.rmsprop

alias of GradientOptimizer

descent.algorithms.sag

alias of GradientOptimizer

descent.algorithms.smorms

alias of GradientOptimizer

descent.algorithms.adam

alias of GradientOptimizer
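
Per the Quickstart interface, these algorithms are selected by name, so swapping 'sgd' for 'adam' is a one-line change. The GradientDescent wrapper and the 'lr' hyperparameter key are taken from the Quickstart snippet, and the remaining hyperparameters are assumed to keep their defaults:

>>> opt = descent.GradientDescent(theta_init, f_df, 'adam', {'lr': 1e-3})
>>> opt.run(maxiter=1000)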

Proximal operators

Proximal operators / mappings

descent.proxops.nucnorm[source]

alias of ProxOp

descent.proxops.sparse[source]

alias of ProxOp

class descent.proxops.linsys(A, b)[source]

Bases: descent.proxops.ProximalOperatorBaseClass

descent.proxops.squared_error[source]

alias of ProxOp

descent.proxops.identity[source]

alias of ProxOp

descent.proxops.lbfgs[source]

alias of ProxOp

descent.proxops.tvd[source]

alias of ProxOp

descent.proxops.smooth[source]

alias of ProxOp

descent.proxops.linear[source]

alias of ProxOp

descent.proxops.fantope[source]

alias of ProxOp

Objectives

Example objectives

descent.objectives.rosenbrock(theta)[source]

Objective and gradient for the Rosenbrock function

descent.objectives.sphere(theta)[source]

l2-norm of the parameters

descent.objectives.matyas(theta)[source]

Matyas function

descent.objectives.beale(theta)[source]

Beale’s function

descent.objectives.booth(theta)[source]

Booth’s function

descent.objectives.mccormick(theta)[source]

McCormick function

descent.objectives.camel(theta)[source]

Three-hump camel function

descent.objectives.michalewicz(theta)[source]

Michalewicz function

descent.objectives.bohachevsky1(theta)[source]

One of the Bohachevsky functions

descent.objectives.zakharov(theta)[source]

Zakharov function

descent.objectives.dixon_price(theta)[source]

Dixon-Price function
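
These objectives follow the same f_df convention used elsewhere in descent: each takes a parameter array theta and is described as returning both the objective and the gradient. A quick sketch (the tuple unpacking assumes the (objective, gradient) return convention; (1, 1) is the known global minimum of the 2-D Rosenbrock function):

>>> import numpy as np
>>> from descent.objectives import rosenbrock
>>> obj, grad = rosenbrock(np.array([1.0, 1.0]))   # objective should be 0 at the minimum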

Utilities

descent.utils.check_grad(f_df, xref, stepsize=1e-06, tol=1e-06, width=15, style='round', out=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Compares the numerical gradient to the analytic gradient

Parameters:
  • f_df (function) – The analytic objective and gradient function to check
  • xref (array_like) – Parameter values at which to check the gradient
  • stepsize (float, optional) – Step size for the numerical gradient. Too big and this will poorly estimate the gradient; too small and you will run into precision issues (default: 1e-6)
  • tol (float, optional) – Tolerance to use when coloring correct/incorrect gradients (default: 1e-6)
  • width (int, optional) – Width of the table columns (default: 15)
  • style (string, optional) – Style of the printed table; see tableprint for a list of styles (default: 'round')

descent.utils.destruct(*args, **kwargs)

Deconstructs the input into a 1-D numpy array

descent.utils.restruct(*args, **kwargs)

Reshapes the input into the type of the second argument

descent.utils.wrap(f_df, xref, size=1)[source]

Memoizes an objective + gradient function, and splits it into two functions that return just the objective and gradient, respectively.

Parameters:
  • f_df (function) – Must be unary (takes a single argument)
  • xref (list, dict, or array_like) – The form of the parameters
  • size (int, optional) – Size of the cache (Default=1)
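
A sketch of how wrap might be used; unpacking the result as a pair of functions is an assumption based on the description above:

>>> import numpy as np
>>> from descent.objectives import rosenbrock
>>> from descent.utils import wrap
>>> theta = np.array([1.5, -0.5])
>>> f, df = wrap(rosenbrock, theta)   # assumed to return (objective_fn, gradient_fn)
>>> f(theta), df(theta)               # the memoized f_df is evaluated only once per point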
