Theory of Optimization: Preliminaries and Basic Properties

4 minute read

Published:

Recently, I find an interesting course taught by Prof. Yin Tat Lee at UW. The course is called `Theory of Optimization and Continuous Algorithms’, and the lecture notes are available under the homepage of this courseuw-cse535-winter19. As a great fan of optimization theory and algorithm design, I think I will follow this course and write a bunch of blogs to record my study of this course.

Most of the materials in this series of blogs will follow the lecture notes of the course, and and interesting optimization book Convex Optimization: Algorithms and Complexity by Sebastien Bubeck.

Since this is the first blog about this course, I will present the preliminaries of the optimization theory, and some basic knowledge about convex optimization, including some basic properties of convex functions.

Preliminaries

First, we will introduce some of the preliminaries about theory of optimization. All of the definitions will be used later.

Functions

First, we will introduce the notion of lipschitz and differentiable

Lipschitz: A function is L-lipschitz if

where the norms and are norms if unspecified.

Differentiable: A function is k-differentiable if its derivative is continuous.

Linear Algebra

The notion of positive semi-definite and the norm of matrices are also important in the world of optimization. We will introduce these notions.

Positive semi-definite: A symmetric matrix is positive semi-defi nite (PSD) if for all . We write if is PSD.

Trace, norm of matrix: For any matrix A, we defi ne

For symmetric matrix , we have and , where are the eigenvalues of matrix .

Calculus

As far as I know, we can get some intuitions of the design and proofs of some optimization algorithm through the Taylor’s expansion.

Taylor’s Remainder Theorem: For any , and any and , there is a such that

Probability

KL-divergence: The KL-divergence of a density with respect to another density is defined as

Ito’s lemma: For any process satisfying , where and , we have that

Introduction–Basic properties of convex functions

In this section, we will introduce some basic properties of convex functions, and we will also give some examples.

Definitions

Here will briefly define the convexity of sets and functions.

Convex set: A set in is convex if for every pair of points , we have .

Convex function: A function is

  1. Convex: if for any , we have .
  2. Concave: if for any , we have .
  3. Logconcave: if is nonnegative and is concave.
  4. Quasiconvex: if the level sets of , defined by are convex for all .

Seperation theorems

Why is convex functions so useful? Maybe convexity is not only general enough to model different kinds of problem, and also simple enough to be solved efficiently. In this section, we will introduce the seperation theorem for convex sets, which allows us to do binary search to find a point in a convex set. This is the basis for polynomial-time algorithms for optimizing general convex functions.

Seperation of a point and a closed convex set: Let be a closed convex set in and . There is a non-zero such that

Proof: Let be a point in closest to , i.e.

(Such a minimizer always exists for closed convex sets, this is sometimes calle Hilbert’s projection theorem). Using convexity of , for any and any , we have that

Expand the LHS, we have

Taking , we have for all .

Taking , we have

From the above theorem, we know that a polytope is as general as a convex set. As a corollary, we have

Intersection of halfspaces: Any closed convex set can be written as the intersection of halfspaces as follows:

In other words, any convex set is a limit of a sequence of polyhedra. Next, we will prove a `seperation theorem’ for convex functions. Based on this theorem, one can design polynomial algorithms for optimizing convex functions.

First order definition: Let be convex. Then for any , we have

Proof: Fix any . Let . Since is convex, so is (any convex function is also convex when restricted to a line), and we have

which implies that

Taking , we have . Applying the chain rule, we have

Actually, the above theorem can be viewed as an equivalent definition of the convex functions. It is easy to show that, when

holds for all (, which is the domain of , is convex), we have is convex.

Then from the previous theorem, we have the first order optimality condition, which is shown as follow

First order condition for unconstraint problems: Let be convex. Then is the minimizer of if and only if .

Proof: If , then

for small enough . Hence, such a point cannot be the minimizer.

On the other hand, if , the previous theorem shows that