Recently, I found an interesting course taught by Prof. Yin Tat Lee at UW. The course is called `Theory of Optimization and Continuous Algorithms', and the lecture notes are available on the course homepage: uw-cse535-winter19. As a great fan of optimization theory and algorithm design, I will follow this course and write a series of blogs to record my study.
Most of the material in this series of blogs will follow the lecture notes of the course, and an interesting optimization book, Convex Optimization: Algorithms and Complexity by Sebastien Bubeck.
Since this is the first blog about this course, I will present the preliminaries of optimization theory and some basics of convex optimization, including basic properties of convex functions.
First, we will introduce some preliminaries from the theory of optimization; all of the definitions below will be used later. We begin with the notions of Lipschitz continuity and differentiability.
Lipschitz: A function $f$ is $L$-Lipschitz if

$$\|f(x) - f(y)\| \le L \|x - y\|,$$

where the norms $\|\cdot\|$ are $\ell_2$ norms if unspecified.
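As a quick numerical sanity check (my own sketch, not from the lecture notes), the following estimates the Lipschitz constant of $\sin$ by sampling pairs of points; since $|\sin'(x)| = |\cos(x)| \le 1$, the function is $1$-Lipschitz and the estimate should approach $1$ from below:

```python
import math

def lipschitz_estimate(f, points):
    """Estimate the Lipschitz constant of f by maximizing
    |f(a) - f(b)| / |a - b| over sampled pairs of points."""
    best = 0.0
    for a in points:
        for b in points:
            if a != b:
                best = max(best, abs(f(a) - f(b)) / abs(a - b))
    return best

points = [i / 100.0 for i in range(-300, 301)]
L = lipschitz_estimate(math.sin, points)
print(L)  # slightly below 1, since |sin(a) - sin(b)| <= |a - b|
```

This only gives a lower bound on the true Lipschitz constant, since it samples finitely many pairs.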
Differentiable: A function $f$ is $k$-times continuously differentiable (written $f \in C^k$) if its $k$-th derivative exists and is continuous.
The notions of positive semi-definiteness and matrix norms are also important in the world of optimization. We will introduce these next.
Positive semi-definite: A symmetric matrix $A$ is positive semi-definite (PSD) if $x^\top A x \ge 0$ for all $x$. We write $A \succeq 0$ if $A$ is PSD.
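As a small illustration (my own sketch, not from the notes), for a symmetric $2 \times 2$ matrix the PSD condition is equivalent to nonnegative diagonal entries together with a nonnegative determinant, which is easy to check directly:

```python
import random

def is_psd_2x2(a, b, c):
    """Check whether the symmetric 2x2 matrix [[a, b], [b, c]] is PSD.
    For 2x2 symmetric matrices this is equivalent to:
    a >= 0, c >= 0, and det = a*c - b*b >= 0."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

def quad_form_2x2(a, b, c, x1, x2):
    """Compute x^T A x for A = [[a, b], [b, c]] and x = (x1, x2)."""
    return a * x1 * x1 + 2 * b * x1 * x2 + c * x2 * x2

A = (2.0, 1.0, 2.0)   # PSD: eigenvalues are 1 and 3
B = (1.0, 2.0, 1.0)   # not PSD: determinant is -3
print(is_psd_2x2(*A))  # True
print(is_psd_2x2(*B))  # False

# For the PSD matrix, x^T A x >= 0 on random test vectors:
random.seed(0)
all_nonneg = all(quad_form_2x2(*A, random.uniform(-1, 1), random.uniform(-1, 1)) >= 0
                 for _ in range(1000))
print(all_nonneg)  # True
```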
Trace, norm of matrix: For any matrix $A$, we define

$$\mathrm{tr}(A) = \sum_i A_{ii}, \qquad \|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}.$$

For a symmetric matrix $A$, we have $\mathrm{tr}(A) = \sum_i \lambda_i$ and $\|A\|_F^2 = \sum_i \lambda_i^2$, where $\lambda_i$ are the eigenvalues of the matrix $A$.
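A small pure-Python check of these identities (my own, not from the notes), using the symmetric matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ whose eigenvalues are $1$ and $3$:

```python
# A = [[2, 1], [1, 2]] is symmetric with eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
eigs = [1.0, 3.0]

trace = A[0][0] + A[1][1]
frob_sq = sum(A[i][j] ** 2 for i in range(2) for j in range(2))

print(trace == sum(eigs))                    # True: tr(A) = sum of eigenvalues
print(frob_sq == sum(l ** 2 for l in eigs))  # True: ||A||_F^2 = sum of squared eigenvalues
```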
As far as I know, we can get some intuition for the design and the proofs of many optimization algorithms through Taylor expansion.
Taylor's Remainder Theorem: For any $f \in C^k$, and any $a$ and $x$, there is a $t \in [a, x]$ such that

$$f(x) = \sum_{i=0}^{k-1} \frac{f^{(i)}(a)}{i!} (x - a)^i + \frac{f^{(k)}(t)}{k!} (x - a)^k.$$
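As a numerical illustration (my own sketch), for $f = \exp$ with $k = 2$ the theorem bounds the error of the first-order expansion by $\max_{t \in [a,x]} |f''(t)| \cdot (x-a)^2 / 2$:

```python
import math

# First-order Taylor expansion of exp around a, with the k = 2 remainder bound:
# |e^x - (e^a + e^a (x - a))| <= max_{t in [a, x]} e^t * (x - a)^2 / 2.
a, x = 0.0, 0.5
approx = math.exp(a) + math.exp(a) * (x - a)
error = abs(math.exp(x) - approx)
bound = math.exp(max(a, x)) * (x - a) ** 2 / 2
print(error <= bound)  # True
```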
KL-divergence: The KL-divergence of a density $p$ with respect to another density $q$ is defined as

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx.$$
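A minimal discrete version of this formula (my own sketch; the integral becomes a sum over outcomes):

```python
import math

def kl_divergence(p, q):
    """KL-divergence sum_i p_i * log(p_i / q_i) for discrete distributions,
    with the convention 0 * log(0/q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))       # 0.0: the divergence of a distribution from itself
print(kl_divergence(p, q) >= 0)  # True: KL-divergence is always nonnegative
```

Note that $D_{\mathrm{KL}}$ is not symmetric: $D_{\mathrm{KL}}(p \,\|\, q) \ne D_{\mathrm{KL}}(q \,\|\, p)$ in general.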
Ito's lemma: For any process $x_t$ satisfying $dx_t = \mu_t \, dt + \sigma_t \, dW_t$, where $\mu_t$ is the drift and $W_t$ is a Wiener process, we have that, for any twice-differentiable $f$,

$$df(x_t) = \left( \mu_t f'(x_t) + \frac{\sigma_t^2}{2} f''(x_t) \right) dt + \sigma_t f'(x_t) \, dW_t.$$
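As a standard illustration (my own, not from the notes), applying the lemma to $f(x) = x^2$ with $f'(x) = 2x$ and $f''(x) = 2$ gives

```latex
% Ito's lemma applied to f(x) = x^2 for dx_t = \mu_t\,dt + \sigma_t\,dW_t:
d(x_t^2) = \left( 2 \mu_t x_t + \sigma_t^2 \right) dt + 2 \sigma_t x_t \, dW_t
```

The extra $\sigma_t^2 \, dt$ term is exactly what distinguishes Ito calculus from the ordinary chain rule.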
Introduction–Basic properties of convex functions
In this section, we will introduce some basic properties of convex functions, and we will also give some examples.
Here we will briefly define the convexity of sets and functions.
Convex set: A set $K$ in $\mathbb{R}^n$ is convex if for every pair of points $x, y \in K$, we have $\lambda x + (1 - \lambda) y \in K$ for all $\lambda \in [0, 1]$.
Convex function: A function $f : \mathbb{R}^n \to \mathbb{R}$ is
- Convex: if for any $x, y$ and any $\lambda \in [0, 1]$, we have $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$.
- Concave: if for any $x, y$ and any $\lambda \in [0, 1]$, we have $f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y)$.
- Logconcave: if $f$ is nonnegative and $\log f$ is concave.
- Quasiconvex: if the level sets of $f$, defined by $L_c = \{x : f(x) \le c\}$, are convex for all $c$.
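To make these definitions concrete (my own sketch, not from the notes), the following searches for violations of the convexity inequality on random pairs. $f(x) = x^2$ is convex, while $f(x) = \sqrt{|x|}$ is quasiconvex (its level sets are intervals) but not convex:

```python
import random

def violates_convexity(f, pairs, lam=0.5):
    """Search for a pair (x, y) violating
    f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)."""
    eps = 1e-12  # tolerance for floating-point rounding
    return any(f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + eps
               for x, y in pairs)

random.seed(1)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
print(violates_convexity(lambda x: x * x, pairs))          # False: x^2 is convex
print(violates_convexity(lambda x: abs(x) ** 0.5, pairs))  # True: sqrt|x| is not convex
```

Of course, random sampling can only refute convexity, never certify it.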
Why are convex functions so useful? Perhaps because convexity is not only general enough to model many different kinds of problems, but also simple enough to be handled efficiently. In this section, we will introduce the separation theorem for convex sets, which allows us to do binary search to find a point in a convex set. This is the basis for polynomial-time algorithms for optimizing general convex functions.
Separation of a point and a closed convex set: Let $K$ be a closed convex set in $\mathbb{R}^n$ and $x \notin K$. There is a non-zero $\theta \in \mathbb{R}^n$ such that

$$\langle \theta, y \rangle < \langle \theta, x \rangle \quad \text{for all } y \in K.$$
Proof: Let $p$ be a point in $K$ closest to $x$, i.e.

$$p = \operatorname{argmin}_{z \in K} \|z - x\|_2.$$

(Such a minimizer always exists for closed convex sets; this is sometimes called Hilbert's projection theorem.) Using convexity of $K$, for any $y \in K$ and any $0 \le t \le 1$, we have that

$$\|p + t(y - p) - x\|_2^2 \ge \|p - x\|_2^2.$$

Expanding the LHS, we have

$$\|p - x\|_2^2 + 2t \langle p - x, y - p \rangle + t^2 \|y - p\|_2^2 \ge \|p - x\|_2^2.$$

Taking $t \to 0^+$, we have $\langle p - x, y - p \rangle \ge 0$ for all $y \in K$.

Taking $\theta = x - p$, we have

$$\langle \theta, y \rangle \le \langle \theta, p \rangle = \langle \theta, x \rangle - \|\theta\|_2^2 < \langle \theta, x \rangle \quad \text{for all } y \in K.$$
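The proof is constructive, which a small example makes concrete (my own sketch, not from the notes): take $K$ to be the closed unit disk and $x = (3, 4)$. The projection is $p = x / \|x\|_2 = (0.6, 0.8)$, and $\theta = x - p$ separates $x$ from every point of $K$:

```python
import math
import random

def project_to_unit_disk(x):
    """Projection onto the closed unit disk: scale down iff ||x|| > 1."""
    n = math.hypot(x[0], x[1])
    return (x[0] / n, x[1] / n) if n > 1 else x

x = (3.0, 4.0)
p = project_to_unit_disk(x)          # (0.6, 0.8)
theta = (x[0] - p[0], x[1] - p[1])   # separating direction x - p

# Verify <theta, y> < <theta, x> on random points y of the disk.
random.seed(0)
separated = True
for _ in range(1000):
    y = (random.uniform(-1, 1), random.uniform(-1, 1))
    if y[0] ** 2 + y[1] ** 2 <= 1:  # keep only samples inside the disk
        separated = separated and (theta[0] * y[0] + theta[1] * y[1]
                                   < theta[0] * x[0] + theta[1] * x[1])
print(separated)  # True
```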
From the above theorem, we know that a polytope is as general as a convex set. As a corollary, we have
Intersection of halfspaces: Any closed convex set $K$ can be written as the intersection of halfspaces as follows:

$$K = \bigcap_{\text{halfspace } H \supseteq K} H.$$
In other words, any convex set is a limit of a sequence of polyhedra. Next, we will prove a `separation theorem' for convex functions. Based on this theorem, one can design polynomial-time algorithms for optimizing convex functions.
First order definition: Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex and differentiable. Then for any $x, y$, we have

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle.$$
Proof: Fix any $x, y$. Let $g(t) = f(x + t(y - x))$. Since $f$ is convex, so is $g$ (any convex function is also convex when restricted to a line), and we have

$$g(t) = g\big((1 - t) \cdot 0 + t \cdot 1\big) \le (1 - t) g(0) + t g(1),$$

which implies that

$$\frac{g(t) - g(0)}{t} \le g(1) - g(0).$$

Taking $t \to 0^+$, we have $g'(0) \le g(1) - g(0)$. Applying the chain rule, we have $g'(0) = \langle \nabla f(x), y - x \rangle$, so

$$f(y) = g(1) \ge g(0) + g'(0) = f(x) + \langle \nabla f(x), y - x \rangle.$$
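A quick numerical check of this inequality (my own sketch) for the convex function $f(x) = x^2$, where it holds exactly since $f(y) - f(x) - f'(x)(y - x) = (y - x)^2 \ge 0$:

```python
import random

# Check the first-order inequality f(y) >= f(x) + f'(x)(y - x)
# for the convex function f(x) = x^2 with f'(x) = 2x.
f = lambda x: x * x
grad = lambda x: 2 * x

random.seed(2)
holds = all(f(y) >= f(x) + grad(x) * (y - x)
            for x, y in ((random.uniform(-10, 10), random.uniform(-10, 10))
                         for _ in range(1000)))
print(holds)  # True
```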
Actually, the above theorem can be viewed as an equivalent definition of convex functions. It is easy to show that when

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$$

holds for all $x, y \in \Omega$ (where $\Omega$, the domain of $f$, is convex), $f$ is convex.
Then, from the previous theorem, we have the first order optimality condition, which is shown as follows.
First order condition for unconstrained problems: Let $f$ be convex and differentiable. Then $x$ is a minimizer of $f$ if and only if $\nabla f(x) = 0$.
Proof: If $\nabla f(x) \ne 0$, then

$$f(x - \eta \nabla f(x)) = f(x) - \eta \|\nabla f(x)\|_2^2 + o(\eta) < f(x)$$

for small enough $\eta > 0$. Hence, such a point cannot be the minimizer.
On the other hand, if $\nabla f(x) = 0$, the previous theorem shows that

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle = f(x) \quad \text{for all } y,$$

so $x$ is a minimizer.
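This condition is exactly why gradient descent works: the algorithm drives $\nabla f$ toward zero, and for convex $f$ a vanishing gradient certifies global optimality. A minimal sketch (my own, not from the notes) on $f(x) = (x - 3)^2$:

```python
# Gradient descent on f(x) = (x - 3)^2; at the minimizer the gradient vanishes.
f = lambda x: (x - 3.0) ** 2
grad = lambda x: 2.0 * (x - 3.0)

x = 0.0
for _ in range(100):
    x -= 0.1 * grad(x)  # fixed step size 0.1

print(abs(x - 3.0) < 1e-6)   # True: converged to the minimizer x* = 3
print(abs(grad(x)) < 1e-6)   # True: first-order condition holds at x*
```

Here the error contracts by a factor $0.8$ per step, so $100$ iterations are far more than enough.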