In this post, we will introduce the Mirror Descent algorithm that solves the convex optimization algorithm.
Intuition and Algorithm
In the previous analysis for projected gradient descent, we can find that the PGD works for arbitrary Hilbert space . Suppose that now we want to optimize functions in Banach space , and the gradient descent does not make sence. In fact, lies in space and the gradient lies in the dual space . There is no problem in the Hilbert space , since by Riesz representation theorem is isometric to .
The great insight of Nemirovski and Yudin(creator of Mirror Descent) is that one can still do a gradient descent by first mapping(by a function called mirror map) the point into the dual space , performing the gradient update in dual space, and finally mapping back to and projecting to the constraint set . We will first introduce the mirror map and then discuss the projection step.
Mirror map: Let be a convex open set such that is included in its closure, that is and . We say that is a mirror map if it satisfies the following properties:
- is strictly convex and differentiable.
- The gradient of takes all possible values, that is .
- The gradient of diverges on the boundary of .
The projection operator is designed by the following Bergman divergence.
Bregman divergence: For any strictly convex function , we de fine the Bregman divergence as
And we define the projection of as
The Mirror Descent algorithm works as follow:
- Initial point such that , step size .
- For do
- Pick any .
Below is the illustration for the mirror descent(Picture from Convex Opitmization:Algorithms and Complexity)
Actually, one can also rewrite mirror descent as follows:
This expression is often taken as the definition of mirror descent. It gives a proximal point of view on mirror descent.
Analysis of Mirror Descent
We first have the following property:
Proof: The proof will be completed by trivial computation.
Then we have the following Pythagorean Theorem.
Pythagorean Theorem: Let and , then
which also implies
Proof: First, it is easy to show that
Then it is equivalent to show that
Let , then is the minimizer of in set . Then from the optimality of convex functions, we have
which is just the inequality
Then, we have the following theorem:
Let be a mirror map -strongly convex on w.r.t . Let , and be convex and -Lipschitz w.r.t . Then the mirror descent with satisfies
Proof: Let . The claimed bound will be obtained by taking a limit . By the convexity of , the definition of mirror descent, and the previous 2 lemmas, we have
The term will lead to a telescopic sum, and it remains to bound the other term. By the -strongly convexity of the mirror map and , we have
Then we have
which concludes the proof up to trivial computation.
Standard Setups for Mirror Descent
Ball setup: Taking on , the function is a mirror map and strongly convex w.r.t , and furthermore, the associated Bregman divergence is given by
In this case, the mirror descent is exactly equivalent to projected subgradient descent.
Simplex setup: Now we focus on case where and the simplex .
Step formula: From computation, we can know that
for some normalization constant . Note that this update formula is just the multiplicative weight updating.
Strong convexity: From computation, we can know that is 1-strongly convex w.r.t 1-norm, i.e.
Hence, we have .
Diameter: Direct computation will show that . Hence, .
Final result: We have the following theorem.
Let be 1-Lipschitz functions on . Then, the mirror descent with the mirror map discussed above outputs in iterations such that
In comparison, projected gradient descent gives , which is much larger.
Related to Online Convex Optimization
Note that the proofs of projected subgradient descent in the previous post and the mirror descent in this post do not require that we are solving the same convex function . Based on this observation, we can easily extend the previous results to the online setting. For the online mirror descent(online subgradient descent is a special case of online mirror descent), we have
which shows that the regret is .