In this post, we will continue on our discuss of mirror descent. We will present a variant of mirror descent: the lazy mirror descent, also known as Nesterov’s dual averaging.
Lazy Mirror Descent(Nesterov’s Dual Averaging)
In this section, we will provide a more efficient version of mirror descent, named Lazy Mirror Descent. We will show that the convergence rate is the same as that of the original mirror descent(omit constants). The lazy mirror descent changes the update process of from
Moreover, is such that .
The Lazy Mirror Descent algorithm works as follow:
- Initial point , step size .
- For do
- Pick any .
This analysis is adapted from Convex Optimization: Algorithms and Complexity. We have the following theorem.
Let be a mirror map -strongly convex on w.r.t . Let , and be convex and -Lipschitz w.r.t . Then the lazy mirror descent with satisfies
Proof: We define , so that . Since is -strongly convex, we have is -strongly convex.
Then, we can compute
where the second inequality comes from the first order optimality condition for . Next, we observe that
Putting together the two above displays and using Cauchy-Schwarz (with the assumption ) one obtains
This shows that and thus with the above display
Then we proof the following result: for every ,
which would conclude the proof.
The above equation is equivalent to
We prove this by induction,
where the first inequality follows from the induction step and the second follows from the definition of .