In this post, we describe a new geometry-dependent algorithm that relies on a different set of assumptions. The algorithm is called conditional gradient descent, also known as Frank-Wolfe.
The Frank-Wolfe algorithm solves the following convex optimization problem:
$$\min_{x \in \mathcal{X}} f(x)$$
for a convex, compact set $\mathcal{X}$ and a convex function $f$ such that $\nabla f$ is `Lipschitz' in a certain sense (made precise below). The algorithm reduces the problem to a sequence of linear optimization problems over $\mathcal{X}$.
The Frank-Wolfe algorithm has the following procedure:
- Initial point $x_1 \in \mathcal{X}$, step sizes $(\gamma_t)_{t \ge 1}$.
- For $t = 1, 2, \ldots$ do
- Compute $y_t \in \operatorname{argmin}_{y \in \mathcal{X}} \langle \nabla f(x_t), y \rangle$.
- $x_{t+1} = (1 - \gamma_t) x_t + \gamma_t y_t$ with $\gamma_t = \frac{2}{t+1}$.
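The loop above can be sketched in Python. The linear minimization oracle (`lmo`) is problem-specific; the example below, projecting a point $b$ onto the probability simplex (where the oracle just returns the vertex $e_i$ minimizing the $i$-th gradient coordinate), is purely illustrative and all names are hypothetical:

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, num_steps):
    """Conditional gradient descent (Frank-Wolfe) sketch.

    grad_f : gradient oracle for the objective f
    lmo    : linear minimization oracle, returns argmin_{y in X} <g, y>
    x0     : initial feasible point
    """
    x = x0
    for t in range(1, num_steps + 1):
        g = grad_f(x)
        y = lmo(g)                       # linear optimization subproblem
        gamma = 2.0 / (t + 1)            # step size gamma_t = 2/(t+1)
        x = (1 - gamma) * x + gamma * y  # convex combination stays feasible
    return x

# Illustrative use: minimize f(x) = 0.5*||x - b||^2 over the probability
# simplex; the LMO over the simplex returns a standard basis vector e_i.
b = np.array([0.1, 0.7, 0.2])
grad_f = lambda x: x - b
lmo = lambda g: np.eye(len(g))[np.argmin(g)]
x0 = np.ones(3) / 3
x = frank_wolfe(grad_f, lmo, x0, 500)
```

Since the update is always a convex combination of feasible points, the iterate never leaves $\mathcal{X}$, which is why no projection step is needed.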
We have the following convergence theorem for the Frank-Wolfe algorithm.
Given a convex function $f$ on a convex set $\mathcal{X}$ and a constant $\beta$ such that
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{\beta}{2}\|y - x\|^2$$
for any $x, y \in \mathcal{X}$, we have
$$f(x_t) - f(x^*) \le \frac{2\beta R^2}{t+1} \quad \text{for all } t \ge 2,$$
where $x^* \in \operatorname{argmin}_{x \in \mathcal{X}} f(x)$ and $R = \sup_{x, y \in \mathcal{X}} \|x - y\|$ is the diameter of $\mathcal{X}$.
Proof: By the definition of $\beta$ and the update $x_{t+1} - x_t = \gamma_t (y_t - x_t)$, we have
$$f(x_{t+1}) \le f(x_t) + \gamma_t \langle \nabla f(x_t), y_t - x_t \rangle + \frac{\beta \gamma_t^2}{2}\|y_t - x_t\|^2.$$
Note that from the convexity of $f$ and the definition of $y_t$, we have
$$\langle \nabla f(x_t), y_t - x_t \rangle \le \langle \nabla f(x_t), x^* - x_t \rangle \le f(x^*) - f(x_t).$$
Hence, we have
$$f(x_{t+1}) - f(x^*) \le (1 - \gamma_t)\bigl(f(x_t) - f(x^*)\bigr) + \frac{\beta \gamma_t^2 R^2}{2}.$$
Let $\delta_t = f(x_t) - f(x^*)$; we have
$$\delta_{t+1} \le (1 - \gamma_t)\delta_t + \frac{\beta \gamma_t^2 R^2}{2}.$$
Note that $\gamma_t = \frac{2}{t+1}$; we can prove the theorem by induction.
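For completeness, the induction step can be sketched as follows (writing $\delta_t = f(x_t) - f(x^*)$, $\gamma_t = \frac{2}{t+1}$, and assuming $\delta_t \le \frac{2\beta R^2}{t+1}$):

```latex
\delta_{t+1}
  \le \Bigl(1 - \tfrac{2}{t+1}\Bigr)\delta_t + \tfrac{2\beta R^2}{(t+1)^2}
  \le \tfrac{t-1}{t+1}\cdot\tfrac{2\beta R^2}{t+1} + \tfrac{2\beta R^2}{(t+1)^2}
  = \tfrac{2\beta R^2\, t}{(t+1)^2}
  \le \tfrac{2\beta R^2}{t+2},
```

where the last inequality uses $t(t+2) \le (t+1)^2$; the base case follows from $\gamma_1 = 1$.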
Note that if $\nabla f$ is $L$-Lipschitz with respect to $\|\cdot\|$ over the domain $\mathcal{X}$, then we can take $\beta = L$.
When the domain $\mathcal{X}$ is a simplex, the step $y_t$ is always a vertex of the simplex, and hence each step of Frank-Wolfe increases the sparsity of the iterate by at most $1$. We can think of the convergence result of Frank-Wolfe as proving that the optimization problem admits a sparse approximate solution.
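As an illustrative (hypothetical) example, running Frank-Wolfe on $f(x) = \frac{1}{2}\|x - b\|_2^2$ over a high-dimensional probability simplex keeps the iterate sparse, since each step mixes in a single vertex $e_i$:

```python
import numpy as np

# Minimize f(x) = 0.5*||x - b||^2 over the 1000-dimensional probability
# simplex. Each Frank-Wolfe step mixes in a single vertex e_i, so after
# t steps the iterate has at most t + 1 nonzero coordinates.
n, t_max = 1000, 20
rng = np.random.default_rng(0)
b = rng.dirichlet(np.ones(n))           # dense target inside the simplex
x = np.eye(n)[0]                        # start at a vertex: 1 nonzero
for t in range(1, t_max + 1):
    g = x - b                           # gradient of f at x
    y = np.eye(n)[np.argmin(g)]         # LMO over the simplex: a vertex
    gamma = 2 / (t + 1)
    x = (1 - gamma) * x + gamma * y
print(np.count_nonzero(x))              # bounded by t_max + 1 = 21
```

The target $b$ is dense, yet the iterate after 20 steps is supported on at most 21 coordinates, matching the sparsity remark above.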
Given a polytope $\mathcal{X}$ lying inside the unit ball. For any $x^* \in \mathcal{X}$ and $\epsilon > 0$, there are vertices $v_1, \ldots, v_k$ of $\mathcal{X}$ such that
$$\Bigl\| x^* - \sum_{i=1}^{k} \lambda_i v_i \Bigr\|_2 \le \epsilon$$
for some convex combination $\lambda_1, \ldots, \lambda_k$ with $k = O(1/\epsilon^2)$.
Proof: Run the Frank-Wolfe algorithm with $f(x) = \frac{1}{2}\|x - x^*\|_2^2$. Note that $\nabla f(x) = x - x^*$ is $1$-Lipschitz with respect to $\|\cdot\|_2$ and that the diameter of $\mathcal{X}$ is bounded by $2$. Hence, we have $\beta = 1$ and $R \le 2$.
Therefore, the convergence theorem for the Frank-Wolfe algorithm shows that
$$\frac{1}{2}\|x_t - x^*\|_2^2 = f(x_t) - f(x^*) \le \frac{2 \cdot 1 \cdot 2^2}{t+1} = \frac{8}{t+1}.$$
Since $x_t$ is a convex combination of at most $t$ vertices of $\mathcal{X}$, we have $\|x_t - x^*\|_2 \le \frac{4}{\sqrt{t+1}}$, and taking $t = \lceil 16/\epsilon^2 \rceil$ gives the result.
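A small numerical sanity check of this argument (with an arbitrary random vertex set standing in for the polytope; all names and sizes are illustrative):

```python
import numpy as np

# Approximate Caratheodory via Frank-Wolfe: approximate a point x_star of
# conv(V) inside the unit ball by a convex combination of few vertices,
# by minimizing f(x) = 0.5*||x - x_star||^2 over conv(V).
rng = np.random.default_rng(1)
V = rng.normal(size=(500, 100))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # vertices on the unit sphere
w = rng.dirichlet(np.ones(500))
x_star = w @ V                                   # target: dense combination

eps = 0.25
t_max = int(np.ceil(16 / eps**2))                # t >= 16/eps^2 steps
used = {0}
x = V[0].copy()
for t in range(1, t_max + 1):
    i = int(np.argmin(V @ (x - x_star)))         # LMO over the vertex set
    used.add(i)
    gamma = 2 / (t + 1)
    x = (1 - gamma) * x + gamma * V[i]
print(len(used), np.linalg.norm(x - x_star))     # few vertices, error <= eps
```

The guarantee $\|x_t - x^*\|_2 \le 4/\sqrt{t+1}$ is deterministic here, since $f$ is exactly $1$-smooth and the diameter is at most $2$; the number of distinct vertices touched is at most $t_{\max} + 1$.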