# Theory of Optimization: Frank-Wolfe Algorithm

**Published:**

In this post, we describe a new geometry-dependent algorithm that relies on a different set of assumptions. The algorithm is called conditional gradient descent, also known as Frank-Wolfe.

## Frank-Wolfe

### Algorithm

The Frank-Wolfe algorithm solves the following convex optimization problem

$$\min_{x \in \mathcal{X}} f(x)$$

for a compact convex set $\mathcal{X}$ and a convex function $f$ that is "Lipschitz" in a certain sense (made precise in the analysis below). The algorithm reduces the problem to a sequence of linear optimization problems over $\mathcal{X}$.

The Frank-Wolfe algorithm has the following procedure:

- Initial point $x_1 \in \mathcal{X}$, step size $\gamma_t = \frac{2}{t+1}$.
- **For** $t = 1, 2, \ldots$ **do**
  - Compute $y_t = \arg\min_{y \in \mathcal{X}} \langle \nabla f(x_t), y \rangle$.
  - $x_{t+1} = (1 - \gamma_t) x_t + \gamma_t y_t$ with $\gamma_t = \frac{2}{t+1}$.
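The procedure above can be sketched in a few lines; this is an illustrative implementation, where the names `frank_wolfe` and `lmo` (linear minimization oracle) are my own, and the example objective and domain are chosen for concreteness:

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_steps):
    """Conditional gradient descent with the step size gamma_t = 2/(t+1).

    grad: callable returning the gradient of f at a point.
    lmo:  linear minimization oracle; returns argmin_{y in X} <g, y>.
    x0:   initial point, assumed to lie in X.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(1, num_steps + 1):
        g = grad(x)
        y = lmo(g)                       # solve the linear subproblem
        gamma = 2.0 / (t + 1)            # step size from the analysis below
        x = (1 - gamma) * x + gamma * y  # convex combination stays in X
    return x

# Illustrative example: minimize f(x) = 0.5*||x - c||^2 over the l2 unit
# ball, whose linear oracle is y = -g / ||g||.
c = np.array([0.3, -0.4])  # target point inside the unit ball
x_hat = frank_wolfe(
    grad=lambda x: x - c,
    lmo=lambda g: -g / np.linalg.norm(g),
    x0=np.array([1.0, 0.0]),
    num_steps=200,
)
```

Note that each iterate is a convex combination of outputs of the oracle, so the method never needs a projection step, only linear optimization over $\mathcal{X}$.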

### Analysis

We have the following theorem for the Frank-Wolfe algorithm.

Given a convex function $f$ on a convex set $\mathcal{X}$ and a constant $C$ such that

$$f((1-\gamma)x + \gamma y) \le f(x) + \gamma \langle \nabla f(x), y - x \rangle + \frac{C}{2} \gamma^2$$

for any $x, y \in \mathcal{X}$ and $\gamma \in [0, 1]$, we have

$$f(x_t) - f(x^*) \le \frac{2C}{t+1},$$

where $x^*$ is a minimizer of $f$ over $\mathcal{X}$.

*Proof:* By the definition of $C$, we have

$$f(x_{t+1}) \le f(x_t) + \gamma_t \langle \nabla f(x_t), y_t - x_t \rangle + \frac{C}{2} \gamma_t^2.$$

Note that from the convexity of $f$ and the definition of $y_t$, we have

$$\langle \nabla f(x_t), y_t - x_t \rangle \le \langle \nabla f(x_t), x^* - x_t \rangle \le f(x^*) - f(x_t).$$

Hence, we have

$$f(x_{t+1}) - f(x^*) \le (1 - \gamma_t)\big(f(x_t) - f(x^*)\big) + \frac{C}{2} \gamma_t^2.$$

Let $\delta_t = f(x_t) - f(x^*)$; then we have

$$\delta_{t+1} \le (1 - \gamma_t)\delta_t + \frac{C}{2} \gamma_t^2.$$

Note that $\gamma_t = \frac{2}{t+1}$; we can prove $\delta_t \le \frac{2C}{t+1}$ by induction.
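Spelling out the induction step: with the error $\delta_t = f(x_t) - f(x^*)$, the step size $\gamma_t = \frac{2}{t+1}$, and the inductive hypothesis $\delta_t \le \frac{2C}{t+1}$,

```latex
\delta_{t+1}
  \le \Big(1 - \frac{2}{t+1}\Big)\frac{2C}{t+1} + \frac{C}{2}\cdot\frac{4}{(t+1)^2}
  = \frac{2C(t-1) + 2C}{(t+1)^2}
  = \frac{2Ct}{(t+1)^2}
  \le \frac{2C}{t+2},
```

where the last inequality uses $t(t+2) \le (t+1)^2$.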

⬜

Note that if $\nabla f$ is $L$-Lipschitz with respect to a norm $\|\cdot\|$ over the domain $\mathcal{X}$, then $C \le L D^2$, where $D$ is the diameter of $\mathcal{X}$ with respect to $\|\cdot\|$.
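To see why an $L$-Lipschitz gradient suffices: the descent lemma gives $f(z) \le f(x) + \langle \nabla f(x), z - x \rangle + \frac{L}{2}\|z - x\|^2$; taking $z = (1-\gamma)x + \gamma y$ and letting $D$ denote the diameter of $\mathcal{X}$,

```latex
f((1-\gamma)x + \gamma y)
  \le f(x) + \gamma \langle \nabla f(x), y - x \rangle + \frac{L}{2}\gamma^2 \|y - x\|^2
  \le f(x) + \gamma \langle \nabla f(x), y - x \rangle + \frac{L D^2}{2}\gamma^2,
```

so the assumption of the theorem holds with $C = L D^2$.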

### Sparsity Analysis

When the domain is a simplex, the point $y_t$ is always a vertex of the simplex, and hence each step of Frank-Wolfe increases the sparsity of the iterate by at most $1$. We can thus view the convergence result of Frank-Wolfe as proving that the optimization problem has an approximate sparse solution.
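A small sketch of this behavior on the probability simplex, where the linear oracle always returns a standard basis vector (the objective and target here are illustrative choices of mine):

```python
import numpy as np

def simplex_lmo(g):
    """argmin_{y in simplex} <g, y> is the vertex e_i with i = argmin_i g_i."""
    y = np.zeros_like(g)
    y[np.argmin(g)] = 1.0
    return y

# Minimize f(z) = 0.5*||z - c||^2 over the probability simplex, where c is
# a dense target; each iteration touches at most one new coordinate.
rng = np.random.default_rng(0)
c = rng.random(1000)
c /= c.sum()                  # a dense point in the simplex

x = np.zeros(1000)
x[0] = 1.0                    # start at a vertex: sparsity 1
num_steps = 50
for t in range(1, num_steps + 1):
    y = simplex_lmo(x - c)    # gradient of f at x is x - c
    gamma = 2.0 / (t + 1)
    x = (1 - gamma) * x + gamma * y

# After 50 steps the iterate has at most 51 nonzero coordinates,
# yet it already approximates the dense minimizer c.
sparsity = np.count_nonzero(x)
```

Each update mixes in exactly one vertex, so the support grows by at most one coordinate per step while the iterate remains a point of the simplex.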

Given a polytope $P$ that lies inside the unit ball $\{x : \|x\|_2 \le 1\}$. For any $x \in P$ and any $k$, there are vertices $v_1, \ldots, v_k$ of $P$ and convex weights $\lambda_1, \ldots, \lambda_k$ such that

$$\Big\| x - \sum_{i=1}^{k} \lambda_i v_i \Big\|_2 \le \frac{c}{\sqrt{k}}$$

for some universal constant $c$.

*Proof:* Run the Frank-Wolfe algorithm on $f(y) = \frac{1}{2}\|y - x\|_2^2$ over the domain $P$, starting from a vertex. Note that $\nabla f$ is $1$-Lipschitz with respect to $\|\cdot\|_2$ and that the diameter of $P$ is bounded by $2$. Hence, we have $C \le 1 \cdot 2^2 = 4$.

Therefore, the Frank-Wolfe algorithm shows that

$$f(x_k) - \min_{y \in P} f(y) \le \frac{2C}{k+1} \le \frac{8}{k+1}.$$

Since $x \in P$, we have $\min_{y \in P} f(y) = 0$, so $\|x_k - x\|_2 \le \frac{4}{\sqrt{k+1}}$; as $x_k$ is a convex combination of at most $k$ vertices of $P$, this gives the result.
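The proof is constructive, so it can be run directly; a sketch under illustrative assumptions (random vertices on the unit sphere, `approx_caratheodory` is a name of my own):

```python
import numpy as np

def approx_caratheodory(V, x, k):
    """Approximate x (in the convex hull of the rows of V) by a convex
    combination of at most k+1 rows, via Frank-Wolfe on 0.5*||y - x||^2."""
    y = V[0].copy()                # start at a vertex
    used = {0}                     # indices of vertices appearing in y
    for t in range(1, k + 1):
        g = y - x                  # gradient of 0.5*||y - x||^2
        i = int(np.argmin(V @ g))  # linear oracle over the vertex set
        used.add(i)
        gamma = 2.0 / (t + 1)
        y = (1 - gamma) * y + gamma * V[i]
    return y, used

# Illustrative data: 200 random vertices in the unit ball, x their average.
rng = np.random.default_rng(1)
V = rng.standard_normal((200, 50))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # vertices on the unit sphere
x = V.mean(axis=0)

k = 100
y, used = approx_caratheodory(V, x, k)
err = np.linalg.norm(y - x)
# The analysis guarantees err <= 4 / sqrt(k + 1), using at most k+1 vertices.
```

Only the vertices the oracle actually selects carry weight, which is exactly the sparsity argument above.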

⬜