Theory of Optimization: Frank-Wolfe Algorithm



In this post, we describe a new geometry-dependent algorithm that relies on a different set of assumptions. The algorithm is called conditional gradient descent, aka Frank-Wolfe.



The Frank-Wolfe algorithm solves the following convex optimization problem

$$\min_{x \in X} f(x)$$

for a compact convex set $X$ and a convex function $f$ whose gradient is "Lipschitz" in a certain sense (made precise below). The algorithm reduces the problem to a sequence of linear optimization problems over $X$.

Frank-Wolfe algorithm has the following procedure:

  1. Initial point $x_1 \in X$, step size $\gamma_t$.
  2. For $t = 1, 2, \ldots$ do
  3. Compute $y_t = \operatorname{argmin}_{y \in X} \langle \nabla f(x_t), y \rangle$.
  4. $x_{t+1} = (1 - \gamma_t) x_t + \gamma_t y_t$ with $\gamma_t = \frac{2}{t+1}$.
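To make the procedure concrete, here is a minimal sketch of Frank-Wolfe over the probability simplex, where the linear subproblem in step 3 is solved exactly by picking the coordinate with the smallest gradient entry. The quadratic objective, the target vector `b`, and the horizon `T` are illustrative choices, not part of the original post.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, T):
    """Frank-Wolfe over the probability simplex.

    The linear step argmin_{y in simplex} <g, y> is attained at the
    vertex e_i with i = argmin_i g_i, so it costs a single argmin.
    """
    x = x0.copy()
    for t in range(1, T + 1):
        g = grad(x)
        i = int(np.argmin(g))    # y_t is the best vertex e_i
        gamma = 2.0 / (t + 1)    # step size gamma_t = 2/(t+1)
        x *= 1 - gamma           # x_{t+1} = (1 - gamma) x_t + gamma e_i
        x[i] += gamma
    return x

# Illustrative use: minimize f(x) = ||x - b||^2 / 2 over the simplex,
# where the target b (itself a point of the simplex) is a made-up example.
b = np.array([0.7, 0.2, 0.1, 0.0])
x = frank_wolfe_simplex(lambda x: x - b, np.ones(4) / 4, T=2000)
```

Note that the iterate stays feasible by construction: it is always a convex combination of simplex vertices, so no projection step is needed.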


We have the following theorem for Frank-Wolfe algorithm.

Given a convex function $f$ on a convex set $X$ and a constant $C$ such that

$$f((1 - \gamma) x + \gamma y) \le f(x) + \gamma \langle \nabla f(x), y - x \rangle + \frac{C}{2} \gamma^2$$

for any $x, y \in X$ and $\gamma \in [0, 1]$, we have

$$f(x_t) - f(x^*) \le \frac{2C}{t+1}.$$

Proof: By the definition of $C$, we have

$$f(x_{t+1}) \le f(x_t) + \gamma_t \langle \nabla f(x_t), y_t - x_t \rangle + \frac{C}{2} \gamma_t^2.$$

Note that from the convexity of $f$ and the definition of $y_t$, we have

$$\langle \nabla f(x_t), y_t - x_t \rangle \le \langle \nabla f(x_t), x^* - x_t \rangle \le f(x^*) - f(x_t).$$

Hence, we have

$$f(x_{t+1}) - f(x^*) \le (1 - \gamma_t)(f(x_t) - f(x^*)) + \frac{C}{2} \gamma_t^2.$$

Let $\delta_t = f(x_t) - f(x^*)$, we have

$$\delta_{t+1} \le (1 - \gamma_t) \delta_t + \frac{C}{2} \gamma_t^2.$$

Note that $\gamma_1 = 1$ gives $\delta_2 \le \frac{C}{2}$, so we can prove the theorem by induction: if $\delta_t \le \frac{2C}{t+1}$, then

$$\delta_{t+1} \le \left(1 - \frac{2}{t+1}\right) \frac{2C}{t+1} + \frac{2C}{(t+1)^2} = \frac{2Ct}{(t+1)^2} \le \frac{2C}{t+2}.$$

Note that if $\nabla f$ is $L$-Lipschitz with respect to $\|\cdot\|$ over the domain $X$, then $C \le L R^2$, where $R = \max_{x, y \in X} \|x - y\|$ is the diameter of $X$.
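The induction in the proof can also be checked numerically: iterate the recursion for $\delta_t$ at equality (the worst case) and confirm the claimed bound holds at every step. The value of $C$ and the starting $\delta_1$ below are arbitrary choices for illustration.

```python
# Worst-case recursion from the proof:
#   delta_{t+1} = (1 - gamma_t) * delta_t + (C/2) * gamma_t^2,
# with gamma_t = 2/(t+1); the theorem claims delta_t <= 2C/(t+1).
C = 1.0
delta = 100.0  # delta_1 is arbitrary: gamma_1 = 1 makes it irrelevant
for t in range(1, 10000):
    gamma = 2.0 / (t + 1)
    delta = (1 - gamma) * delta + (C / 2) * gamma ** 2
    # delta now holds the worst-case delta_{t+1}
    assert delta <= 2 * C / (t + 2) + 1e-12
```

The first iteration uses $\gamma_1 = 1$, which wipes out the (arbitrary) initial gap, exactly as in the proof.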

Sparsity Analysis

When the domain $X$ is a simplex, the step $y_t$ is always a vertex of the simplex, and hence each step of Frank-Wolfe increases the sparsity of the iterate by at most $1$. We can therefore read the convergence result of Frank-Wolfe as a proof that the optimization problem has an approximate sparse solution.

Given a polytope $X$ that lies inside the unit ball $\{x : \|x\|_2 \le 1\}$. For any $x \in X$ and $\epsilon > 0$, there are $k = O(1/\epsilon)$ vertices $v_1, \ldots, v_k$ of $X$ such that

$$\left\| x - \sum_{i=1}^{k} \lambda_i v_i \right\|_2^2 \le \epsilon$$

for some $\lambda \in \Delta_k$.

Proof: Run the Frank-Wolfe algorithm on $f(y) = \frac{1}{2} \|y - x\|_2^2$ over $X$. Note that $\nabla f$ is $1$-Lipschitz with respect to $\|\cdot\|_2$ and that the diameter of $X$ is bounded by $2$. Hence, we have $C \le 4$.

Therefore, the Frank-Wolfe theorem shows that

$$\frac{1}{2} \|x_t - x\|_2^2 - f(x^*) \le \frac{8}{t+1}.$$

Since $x \in X$, we have $f(x^*) = 0$, so $\|x_t - x\|_2^2 \le \frac{16}{t+1}$. As $x_t$ is a convex combination of at most $t$ vertices of $X$, taking $t = O(1/\epsilon)$ gives the result.
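A sketch of this argument on a concrete polytope: take $X$ to be the simplex in $\mathbb{R}^{200}$ (its vertices $e_i$ all lie in the unit $\ell_2$ ball), pick a dense point $x$, and run Frank-Wolfe on $\frac{1}{2}\|y - x\|_2^2$ for $t = \lceil 16/\epsilon \rceil$ steps. The dimension, tolerance, and random target below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.dirichlet(np.ones(n))      # a dense point of the simplex

eps = 0.05
T = int(np.ceil(16 / eps))         # t = O(1/eps) steps suffice

z = np.zeros(n)
z[0] = 1.0                         # start at vertex e_0 (gamma_1 = 1 discards it)
used = set()
for t in range(1, T + 1):
    g = z - x                      # gradient of f(y) = ||y - x||^2 / 2
    i = int(np.argmin(g))          # linear minimization picks a vertex
    used.add(i)
    gamma = 2.0 / (t + 1)
    z *= 1 - gamma                 # z <- (1 - gamma) z + gamma e_i
    z[i] += gamma

err = float(np.sum((z - x) ** 2))  # theory guarantees err <= 16/(T+1) <= eps
k = len(used)                      # distinct vertices used: at most T
```

The final iterate `z` is a convex combination of the `k` distinct vertices picked along the way, so it is exactly the sparse approximation the statement promises.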