We use the following notation to describe an optimization problem with optimization variable x ∈ R^n and objective function f0 : R^n → R.
Linear Programming
minimize    f0(x)
subject to  Gx ≼ h
            Ax = b
where G ∈ R^{m×n}, A ∈ R^{p×n}, and the objective f0 is affine.
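This standard form maps directly onto off-the-shelf LP solvers. As a minimal sketch (not from the notes), assuming SciPy is available and taking a made-up linear objective f0(x) = c^T x, the arguments A_ub/b_ub encode Gx ≼ h and A_eq/b_eq encode Ax = b:

import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])        # illustrative objective: minimize c^T x
G = -np.eye(2)                  # Gx <= h encodes x >= 0 here
h = np.zeros(2)
A = np.array([[1.0, 1.0]])      # Ax = b
b = np.array([1.0])

res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=(None, None))
print(res.x)                    # optimal point; here [1, 0]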
The following problems can be formulated as LPs.
Minimize ∥Ax−b∥∞.
This is equivalent to the LP:
minimize    t
subject to  Ax − b ≼ t1
            Ax − b ≽ −t1
Here x ∈ R^n and t ∈ R are the variables. The constraints say that |a_k^T x − b_k| ≤ t for every row a_k^T of A; that is, t ≥ max_k |a_k^T x − b_k| = ∥Ax−b∥∞. Minimizing t therefore minimizes ∥Ax−b∥∞.
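A minimal sketch of this reformulation, assuming SciPy's linprog and random example data: stack the variables as z = (x, t) and translate the two constraint blocks into A_ub z ≤ b_ub.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

c = np.r_[np.zeros(n), 1.0]                  # minimize t
ones = np.ones((m, 1))
A_ub = np.block([[A, -ones], [-A, -ones]])   # Ax - b <= t1 and -(Ax - b) <= t1
b_ub = np.r_[b, -b]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
x, t = res.x[:n], res.x[-1]
print(t, np.linalg.norm(A @ x - b, np.inf))  # the two values agree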
Minimize ∥Ax−b∥1.
This is equivalent to the LP:
minimize    1^T s
subject to  Ax − b ≼ s
            Ax − b ≽ −s
The constraints say that |a_k^T x − b_k| ≤ s_k for every k, and at optimality s_k = |a_k^T x − b_k|. Hence the optimal value 1^T s equals ∥Ax−b∥1.
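The same pattern works here, now with one slack variable s_k per row (again a sketch with SciPy and random example data):

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

c = np.r_[np.zeros(n), np.ones(m)]           # minimize 1^T s
I = np.eye(m)
A_ub = np.block([[A, -I], [-A, -I]])         # Ax - b <= s and -(Ax - b) <= s
b_ub = np.r_[b, -b]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
print(res.fun, np.linalg.norm(A @ res.x[:n] - b, 1))   # the two values agree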
Minimize ∥Ax−b∥1 subject to ∥x∥∞≤1.
This is equivalent to the LP:
minimize    1^T s
subject to  −s ≼ Ax − b ≼ s
            −1 ≼ x ≼ 1
Minimize ∥x∥1 subject to ∥Ax−b∥∞≤1.
This is equivalent to the LP:
minimize    1^T y
subject to  −1 ≼ Ax − b ≼ 1
            −y ≼ x ≼ y
Minimize ∥Ax−b∥1+∥x∥∞.
This is equivalent to the LP:
minimize    1^T y + t
subject to  −y ≼ Ax − b ≼ y
            −t1 ≼ x ≼ t1
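These remaining reformulations can be checked numerically in the same way. A sketch using CVXPY (assuming the cvxpy package is installed) for the last one, comparing the original objective against its LP:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
direct = cp.Problem(cp.Minimize(cp.norm1(A @ x - b) + cp.norm_inf(x)))

x2, y, t = cp.Variable(n), cp.Variable(m), cp.Variable()
lp = cp.Problem(cp.Minimize(cp.sum(y) + t),
                [A @ x2 - b <= y, A @ x2 - b >= -y, x2 <= t, x2 >= -t])

print(direct.solve(), lp.solve())            # optimal values agree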
Gradient Descent
for k in range(0, N):
    approximate f by f_k around x(k)
    do something using f_k (e.g., set x(k+1) = argmin_x f_k(x))
There are multiple forms of approximation:
First-order approximation: f(y)≈f(x)+⟨∇f(x),y−x⟩
Usually, the algorithm here looks like:

import numpy as np

def gradient_descent(grad, x, step_size, N, eps):
    for k in range(N):
        if np.linalg.norm(grad(x)) <= eps:   # stop once the gradient is small
            return x
        x = x - step_size * grad(x)          # gradient step: x(k+1) = x(k) - h*grad f(x(k))
    return x
Theorem: Let f ∈ C²(R^n). For any L ≥ 0, the following are equivalent:
∥∇f(x) − ∇f(y)∥2 ≤ L∥x − y∥2 for all x, y ∈ R^n
−LI ≼ ∇²f(x) ≼ LI for all x ∈ R^n, where I is the identity matrix
f(x) + ∇f(x)^T(y−x) − (L/2)∥y−x∥2^2 ≤ f(y) ≤ f(x) + ∇f(x)^T(y−x) + (L/2)∥y−x∥2^2 for all x, y ∈ R^n
Theorem: Let f be a function with L-Lipschitz gradient and let x∗ be any minimizer of f. Gradient descent with step size h = 1/L outputs a point x such that ∥∇f(x)∥2 ≤ ϵ within (2L/ϵ^2)(f(x(0)) − f(x∗)) iterations.
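A quick sanity check of the step size h = 1/L, reusing gradient_descent from above on f(x) = (1/2)∥Ax − b∥2^2, whose gradient A^T(Ax − b) is L-Lipschitz with L = λ_max(A^T A) (random example data):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

grad = lambda x: A.T @ (A @ x - b)           # gradient of 0.5*||Ax - b||_2^2
L = np.linalg.eigvalsh(A.T @ A).max()        # smoothness constant

x = gradient_descent(grad, np.zeros(5), step_size=1.0 / L, N=10000, eps=1e-6)
print(np.linalg.norm(grad(x)))               # <= 1e-6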
Second-order approximation: f(y) ≈ f(x) + ⟨∇f(x), y−x⟩ + (1/2)(y−x)^T ∇²f(x)(y−x) (a Newton-step sketch follows this list)
Stochastic approximation: ∑_i f_i(x) ≈ f_j(x) for a random j
Matrix approximation: Approximate A by a simpler B with (1/2)A ≼ B ≼ 2A
Set approximation: Approximate a convex set by an ellipsoid or a polytope
Barrier approximation: Approximate a convex set by a smooth function that blows up on the boundary
Polynomial approximation: Approximate a function by a polynomial
Partial approximation: Split the problem into two parts and approximate only one part
Taylor approximation: f(y) ≈ ∑_{k=0}^{K} D^k f(x)[y−x]^k
Mixed ℓ2-ℓp approximation: f(y) ≈ f(x) + ⟨∇f(x), y−x⟩ + ∑_{i=1}^{n} (α_i (y_i − x_i)^2 + β_i (y_i − x_i)^p)
Stochastic matrix approximation: Approximate A by a simpler random B with B ≼ 2A and E[B] ≽ (1/2)A
Homotopy method: Approximate a function by a family of functions
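For instance, minimizing the second-order approximation exactly at each iteration gives a Newton step x(k+1) = x(k) − (∇²f(x(k)))^{-1} ∇f(x(k)). A minimal sketch on a toy function (illustrative, not from the notes):

import numpy as np

def newton(grad, hess, x, N=50, eps=1e-9):
    for k in range(N):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - np.linalg.solve(hess(x), g)  # minimize the local quadratic model
    return x

# f(x) = sum_i (exp(x_i) - x_i), minimized at x = 0
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))
print(newton(grad, hess, np.ones(3)))        # approximately [0, 0, 0]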