Convex Optimization

Useful tricks in Convex Optimization
 
 

Convex Optimization Problems

An optimization problem usually has the form
$$\begin{aligned} &\text{minimize} && f_0(x) \\ &\text{subject to} && f_i(x) \le 0, \quad i=1,\ldots,m \\ & && h_i(x) = 0, \quad i=1,\ldots,p \end{aligned}$$
with optimization variable $x\in\mathbf{R}^n$ and objective function $f_0:\mathbf{R}^n\rightarrow\mathbf{R}$.
 

Linear Programming

$$\begin{aligned} &\text{minimize} && c^Tx \\ &\text{subject to} && Gx \preccurlyeq h \\ & && Ax = b \end{aligned}$$
where $c \in \mathbf{R}^{n}$, $G \in \mathbf{R}^{m\times n}$, and $A \in \mathbf{R}^{p\times n}$.
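An LP in this standard form can be handed directly to an off-the-shelf solver. Here is a minimal sketch using scipy.optimize.linprog (an assumed dependency; the data $G$, $h$, $A$, $b$ below are made-up stand-ins):
```python
import numpy as np
from scipy.optimize import linprog

# minimize c^T x  subject to  Gx <= h, Ax = b
c = np.array([1.0, 2.0])
G = np.array([[-1.0, 0.0], [0.0, -1.0]])  # encodes x >= 0 as -x <= 0
h = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# linprog defaults to x >= 0; pass bounds=(None, None) so that all
# constraints come from G, h, A, b only.
res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=(None, None))
print(res.x, res.fun)  # optimal point and value: [1, 0], 1.0
```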
 
The following problems can be formulated as LPs; a numerical sanity check for the first reformulation appears after the list.
Minimize $\|Ax-b\|_\infty$.
This is equivalent to the LP:
$$\begin{aligned} &\text{minimize} && t \\ &\text{subject to} && Ax-b \preccurlyeq t\mathbf{1} \\ & && Ax-b \succcurlyeq -t\mathbf{1} \end{aligned}$$
This follows since the constraints say exactly that $|a_k^Tx-b_k|\le t$ for every row $a_k^T$ of $A$; that is, $t \ge \max_{k} |a_k^Tx - b_k| = \|Ax-b\|_\infty$.
Minimize $\|Ax-b\|_1$.
This is equivalent to the LP:
$$\begin{aligned} &\text{minimize} && \mathbf{1}^T s \\ &\text{subject to} && Ax-b \preccurlyeq s \\ & && Ax-b \succcurlyeq -s \end{aligned}$$
This follows since the constraints say exactly that $|a_k^Tx-b_k|\le s_k$. At optimality $s_k = |a_k^Tx-b_k|$ for every $k$, so $\mathbf{1}^Ts = \|Ax-b\|_1$.
Minimize $\|Ax-b\|_1$ subject to $\|x\|_\infty \le 1$.
This is equivalent to the LP:
$$\begin{aligned} &\text{minimize} && \mathbf{1}^T s \\ &\text{subject to} && -s \preccurlyeq Ax-b \preccurlyeq s \\ & && -\mathbf{1} \preccurlyeq x \preccurlyeq \mathbf{1} \end{aligned}$$
Minimize $\|x\|_1$ subject to $\|Ax-b\|_\infty \le 1$.
This is equivalent to the LP:
$$\begin{aligned} &\text{minimize} && \mathbf{1}^T y \\ &\text{subject to} && -\mathbf{1} \preccurlyeq Ax-b \preccurlyeq \mathbf{1} \\ & && -y \preccurlyeq x \preccurlyeq y \end{aligned}$$
Minimize $\|Ax-b\|_1 + \|x\|_\infty$.
This is equivalent to the LP:
$$\begin{aligned} &\text{minimize} && \mathbf{1}^T y + t \\ &\text{subject to} && -y \preccurlyeq Ax-b \preccurlyeq y \\ & && -t\mathbf{1} \preccurlyeq x \preccurlyeq t\mathbf{1} \end{aligned}$$
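As a sanity check, the first reformulation can be verified numerically. The sketch below uses cvxpy (an assumed dependency; $A$ and $b$ are random stand-ins) and compares the optimal value of the direct $\ell^\infty$ problem against its epigraph LP:
```python
import numpy as np
import cvxpy as cp  # assumed available; any LP solver would do

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # illustrative data
b = rng.standard_normal(20)

# Direct formulation: minimize ||Ax - b||_inf
x = cp.Variable(5)
direct = cp.Problem(cp.Minimize(cp.norm(A @ x - b, "inf")))
direct.solve()

# Epigraph LP: minimize t subject to -t*1 <= Ax - b <= t*1
x2 = cp.Variable(5)
t = cp.Variable()
lp = cp.Problem(
    cp.Minimize(t),
    [A @ x2 - b <= t * np.ones(20), A @ x2 - b >= -t * np.ones(20)],
)
lp.solve()

print(direct.value, lp.value)  # the two optimal values should match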
 
 
 

Gradient Descent

for k in range(0, N):
	Approximate f by f_k according to x(k)
	Do something using f_k (e.g., setting x(k+1) = argmin f_k(x))
There are multiple forms of approximation:
First-order approximation: $f(y) \approx f(x) + \langle\nabla f(x), y-x\rangle$
Usually, the algorithm here looks like
for k in range(0, N):
	if norm(gradient(x(k))) <= eps:
		return x(k)
	x(k+1) = x(k) - step_size * gradient(x(k))
Theorem: Let $f\in\mathcal{C}^2(\mathbb{R}^n)$. For any $L\ge0$, the following are equivalent:
  1. $\|\nabla f(x)-\nabla f(y)\|_2 \le L \|x-y\|_2$ for all $x,y\in\mathbb{R}^n$
  2. $-LI\preccurlyeq\nabla^2f(x) \preccurlyeq LI$ for all $x \in \mathbb{R}^n$
  3. $\left|f(y) - f(x) - \nabla f(x)^\top (y-x)\right| \le \frac{L}{2}\|y-x\|_2^2$ for all $x,y\in\mathbb{R}^n$
 
Theorem: Let $f$ be a function with $L$-Lipschitz gradient and $x^*$ be any minimizer of $f$. Gradient descent with step size $h=\frac{1}{L}$ outputs a point $x$ such that $\|\nabla f(x)\|_2 \le \epsilon$ within $\frac{2L}{\epsilon^2}\left(f(x^{(0)}) - f(x^*)\right)$ iterations.
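To make the theorem concrete, here is a minimal sketch of gradient descent with step size $1/L$ (assuming NumPy; the least-squares objective and random data are illustrative, not from the notes), where the Lipschitz constant of the gradient is computable exactly:
```python
import numpy as np

# f(x) = 0.5 ||Ax - b||_2^2 has gradient A^T (Ax - b), which is
# L-Lipschitz with L = lambda_max(A^T A).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

L = np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of the gradient
eps = 1e-6
x = np.zeros(10)                        # x^(0)

for k in range(100_000):
    g = A.T @ (A @ x - b)               # gradient at the current iterate
    if np.linalg.norm(g) <= eps:        # stop once the gradient is small
        break
    x = x - g / L                       # step size h = 1/L

print(k, np.linalg.norm(A.T @ (A @ x - b)))  # final gradient norm <= eps
```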
 
Second-order approximation: $f(y)\approx f(x) + \langle\nabla f(x),y-x\rangle + \frac{1}{2}(y-x)^\top \nabla^2f(x) (y-x)$ (minimizing this model yields the Newton step; see the sketch after this list)
Stochastic Approximation: $\sum_if_i(x)\approx f_j(x)$ for a random $j$ (up to scaling; an SGD sketch also follows the list)
Matrix Approximation: Approximate $A$ by a simpler $B$ with $\frac{1}{2}A \preccurlyeq B \preccurlyeq 2A$
Set Approximation: Approximate a convex set by an ellipsoid or a polytope
Barrier Approximation: Approximate a convex set by a smooth function that blows up on the boundary
Polynomial Approximation: Approximate a function by a polynomial
Partial Approximation: Split the problem into two parts and approximate only one part
Taylor Approximation: $f(y)\approx \sum_{k=0}^{K}\frac{1}{k!}D^kf(x)[y-x]^k$
Mixed $\ell^2$-$\ell^p$ Approximation: $f(y)\approx f(x) + \langle\nabla f(x), y-x\rangle + \sum_{i=1}^{n}\left(\alpha_i(y_i-x_i)^2 + \beta_i(y_i-x_i)^p\right)$
Stochastic Matrix Approximation: Approximate $A$ by a simpler random $B$ with $B \preccurlyeq 2A$ and $\mathbf{E} B\succcurlyeq \frac{1}{2}A$
Homotopy Method: Approximate a function by a family of functions
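As referenced in the list above, minimizing the second-order model at the current iterate gives Newton's method. A minimal sketch, assuming NumPy (the quadratic test function is an illustrative stand-in):
```python
import numpy as np

def newton_step(grad, hess, x):
    # Minimizing the second-order model
    #   f(x) + <g, y - x> + 0.5 (y - x)^T H (y - x)
    # over y gives y = x - H^{-1} g.
    g, H = grad(x), hess(x)
    return x - np.linalg.solve(H, g)

# Illustrative quadratic: f(x) = 0.5 x^T Q x - c^T x
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
grad = lambda x: Q @ x - c
hess = lambda x: Q

x = newton_step(grad, hess, np.zeros(2))  # exact on a quadratic: one step
print(x, np.linalg.solve(Q, c))           # both equal Q^{-1} c
```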
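Similarly, the most familiar instance of the stochastic approximation idea is SGD: replace the gradient of $\sum_i f_i$ by the gradient of a single randomly chosen term. A sketch under an illustrative least-squares setup with $f_i(x) = \frac{1}{2}(a_i^\top x - b_i)^2$:
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))  # rows a_i^T (illustrative data)
b = rng.standard_normal(100)

x = np.zeros(5)
for _ in range(5000):
    j = rng.integers(100)              # sample one random term f_j
    g = (A[j] @ x - b[j]) * A[j]       # gradient of f_j alone
    x -= 0.01 * g                      # small constant step size

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_ls))        # roughly near the least-squares solution
```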