Machine Learning Note: Duality
17 November 2019
1. Lower bound property
Given an optimization problem with the following standard form:
\[\begin{align} \min_{x}\ \ &f(x) \\ s.t.\ \ & g_i(x) \le 0, i=1...m \\ & h_i(x) = 0, i=1...p \end{align}\]Based on the generalized Lagrangian (the original Lagrangian only allows equality constraints), the original constrained optimization problem can be converted to the following unconstrained one:
\[\mathcal{L}(x, \alpha, \beta) = f(x) + \sum^m_{i=1}\alpha_i g_i(x) + \sum^p_{i=1}\beta_i h_i(x)\]This Lagrangian can be solved via (assuming the problem is feasible and the optimal value is finite):
\[\min_{x} \max_{\alpha, \beta;\ \alpha \ge 0} \mathcal{L}(x, \alpha, \beta)\]Its Lagrangian dual function is:
\[\begin{align} g(\alpha, \beta) &= \inf_x \mathcal{L}(x, \alpha, \beta) \\ &= \inf_x [f(x) + \sum^m_{i=1}\alpha_i g_i(x) + \sum^p_{i=1}\beta_i h_i(x)] \end{align}\]where $\alpha, \beta$ are called the dual variables.
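As a concrete sketch (a made-up one-dimensional toy problem, not from the note itself), the dual function can be approximated numerically by taking the infimum of the Lagrangian over a grid of $x$ values and compared against its closed form:

```python
import numpy as np

# Toy problem (hypothetical, for illustration only):
#   minimize f(x) = (x - 3)^2  subject to  g(x) = x - 1 <= 0
def lagrangian(x, alpha):
    return (x - 3) ** 2 + alpha * (x - 1)

xs = np.linspace(-10, 10, 100001)  # grid used to approximate the infimum over x

def dual(alpha):
    # g(alpha) = inf_x L(x, alpha), approximated by the minimum over the grid
    return lagrangian(xs, alpha).min()

# Closed form: the infimum is attained at x = 3 - alpha/2,
# so g(alpha) = 2*alpha - alpha^2/4; at alpha = 4 this gives 4.0
print(round(dual(4.0), 6))  # 4.0
```

The grid approximation matches the analytic dual function because the Lagrangian is a smooth quadratic in $x$ for each fixed $\alpha$.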
Lower Bound Property. The dual function always lower-bounds the primal optimal value:
\[g(\alpha, \beta) \le p^*, \quad \forall \alpha \ge 0,\ \beta\]
2. Strong and Weak Duality
Let $p^*$ be the primal optimal, and $d^*$ be the dual optimal. According to the Lower Bound Property, we have
\[p^* \ge d^*\]which is called weak duality.
A problem has strong duality when its primal optimal equals its dual optimal, i.e. $p^* = d^*$.
- In most cases, strong duality does not hold.
- For convex problems, strong duality usually holds.
- For non-convex problems, strong duality sometimes holds.
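To make the two notions concrete, here is a small numerical sketch on a hypothetical toy problem (not part of the note): minimize $x^2$ subject to $1 - x \le 0$, whose dual function has the closed form $g(\alpha) = \alpha - \alpha^2/4$.

```python
import numpy as np

# Toy problem: minimize x^2 subject to 1 - x <= 0 (i.e. x >= 1).
# Primal optimum: x* = 1, so p* = 1.
# Lagrangian: L(x, a) = x^2 + a*(1 - x); minimizing over x (at x = a/2)
# gives the dual function g(a) = a - a^2/4.
def dual(a):
    return a - a ** 2 / 4

p_star = 1.0
alphas = np.linspace(0, 10, 1001)
g_vals = dual(alphas)

# Weak duality: g(a) <= p* for every a >= 0
assert np.all(g_vals <= p_star + 1e-12)

# Strong duality holds for this convex problem: d* = max_a g(a) = g(2) = 1 = p*
d_star = g_vals.max()
print(d_star)  # 1.0
```

Weak duality holds at every grid point, and the dual maximum coincides with the primal optimum, as expected for a convex problem.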
Slater’s condition: For a convex optimization problem of the form:
\[\begin{align} \min_{x}\ \ &f(x) \\ s.t.\ \ & g_i(x) \le 0, i=1...m \\ & h_i(x) = 0, i=1...p \end{align}\]strong duality holds if $\exists x$ such that $g_i(x) \lt 0$ for all $i$ and $h_i(x) = 0$ for all $i$.
Such an $x$ is called strictly feasible, i.e. there exists a feasible solution that satisfies $g_i(x) \lt 0$ strictly for each original constraint $g_i(x) \le 0$.
Slater’s condition applies only to convex problems.
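A minimal sketch of checking strict feasibility on a toy convex problem (the constraints `g1` and `g2` below are made up for illustration):

```python
import numpy as np

# Hypothetical convex constraints: g1(x) = x1 + x2 - 1 <= 0,
# g2(x) = x1^2 + x2^2 - 1 <= 0; no equality constraints.
def strictly_feasible(x):
    # Slater's condition asks for a point at which every inequality is strict
    g1 = x[0] + x[1] - 1
    g2 = x[0] ** 2 + x[1] ** 2 - 1
    return bool(g1 < 0 and g2 < 0)

print(strictly_feasible(np.zeros(2)))           # True: x = 0 is strictly feasible
print(strictly_feasible(np.array([1.0, 0.0])))  # False: g1 = 0 on the boundary
```

Since one strictly feasible point exists, Slater's condition guarantees strong duality for this (convex) toy problem.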
3. Complementary Slackness
Assume that strong duality holds for $x^*, \alpha^*, \beta^*$, where $x^*$ is the primal optimal and $\alpha^*, \beta^*$ are the dual optimal. By definition we have:
\[\begin{align} f(x^*) &=g(\alpha^*, \beta^*) \\ &= \inf_x [f(x) + \sum^m_{i=1}\alpha_i^* g_i(x) + \sum^p_{i=1}\beta_i^* h_i(x)] \\ & \le f(x^*) + \sum^m_{i=1}\alpha_i^* g_i(x^*) + \sum^p_{i=1}\beta_i^* h_i(x^*) \\ & \le f(x^*) \end{align}\]Since the chain starts and ends with $f(x^*)$, both inequalities must hold with equality, which forces $\sum_i \alpha_i^* g_i(x^*) = 0$; because every term $\alpha_i^* g_i(x^*)$ is non-positive, each must vanish: $\alpha_i^* g_i(x^*)=0$, i.e.
\[\begin{cases} \alpha_i^*=0, \ \ g_i(x^*)<0 \\ \alpha_i^*\gt0, \ \ g_i(x^*)=0 \end{cases}\]The above condition is called complementary slackness.
If $g_i(x^*)=0$, we say the constraint $g_i(x)$ is active. An intuitive explanation of complementary slackness is: if the solution lies on the boundary imposed by an inequality, then its KKT multiplier must be used to influence the solution $x$; if the inequality has no influence on the solution, then we represent this by zeroing out its KKT multiplier.
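A quick numerical illustration with two hypothetical one-dimensional problems sharing the constraint $g(x) = x - 1 \le 0$, one with the constraint active and one with it inactive:

```python
# Two toy problems with the same constraint g(x) = x - 1 <= 0.
# Case 1: f(x) = (x - 2)^2. The unconstrained minimum x = 2 violates g, so
#   the constraint is active: x* = 1, and stationarity 2(x* - 2) + a* = 0
#   gives a* = 2 > 0, with g(x*) = 0.
# Case 2: f(x) = (x + 2)^2. The unconstrained minimum x = -2 satisfies g
#   strictly, so the constraint is inactive: g(x*) < 0 and a* = 0.
cases = [
    (1.0, 2.0),   # (x*, a*) for the active case
    (-2.0, 0.0),  # (x*, a*) for the inactive case
]
for x_star, a_star in cases:
    g_val = x_star - 1  # g(x*) = x* - 1
    # Complementary slackness: the product a* * g(x*) vanishes in both cases
    assert a_star * g_val == 0.0
print("complementary slackness holds in both cases")
```

Exactly one of $\alpha^*$ and $g(x^*)$ is nonzero in each case, so the product is always zero.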
4. KKT Conditions
The KKT (Karush–Kuhn–Tucker) conditions are a necessary condition for strong duality (not a sufficient one).
Assume that $f(x), g_i(x), h_i(x)$ are differentiable, but do not assume they are convex. The KKT conditions are:
\[\frac{\partial}{\partial x}\mathcal{L}(x^*, \alpha^*, \beta^*) = 0\\ g_i(x^*) \le 0, i=1...m \\ h_i(x^*) = 0, i=1...p \\ \alpha_i^*\ge0, i=1...m \\ \alpha_i^*g_i(x^*)=0, i=1...m \\\]which can be summarized as:
- The gradient of the generalized Lagrangian is 0. (Because strong duality implies that the minimum of $\mathcal{L}(x, \alpha^*, \beta^*)$ over $x$ is attained at $x^*$.)
- All constraints on both $x$ and the KKT multipliers are satisfied (conditions 2–4 above).
- The inequality constraints exhibit “complementary slackness” (condition 5 above).
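The conditions can be checked mechanically on a toy equality-constrained problem (a made-up example: minimize $\|x\|^2$ subject to $x_1 + x_2 = 1$, whose solution $x^* = (0.5, 0.5)$ with multiplier $\beta^* = -1$ is known analytically):

```python
import numpy as np

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0.
# Analytic solution: x* = (0.5, 0.5) with multiplier beta* = -1
# (there are no inequality constraints, so the alpha conditions are vacuous).
x_star = np.array([0.5, 0.5])
beta_star = -1.0

# Stationarity: grad_x L = 2x + beta * (1, 1) = 0 at (x*, beta*)
grad_L = 2 * x_star + beta_star * np.ones(2)
assert np.allclose(grad_L, 0)

# Primal feasibility: h(x*) = x1 + x2 - 1 = 0
assert np.isclose(x_star.sum() - 1, 0)

print("KKT conditions hold at x* = (0.5, 0.5), beta* = -1")
```

Both the stationarity and feasibility checks pass, confirming the analytic KKT point.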
5. KKT condition for convex problem
Assume that the original problem is convex ($f$ and $g_i$ are convex, $h_i$ are affine). Then the KKT conditions become necessary and sufficient, i.e. any solution satisfying the KKT conditions is also optimal for both the primal and the dual.
6. Conclusion
- The dual function is the infimum of the Lagrangian over $x$, and it lower-bounds the primal optimal value.
- Weak duality always holds.
- Strong duality implies the KKT conditions hold (they are necessary, not sufficient in general).
- For convex problems, the KKT conditions are also sufficient for optimality.