Machine Learning Note: Duality

17 November 2019

1. Lower bound property

Given an optimization problem with the following standard form:

\[\begin{align} \min_{x}\ \ &f(x) \\ s.t.\ \ & g_i(x) \le 0, i=1...m \\ & h_i(x) = 0, i=1...p \end{align}\]

Based on the Generalized Lagrangian (original Lagrangian only allows equality constraint), the original optimization problem with constraints can be convert to the following optimization problem without constraints:

\[\mathcal{L}(x, \alpha, \beta) = f(x) + \sum^k_{i=1}\alpha_i g_i(x) + \sum^l_{i=1}\beta_i h_i(x)\]

This Lagrangian can be solved by (if there is a feasible solution and it cannot be $\infty$):

\[\min_{w} \max_{\alpha, \beta; \alpha \ge 0} \mathcal{L}(w, \alpha, \beta)\]

Its Lagrangian Dual Function is:

\[\begin{align} g(\alpha, \beta) &= \inf_x \mathcal{L}(w, \alpha, \beta) \\ &= \inf_x [f(x) + \sum^k_{i=1}\alpha_i g_i(x) + \sum^l_{i=1}\beta_i h_i(x)] \end{align}\]

where $\alpha, \beta$ is called dual variables.

Lower Bound Property. The dual problem is always the lower bound of the primal problem:

\[g(\alpha, \beta) \le p^*, \forall \alpha, \beta\]

2. Strong and Weak Duality

Let $p^*$ be the primal optimal, and $d^*$ be the dual optimal. According to the Lower Bound Property, we have

\[p^* \ge d^*\]

Which we also called it weak duality.

A problem has strong duality when its primal optimal equals to its dual optimal, i.e. $p^* = d^*$.

  • In most cases, the strong duality does not hold.
  • For convex functions, the strong duality usualy holds.
  • For non-convex functions, the strong duality sometimes

Slater’s conditions.: For a convex optimization problem with the form:

\[\begin{align} \min_{x}\ \ &f(x) \\ s.t.\ \ & g_i(x) \le 0, i=1...m \\ & h_i(x) = 0, i=1...p \end{align}\]

Strong duality holds if $\exists x, g_i(x) \lt 0, h_i(x) =0$.

The above condition is also called the strictly feasible, i.e. there exist a feasible solution that satisfies $g_i(x) \lt 0 $ for the original constraint $g_i(x) \le 0$.

The Slater’s condition is only suitable for convex function.

3. Complementary Slackness

Assume that strong duality holds for $x^*, \alpha^*, \beta^*$, where $x^*$ is the primal optimal, $\alpha^*, \beta^*$ is the dual optimal. By definition we have:

\[\begin{align} f(x^*) &=g(\alpha^*, \beta^*) \\ &= \inf_x [f(x) + \sum^k_{i=1}\alpha_i^* g_i(x) + \sum^l_{i=1}\beta_i^* h_i(x)] \\ & \le f(x^*) + \sum^k_{i=1}\alpha_i^* g_i(x^*) + \sum^l_{i=1}\beta_i^* h_i(x^*) \\ & \le f(x^*) \end{align}\]

When strong duality holds, we need $\alpha_i^* g_i(x^*)=0$, i.e.

\[\begin{cases} \alpha_i^*=0, \ \ f_i(x^*)<0 \\ \alpha_i^*\gt0, \ \ f_i(x^*)=0 \end{cases}\]

The above condition is called Complementary Slackness

If $g_i(x^*)=0$, we say this constraint $g_i(x)$ is active. An intuitive explaination of the Complementary Slackness is that: if the solution is on the boundary imposed by the inequality, then we must use its KKT multiplier to influence the solution to $x$; if the ineuqality has no influence on the solution, then we represent this by zeroing out its KKT multiplier.

4. KKT Conditions

KKT(Karush–Kuhn–Tucker) condition describes a necessary condition for strong duality (not sufficient).

Assume that $f(x), g_i(x), h_i(x)$ are differentiable, but we do not assume them to be convex function. The KKT condition is:

\[\frac{\partial}{\partial x}\mathcal{L}(x^*, \alpha^*, \beta^*) = 0\\ g_i(x^*) \le 0, i=1...m \\ h_i(x^*) = 0, i=1...p \\ \alpha_i^*\ge0, i=1...m \\ \alpha_i^*g_i(x)=0, i=1...m \\\]

Which can be summarized as:

  1. The gradient of the generalized Lagrangian is 0. (Because strong duality implies that the minumum value of $\mathcal{L}(x^*, \alpha^*, \beta^*)$ is at $x^*$)
  2. All constraints on both $x$ and the KKT multiplier are satisfied. (e.q. 2 and 3 in the above condition).
  3. The inequality constraints exhibit “complementary slackness”. (e.q. 4 and 5 in the above condition).

5. KKT condition for convex problem

Assume that the original problem is convex ($f$ and $g_i$ are convex, $h_i$ are affine functions), KKT condition becomes sufficient and necessary, i.e. the solutions satisfying KKT condition are also the optimal solution of the primal and dual.

6. Conclusion

  1. Dual problem is the infimum of the primal’s optimal solution.
  2. Weak Duality always exists.
  3. Strong duality needs satisfy KKT condition. (necessary condition)
  4. For convex problem, KKT condition is sufficient for strong duality.