## Machine Learning Note: Duality

17 November 2019

### 1. Lower bound property

Given an optimization problem with the following standard form:

\[\begin{align} \min_{x}\ \ &f(x) \\ s.t.\ \ & g_i(x) \le 0, i=1...m \\ & h_i(x) = 0, i=1...p \end{align}\]Based on the **Generalized Lagrangian** (the original Lagrangian only allows equality constraints), the constrained problem can be converted into an unconstrained one. The Generalized Lagrangian is:

\[\mathcal{L}(x, \alpha, \beta) = f(x) + \sum^m_{i=1}\alpha_i g_i(x) + \sum^p_{i=1}\beta_i h_i(x)\]where $\alpha, \beta$ are called dual variables, with $\alpha_i \ge 0$.

The primal problem is then equivalent to (provided a feasible solution exists, so the value is not $\infty$):

\[\min_{x} \max_{\alpha, \beta;\ \alpha \ge 0} \mathcal{L}(x, \alpha, \beta)\]Its **Lagrangian Dual Function** is:

\[g(\alpha, \beta) = \inf_{x} \mathcal{L}(x, \alpha, \beta)\]and the corresponding dual problem is:

\[\max_{\alpha, \beta;\ \alpha \ge 0} g(\alpha, \beta)\]

**Lower Bound Property.** The Lagrangian dual function always lower-bounds the primal optimal value $p^*$: for any $\alpha \ge 0$ and any $\beta$,

\[g(\alpha, \beta) \le p^*\]
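The lower bound property can be checked numerically on a toy problem. The example below is an illustrative assumption (not from the note itself): minimize $x^2$ subject to $x \ge 1$, whose dual function has the closed form $\alpha - \alpha^2/4$.

```python
# Toy problem (an illustrative assumption, not from the note itself):
# minimize f(x) = x^2 subject to g(x) = 1 - x <= 0, i.e. x >= 1.
# The primal optimum is x* = 1 with p* = 1.
# Lagrangian: L(x, a) = x^2 + a * (1 - x); minimizing over x
# (at x = a / 2) gives the dual function in closed form.

def dual(alpha):
    """Lagrangian dual g(alpha) = inf_x [x^2 + alpha * (1 - x)]."""
    return alpha - alpha ** 2 / 4.0

p_star = 1.0  # primal optimal value

# Every value of the dual function is a lower bound on p*.
for alpha in [0.1 * i for i in range(101)]:  # alpha in [0, 10]
    assert dual(alpha) <= p_star + 1e-12

print("best dual value on grid:", max(dual(0.1 * i) for i in range(101)))
```

The dual function peaks at $\alpha = 2$, where it exactly attains $p^* = 1$; everywhere else it stays strictly below.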

### 2. Strong and Weak Duality

Let $p^*$ be the primal optimal, and $d^*$ be the dual optimal. According to the Lower Bound Property, we have

\[p^* \ge d^*\]which is called **weak duality**.

A problem has **strong duality** when its primal optimal equals its dual optimal, i.e. $p^* = d^*$.

- In general, strong duality does not hold.
- For convex problems, strong duality usually holds.
- For non-convex problems, strong duality sometimes holds.

**Slater’s condition.** For a convex optimization problem in the standard form above, strong duality holds if there exists a strictly feasible point:

\[\exists x:\ \ g_i(x) \lt 0,\ i=1...m, \ \ h_i(x) = 0,\ i=1...p\]This condition is called **strict feasibility**: there exists a feasible point satisfying $g_i(x) \lt 0$ strictly, rather than only $g_i(x) \le 0$, for every inequality constraint.

Note that Slater’s condition applies only to convex problems.
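For the same toy problem used above (an illustrative assumption, not from the note itself), Slater's condition is easy to check and strong duality can be verified directly:

```python
# Toy convex problem (illustrative, not from the note): minimize x^2
# subject to g(x) = 1 - x <= 0. The point x = 2 gives g(2) = -1 < 0,
# so it is strictly feasible and Slater's condition holds; strong
# duality then guarantees p* = d*.

def f(x):
    return x * x

def dual(alpha):
    # g(alpha) = inf_x [x^2 + alpha * (1 - x)], minimized at x = alpha / 2
    return alpha - alpha ** 2 / 4.0

assert (1 - 2.0) < 0                  # x = 2 is strictly feasible
p_star = f(1.0)                       # primal optimum at x* = 1
d_star = dual(2.0)                    # dual optimum at alpha* = 2
                                      # (d g/d alpha = 1 - alpha/2 = 0)
assert abs(p_star - d_star) < 1e-12   # strong duality: p* = d*
print("p* =", p_star, " d* =", d_star)
```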

### 3. Complementary Slackness

Assume that strong duality holds for $x^*, \alpha^*, \beta^*$, where $x^*$ is the primal optimal and $\alpha^*, \beta^*$ are the dual optimal. By definition we have:

\[\begin{align} f(x^*) &=g(\alpha^*, \beta^*) \\ &= \inf_x [f(x) + \sum^m_{i=1}\alpha_i^* g_i(x) + \sum^p_{i=1}\beta_i^* h_i(x)] \\ & \le f(x^*) + \sum^m_{i=1}\alpha_i^* g_i(x^*) + \sum^p_{i=1}\beta_i^* h_i(x^*) \\ & \le f(x^*) \end{align}\]The last inequality holds because $h_i(x^*) = 0$ and $\alpha_i^* g_i(x^*) \le 0$. Since the chain begins and ends with $f(x^*)$, both inequalities must be equalities, so $\sum^m_{i=1}\alpha_i^* g_i(x^*) = 0$; as every term is non-positive, each must vanish, i.e. $\alpha_i^* g_i(x^*) = 0$:

\[\begin{cases} \alpha_i^*=0, \ \ g_i(x^*)<0 \\ \alpha_i^*\gt0, \ \ g_i(x^*)=0 \end{cases}\]The above condition is called **Complementary Slackness**.

If $g_i(x^*)=0$, we say the constraint $g_i(x)$ is **active**. An intuitive explanation of Complementary Slackness: if the solution lies on the boundary imposed by an inequality constraint, then that constraint influences the solution through its KKT multiplier, so $\alpha_i^* \gt 0$; if the inequality has no influence on the solution, we represent this by zeroing out its KKT multiplier.
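The active/inactive dichotomy can be seen on a small family of toy problems (an illustrative assumption, not from the note itself): minimize $x^2$ subject to $x \ge c$, whose KKT solution is available in closed form.

```python
# Family of toy problems (illustrative, not from the note):
# minimize x^2 subject to g(x) = c - x <= 0, i.e. x >= c.
# Closed-form KKT solution:
#   c <= 0: the unconstrained minimum x = 0 is feasible
#           -> constraint inactive, alpha* = 0;
#   c  > 0: the solution sits on the boundary x* = c
#           -> constraint active, and stationarity 2x - alpha = 0
#              gives alpha* = 2c > 0.

def solve(c):
    if c <= 0:
        return 0.0, 0.0   # (x*, alpha*): constraint inactive
    return c, 2.0 * c     # (x*, alpha*): constraint active

for c in (-2.0, -0.5, 0.0, 1.0, 3.0):
    x_star, alpha_star = solve(c)
    slack = c - x_star    # g(x*)
    # Complementary slackness: alpha* * g(x*) = 0 in every case.
    assert abs(alpha_star * slack) < 1e-12
    print(f"c={c:+.1f}: x*={x_star:+.1f}, alpha*={alpha_star:+.1f}, g(x*)={slack:+.1f}")
```

Either the multiplier is zero (constraint slack) or the slack is zero (constraint active), never both nonzero.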

### 4. KKT Conditions

The **KKT (Karush–Kuhn–Tucker) conditions** are a necessary (but not sufficient) condition for strong duality.

Assume that $f(x), g_i(x), h_i(x)$ are differentiable, but we do not assume them to be convex function. The KKT condition is:

\[\frac{\partial}{\partial x}\mathcal{L}(x^*, \alpha^*, \beta^*) = 0\\ g_i(x^*) \le 0, i=1...m \\ h_i(x^*) = 0, i=1...p \\ \alpha_i^*\ge0, i=1...m \\ \alpha_i^*g_i(x^*)=0, i=1...m \\\]which can be summarized as:

- The gradient of the generalized Lagrangian is 0 at $x^*$. (Strong duality implies that $\mathcal{L}(x, \alpha^*, \beta^*)$ attains its minimum at $x^*$.)
- All constraints on both $x$ and the KKT multipliers are satisfied (conditions 2–4 above).
- The inequality constraints exhibit “complementary slackness” (condition 5 above).
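Continuing the toy problem used earlier (an illustrative assumption, not from the note itself), all the KKT conditions can be checked at the candidate point $(x^*, \alpha^*) = (1, 2)$:

```python
# Check the KKT conditions for the toy problem min x^2 s.t. 1 - x <= 0
# (illustrative, not from the note) at the candidate x* = 1, alpha* = 2.
# There are no equality constraints, so h_i(x*) = 0 is vacuous here.

x, alpha = 1.0, 2.0

grad_L = 2 * x - alpha            # d/dx [x^2 + alpha * (1 - x)]
g = 1 - x                         # inequality constraint value

assert abs(grad_L) < 1e-12        # 1. stationarity: grad_x L = 0
assert g <= 0                     # 2. primal feasibility
assert alpha >= 0                 # 4. dual feasibility
assert abs(alpha * g) < 1e-12     # 5. complementary slackness
print("all KKT conditions hold at (x*, alpha*) = (1, 2)")
```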

### 5. KKT condition for convex problem

Assume that the original problem is convex ($f$ and $g_i$ are convex and $h_i$ are affine). Then the KKT conditions become necessary *and* sufficient: any point satisfying them is an optimal solution of both the primal and the dual.
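As a worked illustration (a hypothetical toy problem, not from the note above), consider minimizing $x^2$ subject to $1 - x \le 0$. Since $x^2$ is convex and $1 - x$ is affine, solving the KKT system directly yields the global optimum:

\[\begin{align} \frac{\partial}{\partial x}\mathcal{L}(x, \alpha) &= 2x - \alpha = 0 \ \Longrightarrow\ x = \frac{\alpha}{2} \\ \alpha(1 - x) &= 0 \ \Longrightarrow\ \alpha = 0 \text{ or } x = 1 \\ 1 - x &\le 0, \ \ \alpha \ge 0 \end{align}\]The case $\alpha = 0$ gives $x = 0$, which violates $1 - x \le 0$; the case $x = 1$ gives $\alpha = 2 \ge 0$, which is feasible. By sufficiency, $x^* = 1$ with $p^* = 1$ is the global optimum.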

### 6. Conclusion

- The dual function lower-bounds the primal optimal value; the dual problem seeks the best such bound.
- Weak duality always holds.
- Strong duality implies the KKT conditions (they are necessary).
- For convex problems, the KKT conditions are also sufficient for optimality.