Convex Duality and the KKT Conditions

A minimisation problem is only half solved by a candidate point, since the candidate is worthless without a proof that nothing does better. Convex duality builds that proof, pairing the original problem with a dual whose every value is a lower bound on the primal optimum, and under a mild condition the best lower bound matches the optimum exactly, certifying it. The mechanism is the supporting hyperplane of the previous post, and the certificate it produces is the system of Karush-Kuhn-Tucker conditions. This post proves duality and derives those conditions [1], [2]. The problem is the convex program

\min_x f(x)\quad\text{subject to } g_i(x)\le 0,\ i=1,\dots,m, \tag{1}

with $f$ and the $g_i$ convex and differentiable on $\R^n$, and optimal value $p^\ast$.

#The Lagrangian and weak duality

Definition 1

The Lagrangian of Equation (1) is $L(x,\lambda)=f(x)+\sum_{i=1}^m\lambda_i g_i(x)$ for multipliers $\lambda\ge 0$, and the dual function is $d(\lambda)=\inf_x L(x,\lambda)$.

The dual function is concave whether or not $f$ and the $g_i$ are convex: for each fixed $x$ the map $\lambda\mapsto L(x,\lambda)$ is affine, and a pointwise infimum of affine functions is concave.
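As a small worked example (the toy problem is an illustrative assumption, not part of the general development), take $n=m=1$ with $f(x)=x^2$ and $g(x)=1-x$, so the feasible set is $x\ge 1$. The Lagrangian is a convex quadratic in $x$, minimised where its derivative vanishes, which gives the dual function in closed form:

```latex
\begin{aligned}
L(x,\lambda) &= x^2+\lambda(1-x),\\
\partial_x L(x,\lambda) &= 2x-\lambda = 0 \iff x=\lambda/2,\\
d(\lambda) &= \frac{\lambda^2}{4}+\lambda\Big(1-\frac{\lambda}{2}\Big)
            = \lambda-\frac{\lambda^2}{4}.
\end{aligned}
```

In this example $d$ is a concave quadratic in $\lambda$, as the general argument predicts.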

Theorem 2

For every $\lambda\ge 0$, $d(\lambda)\le p^\ast$.

Proof

Let $x_0$ be any feasible point, so $g_i(x_0)\le 0$ for all $i$. Since $\lambda\ge 0$, the penalty satisfies $\sum_i\lambda_i g_i(x_0)\le 0$, so $L(x_0,\lambda)=f(x_0)+\sum_i\lambda_i g_i(x_0)\le f(x_0)$. The infimum over $x$ is at most the value at the particular $x_0$, so $d(\lambda)=\inf_{x}L(x,\lambda)\le L(x_0,\lambda)\le f(x_0)$. Taking the infimum of the right side over feasible $x_0$ gives $d(\lambda)\le p^\ast$.

The best dual bound is the dual optimum $d^\ast=\sup_{\lambda\ge 0}d(\lambda)\le p^\ast$, and the difference $p^\ast-d^\ast$ is the duality gap. For convex programs satisfying a regularity condition the gap is zero.
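Weak duality can be checked numerically on a toy instance. The sketch below assumes the one-dimensional problem of minimising $x^2$ subject to $1-x\le 0$ (an illustrative choice, with $p^\ast=1$ attained at $x=1$), and all names in it are hypothetical. It evaluates the dual function on a grid of multipliers and confirms that no dual value exceeds $p^\ast$.

```python
# Toy problem (an illustrative assumption): minimise x^2 subject to 1 - x <= 0.
# The feasible set is x >= 1, so the primal optimum is p* = 1, attained at x = 1.
P_STAR = 1.0


def lagrangian(x: float, lam: float) -> float:
    """L(x, lam) = f(x) + lam * g(x) with f(x) = x^2 and g(x) = 1 - x."""
    return x * x + lam * (1.0 - x)


def dual(lam: float) -> float:
    """d(lam) = inf_x L(x, lam); the infimum is at x = lam / 2, where dL/dx = 0."""
    return lagrangian(lam / 2.0, lam)


# Evaluate the dual on a grid of multipliers lam in [0, 5].
grid = [k / 100.0 for k in range(501)]
values = [dual(lam) for lam in grid]

# Weak duality: every dual value is a lower bound on p*.
weak_duality_holds = all(v <= P_STAR + 1e-12 for v in values)

# Best bound found on the grid, the multiplier achieving it, and the gap.
d_star = max(values)
lam_star = grid[values.index(d_star)]
duality_gap = P_STAR - d_star

print(weak_duality_holds, lam_star, d_star, duality_gap)
```

On this instance the best grid value is reached at $\lambda=2$ and the gap closes, which is what the regularity condition mentioned above guarantees for convex programs.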

#Strong duality

Theorem 3

If the convex program Equation (1) has a strictly feasible point $\bar x$, one with $g_i(\bar x)<0$ for all $i$, and $p^\ast$ is finite, then $d^\ast=p^\ast$ and the dual optimum is attained.

Proof

Consider the set of constraint-and-objective values that can be dominated,

A=\big\{(u,t)\in\R^m\times\R:\ \exists\,x\ \text{with } g_i(x)\le u_i\ \text{for all }i\ \text{and } f(x) \le t\big\}. \tag{2}

The set $A$ is convex: if $(u,t)$ is witnessed by $x$ and $(u',t')$ by $x'$, then convexity of $f$ and the $g_i$ makes the convex combination $\theta(u,t)+(1-\theta)(u',t')$, $\theta\in[0,1]$, witnessed by $\theta x+(1-\theta)x'$. It is also upward closed, because a witness $x$ for $(u,t)$ satisfies $g_i(x)\le u_i\le u'_i$ and $f(x)\le t\le t'$ and so witnesses every $(u',t')\ge(u,t)$. It has nonempty interior, since for any $x_0$ the open set $\{(u,t):u_i>g_i(x_0)\text{ for all }i\text{ and }t>f(x_0)\}$ is witnessed by $x_0$ and so contained in $A$. The point $(0,p^\ast)$ lies on the boundary of $A$: it is in the closure, because $(0,t)\in A$ for every $t>p^\ast$, and it is not in the interior, because no point $(0,t)$ with $t<p^\ast$ is in $A$, that being the definition of $p^\ast$. Since $A$ is convex with nonempty interior and $(0,p^\ast)\notin\operatorname{int}A$, the nonempty-interior version of the supporting hyperplane theorem gives a nonzero $(\lambda,\mu)$ supporting $\operatorname{cl}A$ at that point, and since $A\subseteq\operatorname{cl}A$ the support inequality holds on $A$ itself,

\ip\lambda u+\mu t\ge\mu\,p^\ast\qquad\text{for all }(u,t)\in A. \tag{3}

Letting $u_i\to\infty$ or $t\to\infty$ within $A$ forces $\lambda\ge 0$ and $\mu\ge 0$, since otherwise the left side is unbounded below. The multiplier $\mu$ is strictly positive. If $\mu=0$, then $\ip\lambda u\ge 0$ on $A$, and the strictly feasible point $\bar x$ gives the point $(g(\bar x),f(\bar x))$ of $A$ with $u_i=g_i(\bar x)<0$. Every term of $\ip\lambda{g(\bar x)}$ is then nonpositive, so $\ip\lambda{g(\bar x)}\ge 0$ forces $\lambda=0$, contradicting $(\lambda,\mu)\neq 0$. Divide Equation (3) by $\mu>0$ and write $\lambda$ for $\lambda/\mu$. Every $x$ contributes the point $(g(x),f(x))\in A$, so $f(x)+\ip\lambda{g(x)}\ge p^\ast$. Taking the infimum over $x$ gives $d(\lambda)\ge p^\ast$, and with weak duality $d(\lambda)\le d^\ast\le p^\ast$ this is $d(\lambda)=d^\ast=p^\ast$, so the dual optimum is attained at $\lambda$.
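To see what the strictly feasible point buys, consider a standard one-dimensional example in which it is absent (an illustrative choice, not part of the proof): $f(x)=x$ and $g(x)=x^2$. The only feasible point is $x=0$, so $p^\ast=0$, and no $x$ has $g(x)<0$. For $\lambda>0$ the Lagrangian is minimised at $x=-1/(2\lambda)$, which gives

```latex
\begin{aligned}
L(x,\lambda) &= x+\lambda x^2,\\
d(\lambda) &= \inf_x\,(x+\lambda x^2)=
\begin{cases}
-\dfrac{1}{4\lambda}, & \lambda>0,\\[4pt]
-\infty, & \lambda=0,
\end{cases}\\
d^\ast &= \sup_{\lambda\ge 0} d(\lambda) = 0 = p^\ast .
\end{aligned}
```

The gap is zero here, but no finite $\lambda$ achieves $d(\lambda)=0$, so the dual optimum is not attained. In the language of the proof, the set $A$ for this problem is $\{(u,t):u\ge 0,\ t\ge-\sqrt u\}$, and the only hyperplanes supporting it at $(0,0)$ have $\mu=0$, which is exactly the case a strictly feasible point rules out.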

Strong duality makes the dual optimum an exact certificate, a multiplier whose dual value equals $p^\ast$. The certificate is encoded in a system of equations and inequalities at the optimal point.

#The Karush-Kuhn-Tucker conditions

Theorem 4

Suppose Equation (1) is convex with a strictly feasible point. A feasible $x^\ast$ is optimal if and only if there is $\lambda^\ast\ge 0$ with

\nabla f(x^\ast)+\sum_{i=1}^m\lambda_i^\ast\nabla g_i(x^\ast)=0,\qquad\lambda_i^\ast g_i(x^\ast)=0\ \text{ for all }i, \tag{4}

the stationarity and complementary slackness conditions.

Proof

Suppose $x^\ast$ is optimal, so $p^\ast=f(x^\ast)$ is finite. By strong duality there is a dual optimal $\lambda^\ast\ge 0$ with $d(\lambda^\ast)=p^\ast=f(x^\ast)$. Then

f(x^\ast)=d(\lambda^\ast)=\inf_x L(x,\lambda^\ast)\le L(x^\ast,\lambda^\ast)=f(x^\ast)+\sum_i\lambda_i^\ast g_i(x^\ast)\le f(x^\ast), \tag{5}

the last inequality because $\lambda_i^\ast\ge 0$ and $g_i(x^\ast)\le 0$. The chain begins and ends with $f(x^\ast)$, so it is an equality throughout. Equality on the right forces $\sum_i\lambda_i^\ast g_i(x^\ast)=0$, and since this is a sum of nonpositive terms each $\lambda_i^\ast g_i(x^\ast)=0$, which is complementary slackness. Equality on the left says $x^\ast$ minimises the convex differentiable function $x\mapsto L(x,\lambda^\ast)$ over $\R^n$, so its gradient vanishes there, which is stationarity.

Conversely, suppose $x^\ast$ is feasible and satisfies Equation (4) for some $\lambda^\ast\ge 0$. Stationarity makes $x^\ast$ a minimiser of the convex function $L(\cdot,\lambda^\ast)$, since a differentiable convex function attains its global minimum wherever its gradient is zero, so for any feasible $y$,

f(x^\ast)=f(x^\ast)+\sum_i\lambda_i^\ast g_i(x^\ast)=L(x^\ast,\lambda^\ast)\le L(y,\lambda^\ast)=f(y)+\sum_i \lambda_i^\ast g_i(y)\le f(y), \tag{6}

using complementary slackness on the left and $\lambda_i^\ast\ge 0$, $g_i(y)\le 0$ on the right. So $x^\ast$ is optimal.
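The conditions of Equation (4), together with feasibility and $\lambda^\ast\ge 0$, are easy to check mechanically. The following is a minimal sketch on a hypothetical two-variable problem chosen for illustration: minimise $(x_1-2)^2+(x_2-1)^2$ subject to $x_1+x_2-2\le 0$ and $-x_1\le 0$. The function names and the candidate pair $x^\ast=(1.5,0.5)$, $\lambda^\ast=(1,0)$ are assumptions of the example.

```python
import math

# Illustrative problem (an assumption for this sketch):
#   minimise f(x) = (x1 - 2)^2 + (x2 - 1)^2
#   subject to g1(x) = x1 + x2 - 2 <= 0 and g2(x) = -x1 <= 0.
GRAD_G = [[1.0, 1.0], [-1.0, 0.0]]  # constant gradients of g1 and g2


def grad_f(x):
    """Gradient of the objective at x."""
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]


def constraints(x):
    """Values g1(x), g2(x); feasibility means both are <= 0."""
    return [x[0] + x[1] - 2.0, -x[0]]


def kkt_residuals(x, lam):
    """Residual of each KKT condition at (x, lam); all are 0 at a KKT pair."""
    g = constraints(x)
    gf = grad_f(x)
    # Gradient of the Lagrangian in x: grad f + sum_i lam_i * grad g_i.
    grad_L = [
        gf[j] + sum(lam[i] * GRAD_G[i][j] for i in range(len(lam)))
        for j in range(len(x))
    ]
    return {
        "stationarity": math.hypot(*grad_L),
        "complementary_slackness": max(abs(l * gi) for l, gi in zip(lam, g)),
        "primal_infeasibility": max(0.0, max(g)),
        "negative_multiplier": max(0.0, max(-l for l in lam)),
    }


candidate = kkt_residuals([1.5, 0.5], [1.0, 0.0])  # claimed KKT pair
other = kkt_residuals([1.0, 1.0], [1.0, 0.0])      # feasible point, same multiplier
print(candidate)
print(other)
```

All four residuals vanish at the candidate pair, so by the theorem $(1.5,0.5)$ is optimal for this toy problem. The second point is feasible and satisfies complementary slackness with the same multiplier but fails stationarity, so this multiplier does not certify it.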

The Karush-Kuhn-Tucker conditions are the computational form of optimality, a square system pairing each constraint with a multiplier and demanding stationarity of the Lagrangian, complementary slackness, and feasibility. For a convex program with a strictly feasible point they are exactly equivalent to optimality, so solving them solves the problem. The mean-variance portfolio minimises a quadratic risk subject to a linear return target, and its Karush-Kuhn-Tucker system is linear and solvable in closed form, the multiplier becoming the price of risk. The Almgren-Chriss execution problem minimises a quadratic cost of trading against a schedule, and its system is solvable the same way, the multiplier becoming the urgency of execution.
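A minimal sketch of the mean-variance case, under simplifying assumptions that are not spelled out in the text: two assets, the objective $\tfrac12 w^\top\Sigma w$, a single constraint $r-\mu^\top w\le 0$ with target $r>0$, and no budget constraint on the weights. The numbers and helper names are hypothetical. Because $w=0$ misses a positive target, the constraint is active at the optimum, and the Karush-Kuhn-Tucker system reduces to the linear equations $\Sigma w=\lambda\mu$ and $\mu^\top w=r$.

```python
# Hypothetical two-asset data (illustrative numbers only).
SIGMA = [[0.04, 0.01], [0.01, 0.09]]  # covariance matrix, positive definite
MU = [0.08, 0.12]                     # expected returns
TARGET = 0.10                         # required portfolio return r > 0


def solve_2x2(a, b):
    """Solve a @ z = b for a 2x2 matrix a by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Problem: minimise (1/2) w' SIGMA w subject to g(w) = TARGET - MU' w <= 0.
# With the constraint active, the KKT system is
#   SIGMA w - lam * MU = 0   (stationarity),   MU' w = TARGET.
z = solve_2x2(SIGMA, MU)    # z = SIGMA^{-1} MU
lam = TARGET / dot(MU, z)   # multiplier, from MU' (lam * z) = TARGET
w = [lam * zi for zi in z]  # weights w = lam * SIGMA^{-1} MU

# Residual of the stationarity condition; should vanish up to rounding.
stationarity = [dot(SIGMA[j], w) - lam * MU[j] for j in range(2)]
print(w, lam, stationarity)
```

The multiplier comes out positive and the return constraint holds with equality, so complementary slackness is satisfied automatically. Adding a budget constraint would add one more multiplier and one more linear equation to the same system.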

[1]

S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[2]

R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.

cite

@misc{convex-optimization-and-kkt,
  author = {Zac Kienzle},
  title  = {Convex Duality and the KKT Conditions},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/convex-optimization-and-kkt}
}
