Skip to content
homeaboutworkprojectsthesiswritingresume
Loading
~/blog/convex-optimization-and-kkt0%dark
  1. home/
  2. writing/
  3. Convex Duality and the KKT Conditions

08 June 2026 · 6 min read · updated 13 June 2026

Convex Duality and the KKT Conditions

A constrained optimisation problem is solved when a candidate point is paired with a certificate that no better point exists, and convex duality manufactures the certificate. We define the Lagrangian and its dual function, prove weak duality, so the dual bounds the primal, prove strong duality under Slater's condition through the supporting hyperplane theorem, and derive the Karush-Kuhn-Tucker conditions that are necessary and sufficient for optimality in a convex program. These conditions solve the quadratic programs of portfolio choice and optimal execution in closed form.

  • 6 equations
  • 7 results
  • 3 connections
  • optimization
  • convex-analysis
  • duality
On this page▾
  • The Lagrangian and weak duality
  • Strong duality
  • The Karush-Kuhn-Tucker conditions

6 min left

  • The Lagrangian and weak duality1m
  • Strong duality2m
  • The Karush-Kuhn-Tucker conditions2m

A minimisation problem is only half solved by a candidate point, since the candidate is worthless without a proof that nothing does better. Convex duality builds that proof, pairing the original problem with a dual whose every value is a lower bound on the primal optimum, and under a mild condition the best lower bound matches the optimum exactly, certifying it. The mechanism is the supporting hyperplane of the previous post, and the certificate it produces is the system of Karush-Kuhn-Tucker conditions. This post proves duality and derives those conditions [1], [2]. The problem is the convex program

min⁡xf(x)subject to gi(x)≤0, i=1,…,m,(1)\min_x f(x)\quad\text{subject to } g_i(x)\le 0,\ i=1,\dots,m, \tag{1}xmin​f(x)subject to gi​(x)≤0, i=1,…,m,(1)

with fff and the gig_igi​ convex and differentiable on Rn\R^nRn, optimal value p∗p^\astp∗.

#The Lagrangian and weak duality

Definition1

The Lagrangian of Equation (1) is L(x,λ)=f(x)+∑i=1mλigi(x)L(x,\lambda)=f(x)+\sum_{i=1}^m\lambda_i g_i(x)L(x,λ)=f(x)+∑i=1m​λi​gi​(x) for multipliers λ≥0\lambda\ge 0λ≥0, and the dual function is d(λ)=inf⁡xL(x,λ)d(\lambda)=\inf_x L(x,\lambda)d(λ)=infx​L(x,λ).

The dual function is concave, being a pointwise infimum of affine functions of λ\lambdaλ, hence concave regardless of any convexity of fff.

Theorem2

For every λ≥0\lambda\ge 0λ≥0, d(λ)≤p∗d(\lambda)\le p^\astd(λ)≤p∗.

Proof

Let x0x_0x0​ be any feasible point, so gi(x0)≤0g_i(x_0)\le 0gi​(x0​)≤0. Since λ≥0\lambda\ge 0λ≥0, the penalty ∑iλigi(x0)≤0\sum_i\lambda_i g_i(x_0) \le 0∑i​λi​gi​(x0​)≤0, so L(x0,λ)=f(x0)+∑iλigi(x0)≤f(x0)L(x_0,\lambda)=f(x_0)+\sum_i\lambda_i g_i(x_0)\le f(x_0)L(x0​,λ)=f(x0​)+∑i​λi​gi​(x0​)≤f(x0​). The infimum over xxx is at most the value at the particular x0x_0x0​, so d(λ)=inf⁡xL(x,λ)≤L(x0,λ)≤f(x0)d(\lambda)=\inf_{x}L(x,\lambda)\le L(x_0,\lambda)\le f(x_0)d(λ)=infx​L(x,λ)≤L(x0​,λ)≤f(x0​). Minimising the right side over feasible x0x_0x0​ gives d(λ)≤p∗d(\lambda)\le p^\astd(λ)≤p∗.

The best dual bound is the dual optimum d∗=sup⁡λ≥0d(λ)≤p∗d^\ast=\sup_{\lambda\ge 0}d(\lambda)\le p^\astd∗=supλ≥0​d(λ)≤p∗, and the gap p∗−d∗p^\ast-d^\astp∗−d∗ is the duality gap. For convex programs satisfying a regularity condition the gap is zero.

#Strong duality

Theorem3

If the convex program Equation (1) has a strictly feasible point, one with gi(x)<0g_i(x)<0gi​(x)<0 for all iii, and p∗p^\astp∗ is finite, then d∗=p∗d^\ast=p^\astd∗=p∗ and the dual optimum is attained.

Proof

Consider the set of constraint-and-objective values that can be dominated,

A={(u,t)∈Rm×R: ∃ x with gi(x)≤ui for all i and f(x)≤t}.(2)A=\big\{(u,t)\in\R^m\times\R:\ \exists\,x\ \text{with } g_i(x)\le u_i\ \text{for all }i\ \text{and } f(x) \le t\big\}. \tag{2}A={(u,t)∈Rm×R: ∃x with gi​(x)≤ui​ for all i and f(x)≤t}.(2)

The set AAA is convex, because if (u,t)(u,t)(u,t) is witnessed by xxx and (u′,t′)(u',t')(u′,t′) by x′x'x′, then convexity of fff and the gig_igi​ makes any convex combination θ(u,t)+(1−θ)(u′,t′)\theta(u,t)+(1-\theta)(u',t')θ(u,t)+(1−θ)(u′,t′) witnessed by the same combination θx+(1−θ)x′\theta x+(1-\theta)x'θx+(1−θ)x′, and AAA is closed under increasing uuu and ttt, because a witness xxx for (u,t)(u,t)(u,t) with gi(x)≤ui≤ui′g_i(x)\le u_i\le u'_igi​(x)≤ui​≤ui′​ and f(x)≤t≤t′f(x)\le t\le t'f(x)≤t≤t′ also witnesses any (u′,t′)≥(u,t)(u',t')\ge(u,t)(u′,t′)≥(u,t). It has nonempty interior, since for any x0x_0x0​ the open set {(u,t):ui>gi(x0) for all i and t>f(x0)}\{(u,t):u_i>g_i(x_0)\text{ for all }i\text{ and }t>f(x_0)\}{(u,t):ui​>gi​(x0​) for all i and t>f(x0​)} is witnessed by x0x_0x0​ and so contained in AAA. The point (0,p∗)(0,p^\ast)(0,p∗) lies on the boundary of AAA, since it is in the closure but no point (0,t)(0,t)(0,t) with t<p∗t<p^\astt<p∗ is in AAA, that being the definition of p∗p^\astp∗. Since AAA is convex with nonempty interior and (0,p∗)∉int⁡A(0,p^\ast)\notin\operatorname{int}A(0,p∗)∈/intA, the nonempty-interior version of the supporting hyperplane theorem gives a nonzero (λ,μ)(\lambda,\mu)(λ,μ) supporting cl⁡A\operatorname{cl}AclA at that point, and since A⊆cl⁡AA\subseteq\operatorname{cl}AA⊆clA the support inequality holds on AAA itself,

⟨λ,u⟩+μt≥μ p∗for all (u,t)∈A.(3)\ip\lambda u+\mu t\ge\mu\,p^\ast\qquad\text{for all }(u,t)\in A. \tag{3}⟨λ,u⟩+μt≥μp∗for all (u,t)∈A.(3)

Letting ui→∞u_i\to\inftyui​→∞ or t→∞t\to\inftyt→∞ forces λ≥0\lambda\ge 0λ≥0 and μ≥0\mu\ge 0μ≥0, since otherwise the left side is unbounded below. The multiplier μ\muμ is strictly positive, for if μ=0\mu=0μ=0 then ⟨λ,u⟩≥0\ip\lambda u\ge 0⟨λ,u⟩≥0 on AAA, and the strictly feasible xˉ\bar xxˉ gives a point of AAA with ui=gi(xˉ)<0u_i=g_i(\bar x)<0ui​=gi​(xˉ)<0, so ⟨λ,g(xˉ)⟩≥0\ip\lambda{g( \bar x)}\ge 0⟨λ,g(xˉ)⟩≥0 with λ≥0\lambda\ge 0λ≥0 forces λ=0\lambda=0λ=0, contradicting (λ,μ)≠0(\lambda,\mu)\neq 0(λ,μ)=0. Dividing Equation (3) by μ>0\mu>0μ>0 and writing λ\lambdaλ for λ/μ\lambda/\muλ/μ, every xxx contributes the point (g(x),f(x))∈A(g(x),f(x))\in A(g(x),f(x))∈A, so f(x)+⟨λ,g(x)⟩≥p∗f(x)+\ip\lambda{g(x)}\ge p^\astf(x)+⟨λ,g(x)⟩≥p∗. Taking the infimum over xxx gives d(λ)≥p∗d(\lambda)\ge p^\astd(λ)≥p∗, and with weak duality d(λ)≤p∗d(\lambda)\le p^\astd(λ)≤p∗ this is d(λ)=p∗=d∗d(\lambda)=p^\ast=d^\astd(λ)=p∗=d∗, attained.

Strong duality makes the dual optimum an exact certificate, a multiplier whose dual value equals p∗p^\astp∗. The certificate is encoded in a system of equations and inequalities at the optimal point.

#The Karush-Kuhn-Tucker conditions

Theorem4

Suppose Equation (1) is convex with a strictly feasible point. A feasible x∗x^\astx∗ is optimal if and only if there is λ∗≥0\lambda^\ast\ge 0λ∗≥0 with

∇f(x∗)+∑i=1mλi∗∇gi(x∗)=0,λi∗gi(x∗)=0  for all i,(4)\nabla f(x^\ast)+\sum_{i=1}^m\lambda_i^\ast\nabla g_i(x^\ast)=0,\qquad\lambda_i^\ast g_i(x^\ast)=0\ \text{ for all }i, \tag{4}∇f(x∗)+i=1∑m​λi∗​∇gi​(x∗)=0,λi∗​gi​(x∗)=0  for all i,(4)

the stationarity and complementary slackness conditions.

Proof

Suppose x∗x^\astx∗ is optimal. By strong duality there is a dual optimal λ∗≥0\lambda^\ast\ge 0λ∗≥0 with d(λ∗)=p∗=f(x∗)d(\lambda^ \ast)=p^\ast=f(x^\ast)d(λ∗)=p∗=f(x∗). Then

f(x∗)=d(λ∗)=inf⁡xL(x,λ∗)≤L(x∗,λ∗)=f(x∗)+∑iλi∗gi(x∗)≤f(x∗),(5)f(x^\ast)=d(\lambda^\ast)=\inf_x L(x,\lambda^\ast)\le L(x^\ast,\lambda^\ast)=f(x^\ast)+\sum_i\lambda_i^\ast g_i(x^\ast)\le f(x^\ast), \tag{5}f(x∗)=d(λ∗)=xinf​L(x,λ∗)≤L(x∗,λ∗)=f(x∗)+i∑​λi∗​gi​(x∗)≤f(x∗),(5)

the last inequality because λi∗≥0\lambda_i^\ast\ge 0λi∗​≥0 and gi(x∗)≤0g_i(x^\ast)\le 0gi​(x∗)≤0. The chain is therefore an equality throughout. Equality on the right forces ∑iλi∗gi(x∗)=0\sum_i\lambda_i^\ast g_i(x^\ast)=0∑i​λi∗​gi​(x∗)=0, a sum of nonpositive terms, so each λi∗gi(x∗)=0\lambda_i^\ast g_i(x^\ast)=0λi∗​gi​(x∗)=0, complementary slackness. Equality on the left says x∗x^\astx∗ minimises the convex function x↦L(x,λ∗)x\mapsto L(x,\lambda^\ast)x↦L(x,λ∗), whose gradient there must vanish, stationarity. Conversely, suppose x∗x^\astx∗ is feasible and satisfies Equation (4). Stationarity makes x∗x^\astx∗ a minimiser of the convex L(⋅,λ∗)L(\cdot,\lambda^\ast)L(⋅,λ∗), since a convex function with zero gradient is at its global minimum, so for any feasible yyy,

f(x∗)=f(x∗)+∑iλi∗gi(x∗)=L(x∗,λ∗)≤L(y,λ∗)=f(y)+∑iλi∗gi(y)≤f(y),(6)f(x^\ast)=f(x^\ast)+\sum_i\lambda_i^\ast g_i(x^\ast)=L(x^\ast,\lambda^\ast)\le L(y,\lambda^\ast)=f(y)+\sum_i \lambda_i^\ast g_i(y)\le f(y), \tag{6}f(x∗)=f(x∗)+i∑​λi∗​gi​(x∗)=L(x∗,λ∗)≤L(y,λ∗)=f(y)+i∑​λi∗​gi​(y)≤f(y),(6)

using complementary slackness on the left and λi∗≥0\lambda_i^\ast\ge 0λi∗​≥0, gi(y)≤0g_i(y)\le 0gi​(y)≤0 on the right. So x∗x^\astx∗ is optimal.

The Karush-Kuhn-Tucker conditions are the computational form of optimality, a square system pairing each constraint with a multiplier and demanding stationarity of the Lagrangian, complementary slackness, and feasibility. For a convex program they are exactly equivalent to optimality, so solving them solves the problem. The mean-variance portfolio minimises a quadratic risk subject to a linear return target, and its Karush-Kuhn-Tucker system is linear and solvable in closed form, the multiplier becoming the price of risk. The Almgren-Chriss execution problem minimises a quadratic cost of trading against a schedule, and its system is solvable the same way, the multiplier becoming the urgency of execution.

[1]
S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[2]
R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.

Part 2 of 2 in Optimization

← previousConvex Sets and Functions

Explore connections

see in the atlas →

related

  • The Mean-Variance Portfolio
  • Price Formation in the Order Book
  • Convex Sets and Functions

referenced by (2)

  • Convex Sets and Functions
  • The Mean-Variance Portfolio
cite
@misc{convex-optimization-and-kkt,
  author = {Zac Kienzle},
  title  = {Convex Duality and the KKT Conditions},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/convex-optimization-and-kkt}
}