
20 May 2026 · 5 min read · updated 13 June 2026

Conditional Expectation

Conditioning on a sub-sigma-algebra produces the best estimate of a random variable given coarse information. We characterize conditional expectation by an averaging identity, prove existence and almost-sure uniqueness from the Radon-Nikodym theorem, identify it with the $L^2$ orthogonal projection, and establish linearity, the tower property, the pull-out rule, positivity, and the conditional Jensen inequality.


Information is modelled by a sub-$\sigma$-algebra, and conditioning identifies the best estimate of a random variable given only the events one can resolve. That estimate is forced by an averaging identity, exists by the Radon-Nikodym theorem, and, for square-integrable variables, coincides with the $L^2$ orthogonal projection.

#The defining property

Fix a probability space $(\Omega,\mathcal F,\mathbb P)$ and a sub-$\sigma$-algebra $\mathcal G\subseteq\mathcal F$. Let $X\in L^1(\mathbb P)$.

Definition 1

A conditional expectation of $X$ given $\mathcal G$ is a random variable $Y$ that is $\mathcal G$-measurable, lies in $L^1$, and satisfies

$$\int_G Y\,\mathrm d\mathbb P=\int_G X\,\mathrm d\mathbb P\qquad\text{for every } G\in\mathcal G. \tag{1}$$

Any such $Y$ is written $\mathbb E[X\mid\mathcal G]$.

The two demands pull in opposite directions. The requirement of $\mathcal G$-measurability coarsens $Y$ to the resolution of $\mathcal G$, while Equation (1) forces it to reproduce the averages of $X$ over every $\mathcal G$-event. We show below that exactly one random variable, up to a null set, meets both.
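A finite sample space makes both demands concrete. The sketch below assumes $\mathcal G$ is generated by a partition of $\Omega$ into atoms of positive probability, so a $\mathcal G$-measurable function is one that is constant on each atom. The helper names `cond_exp` and `integral`, and the die example, are illustrative choices rather than anything fixed by the text.

```python
from fractions import Fraction
from itertools import combinations

def cond_exp(X, P, partition):
    """E[X | G] on a finite space, where G is generated by `partition`."""
    Y = [None] * len(X)
    for block in partition:
        mass = sum(P[w] for w in block)              # P(block), assumed > 0
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            Y[w] = avg                               # constant on the atom: G-measurable
    return Y

def integral(f, P, event):
    """Integral of f over `event` with respect to P."""
    return sum(f[w] * P[w] for w in event)

# Hypothetical example: a fair die where G only resolves {1,2}, {3,4,5}, {6}.
P = [Fraction(1, 6)] * 6
X = [Fraction(v) for v in (1, 2, 3, 4, 5, 6)]
partition = [(0, 1), (2, 3, 4), (5,)]
Y = cond_exp(X, P, partition)

# Every G-event is a union of atoms; check Equation (1) on all of them.
events = []
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        events.append([w for b in blocks for w in b])
identity_holds = all(integral(Y, P, G) == integral(X, P, G) for G in events)

print(Y, identity_holds)
```

Exact rational arithmetic via `Fraction` is used so that the averaging identity can be tested with `==` instead of a floating-point tolerance.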

#Existence and uniqueness

Theorem 2

For every $X\in L^1(\mathbb P)$ a conditional expectation $\mathbb E[X\mid\mathcal G]$ exists and is unique up to $\mathbb P$-almost-everywhere equality.

Proof

Take first $X\ge 0$. Define a measure on $(\Omega,\mathcal G)$ by $\nu(G)=\int_G X\,\mathrm d\mathbb P$. It is finite, with $\nu(\Omega)=\mathbb E[X]<\infty$, and absolutely continuous with respect to the restriction $\mathbb P\!\restriction_{\mathcal G}$, since $G\in\mathcal G$ with $\mathbb P(G)=0$ forces $\int_G X\,\mathrm d\mathbb P=0$. The Radon-Nikodym theorem applied on $(\Omega,\mathcal G,\mathbb P\!\restriction_{\mathcal G})$ supplies a $\mathcal G$-measurable $Y\ge 0$ with $\nu(G)=\int_G Y\,\mathrm d\mathbb P$ for all $G\in\mathcal G$, which is exactly Equation (1); taking $G=\Omega$ shows $Y\in L^1$. For general $X$ set $\mathbb E[X\mid\mathcal G]=\mathbb E[X^+\mid\mathcal G]-\mathbb E[X^-\mid\mathcal G]$, where both terms are integrable since $X^\pm\le|X|\in L^1$, and Equation (1) follows by linearity of the integral. For uniqueness, if $Y_1,Y_2$ both satisfy Equation (1), then the set $G=\{Y_1>Y_2\}$ belongs to $\mathcal G$ and $\int_G(Y_1-Y_2)\,\mathrm d\mathbb P=0$ with an integrand that is strictly positive on $G$, so $\mathbb P(G)=0$. By symmetry $\mathbb P(Y_2>Y_1)=0$ as well, hence $Y_1=Y_2$ almost surely.
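The construction in the proof can be traced by hand on a finite space. The following sketch assumes $\mathcal G$ is generated by a partition into atoms of positive probability, in which case the Radon-Nikodym density of $\nu$ with respect to the restricted measure is simply $\nu(A)/\mathbb P(A)$ on each atom $A$. The function name `density_on_atoms` and the numbers are hypothetical.

```python
from fractions import Fraction

def density_on_atoms(X, P, partition):
    """Density of nu(G) = int_G X dP with respect to P restricted to G,
    for nonnegative X, returned as a function on the sample points."""
    Y = [None] * len(X)
    for atom in partition:
        nu = sum(X[w] * P[w] for w in atom)   # nu(atom)
        p = sum(P[w] for w in atom)           # P(atom), assumed > 0
        for w in atom:
            Y[w] = nu / p
    return Y

# Hypothetical non-uniform space with a signed X.
P = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
X = [Fraction(-3), Fraction(2), Fraction(5), Fraction(-1)]
partition = [(0, 1), (2, 3)]

# Split X into positive and negative parts, as in the proof.
X_plus = [max(x, Fraction(0)) for x in X]
X_minus = [max(-x, Fraction(0)) for x in X]
Y = [a - b for a, b in zip(density_on_atoms(X_plus, P, partition),
                           density_on_atoms(X_minus, P, partition))]

# Equation (1) on each atom, which suffices because atoms generate G here.
satisfies_eq1 = all(
    sum(Y[w] * P[w] for w in atom) == sum(X[w] * P[w] for w in atom)
    for atom in partition
)
print(Y, satisfies_eq1)
```

Uniqueness is visible in this setting too: a function constant on atoms is determined by its integral over each atom, so no other $\mathcal G$-measurable candidate can satisfy Equation (1).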

#The projection viewpoint

Proposition 3

For $X\in L^2(\mathbb P)$ the conditional expectation $\mathbb E[X\mid\mathcal G]$ is the orthogonal projection of $X$ onto the closed subspace $L^2(\Omega,\mathcal G,\mathbb P)$; equivalently it is the $\mathcal G$-measurable minimizer of $\mathbb E[(X-Y)^2]$.

Proof

The space $L^2(\Omega,\mathcal G,\mathbb P)$ is complete, being $L^2$ of the measure space $(\Omega,\mathcal G,\mathbb P\!\restriction_{\mathcal G})$, hence a closed subspace of $L^2(\mathbb P)$ (when $\mathcal G$ is not $\mathbb P$-complete one works with the completion $\bar{\mathcal G}$, and $L^2(\bar{\mathcal G})=L^2(\mathcal G)$ inside $L^2(\mathbb P)$ since each $\bar{\mathcal G}$-measurable function agrees a.s. with a $\mathcal G$-measurable one). So the projection $Y$ exists and is characterized by $X-Y\perp L^2(\mathcal G)$, that is $\mathbb E[(X-Y)Z]=0$ for every $Z\in L^2(\mathcal G)$. In particular for $Z=\mathbf 1_G$ with $G\in\mathcal G$, bounded and hence in $L^2$ on a finite measure space, this recovers Equation (1). Since $Y$ is $\mathcal G$-measurable and $L^2\subseteq L^1$ on a probability space, $Y=\mathbb E[X\mid\mathcal G]$ by the uniqueness in Theorem 2. The minimization statement follows because an orthogonal projection minimizes the distance from $X$ to the subspace.
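Both characterizations in Proposition 3 can be checked numerically on a small example. The sketch below assumes a four-point uniform space with $\mathcal G$ generated by two atoms. It tests orthogonality of the residual against the atom indicators, then compares the mean-square error of the conditional expectation with a grid of other $\mathcal G$-measurable candidates. The grid and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def cond_exp(X, P, partition):
    """E[X | G] on a finite space, G generated by `partition`."""
    Y = [None] * len(X)
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            Y[w] = avg
    return Y

def mse(U, V, P):
    """E[(U - V)^2]."""
    return sum((u - v) ** 2 * p for u, v, p in zip(U, V, P))

P = [Fraction(1, 4)] * 4
X = [Fraction(v) for v in (0, 4, 1, 7)]
partition = [(0, 1), (2, 3)]
Y = cond_exp(X, P, partition)

# Orthogonality: E[(X - Y) 1_A] = 0 for each atom A.
residual_inner = [sum((X[w] - Y[w]) * P[w] for w in atom) for atom in partition]

# Competing G-measurable candidates: value a/2 on the first atom, b/2 on the second.
candidates = [[Fraction(a, 2)] * 2 + [Fraction(b, 2)] * 2
              for a in range(17) for b in range(17)]
best = mse(X, Y, P)
minimizers = [Z for Z in candidates if mse(X, Z, P) == best]
is_minimum = all(mse(X, Z, P) >= best for Z in candidates)

# Pythagoras: E[(X-Z)^2] = E[(X-Y)^2] + E[(Y-Z)^2] for G-measurable Z.
pythagoras = all(mse(X, Z, P) == best + mse(Y, Z, P) for Z in candidates)

print(Y, best, residual_inner, is_minimum, pythagoras)
```

The Pythagorean identity is what makes the minimizer unique: any other candidate $Z$ pays the extra cost $\mathbb E[(Y-Z)^2]$.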

#The calculus of conditioning

Proposition 4

Let $X,X'\in L^1(\mathbb P)$, let $a,b\in\mathbb R$, and let $\mathcal H\subseteq\mathcal G$ be a further sub-$\sigma$-algebra. Then almost surely

$$\begin{aligned}
&\text{(i)} &&\mathbb E[aX+bX'\mid\mathcal G]=a\,\mathbb E[X\mid\mathcal G]+b\,\mathbb E[X'\mid\mathcal G],\\[2pt]
&\text{(ii)} &&\mathbb E\big[\mathbb E[X\mid\mathcal G]\,\big|\,\mathcal H\big]=\mathbb E[X\mid\mathcal H],\\[2pt]
&\text{(iii)} &&\mathbb E[ZX\mid\mathcal G]=Z\,\mathbb E[X\mid\mathcal G]\quad\text{for bounded }\mathcal G\text{-measurable }Z,\\[2pt]
&\text{(iv)} &&X\ge 0\ \text{a.s.}\ \Rightarrow\ \mathbb E[X\mid\mathcal G]\ge 0\ \text{a.s.}
\end{aligned} \tag{2}$$

Proof

Statement (i) holds because the right side is $\mathcal G$-measurable and integrates correctly over each $G\in\mathcal G$ by linearity of the integral, so Theorem 2 identifies it. For (ii), the inner variable $W=\mathbb E[X\mid\mathcal G]$ lies in $L^1$, so $\mathbb E[W\mid\mathcal H]$ is defined. The right side $\mathbb E[X\mid\mathcal H]$ is $\mathcal H$-measurable, and every $H\in\mathcal H$ also belongs to $\mathcal G$, so $\int_H\mathbb E[X\mid\mathcal H]\,\mathrm d\mathbb P=\int_H X\,\mathrm d\mathbb P=\int_H W\,\mathrm d\mathbb P$ by Equation (1) applied at each level. By the uniqueness in Theorem 2, $\mathbb E[X\mid\mathcal H]=\mathbb E[W\mid\mathcal H]$. For (iii), the claim holds for $Z=\mathbf 1_{G_0}$ with $G_0\in\mathcal G$ because for $G\in\mathcal G$,

$$\int_G \mathbf 1_{G_0}\,\mathbb E[X\mid\mathcal G]\,\mathrm d\mathbb P=\int_{G\cap G_0}\mathbb E[X\mid\mathcal G]\,\mathrm d\mathbb P=\int_{G\cap G_0}X\,\mathrm d\mathbb P=\int_G\mathbf 1_{G_0}X\,\mathrm d\mathbb P, \tag{3}$$

and by linearity (i) it then holds for simple $\mathcal G$-measurable $Z$. It extends to general bounded $\mathcal G$-measurable $Z$ by dominated convergence. In detail, with $|Z|\le M$ pick simple $\mathcal G$-measurable $Z_n$ with $|Z_n|\le M$ and $Z_n\to Z$ pointwise; then $|Z_nX|\le M|X|\in L^1$ and $Z_nX\to ZX$ almost surely, so the ordinary dominated convergence theorem gives $\|Z_nX-ZX\|_1\to0$. Conditional expectation is an $L^1$-contraction: by (i) and (iv), the latter proved below without using (iii), $\pm X\le|X|$ gives $|\mathbb E[X\mid\mathcal G]|\le\mathbb E[|X|\mid\mathcal G]$, and integrating yields $\|\mathbb E[X\mid\mathcal G]\|_1\le\|X\|_1$. Therefore $\|\mathbb E[Z_nX\mid\mathcal G]-\mathbb E[ZX\mid\mathcal G]\|_1\le\|Z_nX-ZX\|_1\to0$. Hence $\mathbb E[Z_nX\mid\mathcal G]\to\mathbb E[ZX\mid\mathcal G]$ in $L^1$, so along a subsequence almost surely, while $Z_n\mathbb E[X\mid\mathcal G]\to Z\,\mathbb E[X\mid\mathcal G]$ almost surely; equating limits gives the claim. For (iv), if $X\ge0$ a.s. set $G=\{\mathbb E[X\mid\mathcal G]<0\}\in\mathcal G$; then $\int_G\mathbb E[X\mid\mathcal G]\,\mathrm d\mathbb P=\int_G X\,\mathrm d\mathbb P\ge0$ while the integrand is negative on $G$, forcing $\mathbb P(G)=0$.
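The four rules of Proposition 4 can be sanity-checked on a finite space with two nested partitions. The sketch assumes $\mathcal G$ and $\mathcal H$ are generated by partitions, with every $\mathcal H$-block a union of $\mathcal G$-atoms so that $\mathcal H\subseteq\mathcal G$. All values and variable names are hypothetical, and exact fractions are used so the identities hold with `==`.

```python
from fractions import Fraction

def cond_exp(X, P, partition):
    """E[X | sigma(partition)] on a finite sample space."""
    Y = [None] * len(X)
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            Y[w] = avg
    return Y

P = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
X = [Fraction(v) for v in (5, -1, 2, 8, 0, 3)]
X2 = [Fraction(v) for v in (1, 1, -2, 4, 6, 0)]
G = [(0,), (1,), (2, 3), (4, 5)]   # finer information
H = [(0, 1), (2, 3, 4, 5)]         # coarser: each block is a union of G-atoms

# (i) Linearity.
a, b = Fraction(2), Fraction(-3)
lin_lhs = cond_exp([a * x + b * y for x, y in zip(X, X2)], P, G)
lin_rhs = [a * u + b * v for u, v in zip(cond_exp(X, P, G), cond_exp(X2, P, G))]

# (ii) Tower property: conditioning on G first, then on the coarser H.
tower_lhs = cond_exp(cond_exp(X, P, G), P, H)
tower_rhs = cond_exp(X, P, H)

# (iii) Pull-out rule: Z is constant on G-atoms, hence G-measurable.
Z = [Fraction(v) for v in (7, -2, 3, 3, 5, 5)]
pull_lhs = cond_exp([z * x for z, x in zip(Z, X)], P, G)
pull_rhs = [z * y for z, y in zip(Z, cond_exp(X, P, G))]

# (iv) Positivity, tested on the nonnegative variable |X|.
positivity = all(y >= 0 for y in cond_exp([abs(x) for x in X], P, G))

print(lin_lhs == lin_rhs, tower_lhs == tower_rhs, pull_lhs == pull_rhs, positivity)
```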

Theorem 5

(Conditional Jensen.) If $\varphi:\mathbb R\to\mathbb R$ is convex and $X,\varphi(X)\in L^1(\mathbb P)$, then almost surely $\varphi\big(\mathbb E[X\mid\mathcal G]\big)\le\mathbb E[\varphi(X)\mid\mathcal G]$.

Proof

At each rational $q$ a convex $\varphi$ has a subgradient $s_q$, giving the affine minorant $\ell_q(x)=\varphi(q)+s_q(x-q)\le\varphi(x)$; continuity of $\varphi$, local boundedness of its subgradients, and density of $\mathbb Q$ then yield $\sup_q\ell_q(x)=\varphi(x)$ for every $x\in\mathbb R$, a supremum over a countable family. For each $q$, positivity (iv) applied to $\varphi(X)-\ell_q(X)\ge0$ and linearity (i) from Proposition 4, together with $\mathbb E[b\mid\mathcal G]=b$ for a constant $b$ (immediate from Theorem 2, since a constant is $\mathcal G$-measurable and reproduces its own averages), give $\mathbb E[\varphi(X)\mid\mathcal G]\ge\mathbb E[\ell_q(X)\mid\mathcal G]=s_q\,\mathbb E[X\mid\mathcal G]+(\varphi(q)-s_q q)=\ell_q\big(\mathbb E[X\mid\mathcal G]\big)$ almost surely. Each of these inequalities can fail only on a null set $N_q$, and the countable union $\bigcup_q N_q$ is still null, so taking the supremum over $q$ outside this union yields $\mathbb E[\varphi(X)\mid\mathcal G]\ge\sup_q\ell_q\big(\mathbb E[X\mid\mathcal G]\big)=\varphi\big(\mathbb E[X\mid\mathcal G]\big)$ [1].
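The inequality, and the affine-minorant step behind it, can be observed on a finite example. The sketch assumes a uniform eight-point space with $\mathcal G$ generated by three atoms and tests $\varphi(x)=x^2$ and $\varphi(x)=|x|$. For the square it also checks a few tangent lines $\ell_q(x)=q^2+2q(x-q)$, where the slope $2q$ is the derivative of $x^2$. Names and numbers are illustrative.

```python
from fractions import Fraction

def cond_exp(X, P, partition):
    """E[X | sigma(partition)] on a finite sample space."""
    Y = [None] * len(X)
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            Y[w] = avg
    return Y

def square(x):
    return x * x

P = [Fraction(1, 8)] * 8
X = [Fraction(v) for v in (-3, 1, 4, -2, 0, 5, 2, -1)]
partition = [(0, 1, 2), (3, 4), (5, 6, 7)]
m = cond_exp(X, P, partition)                      # E[X | G]

# Conditional Jensen for two convex functions.
jensen_holds = True
for phi in (square, abs):
    lhs = [phi(v) for v in m]                      # phi(E[X | G])
    rhs = cond_exp([phi(x) for x in X], P, partition)
    jensen_holds = jensen_holds and all(l <= r for l, r in zip(lhs, rhs))

# For phi = square the gap is the conditional variance.
gap = [r - square(v)
       for r, v in zip(cond_exp([square(x) for x in X], P, partition), m)]

# Affine minorants of the square at a few rationals q, as in the proof:
# E[l_q(X) | G] equals l_q(E[X | G]) and sits below E[X^2 | G].
second_moment = cond_exp([square(x) for x in X], P, partition)
minorants_ok = True
for q in (Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(2)):
    def line(x, q=q):
        return square(q) + 2 * q * (x - q)
    through = cond_exp([line(x) for x in X], P, partition)
    minorants_ok = (minorants_ok
                    and through == [line(v) for v in m]
                    and all(t <= s for t, s in zip(through, second_moment)))

print(jensen_holds, gap, minorants_ok)
```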

Conditional expectation is therefore both a Radon-Nikodym density, by Theorem 2, and, on $L^2$, an orthogonal projection, by Proposition 3, and the averaging identity Equation (1) is the common root of every rule above.

[1] D. Williams, Probability with Martingales. Cambridge University Press, 1991.

Part 6 of 9 in Probability

cite
@misc{conditional-expectation,
  author = {Zac Kienzle},
  title  = {Conditional Expectation},
  year   = {2026},
  month  = {05},
  url    = {https://zackienzle.com/blog/conditional-expectation}
}