Conditioning on a sub-sigma-algebra produces the best estimate of a random variable given coarse information. We characterize conditional expectation by an averaging identity, prove existence and almost-sure uniqueness from the Radon-Nikodym theorem, identify it with the L^2 orthogonal projection, and establish linearity, the tower property, the pull-out rule, monotonicity, and the conditional Jensen inequality.
Information is modelled by a sub-σ-algebra, and conditioning identifies the best
estimate of a random variable given only the events one can resolve. This estimate is pinned down by an
averaging identity, exists by the Radon-Nikodym theorem, and, for square-integrable variables, coincides with the $L^2$ projection.
Fix a probability space $(\Omega,\mathcal{F},P)$ and a sub-σ-algebra $\mathcal{G}\subseteq\mathcal{F}$. Let
$X\in L^1(P)$.
Definition 1
A conditional expectation of $X$ given $\mathcal{G}$ is a random variable $Y$ that is
$\mathcal{G}$-measurable, lies in $L^1(P)$, and satisfies
$$\int_G Y\,dP=\int_G X\,dP\quad\text{for every } G\in\mathcal{G}.\tag{1}$$
Any such $Y$ is written $E[X\mid\mathcal{G}]$.
The two demands pull in opposite directions. The requirement of $\mathcal{G}$-measurability coarsens
$Y$ to the resolution of $\mathcal{G}$, while Equation (1) forces it to reproduce the averages
of $X$ over every $\mathcal{G}$-event. We show below that exactly one random variable, up to a null
set, meets both.
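The two demands can be checked by hand on a finite sample space. The following is a minimal sketch, assuming the sub-σ-algebra is generated by a finite partition (so a measurable function is one that is constant on each atom); the helper names `cond_exp`, `integral`, and `g_events` are hypothetical and chosen only for this illustration. Exact rational arithmetic is used so the averaging identity can be tested with equality.

```python
from fractions import Fraction
from itertools import combinations

def cond_exp(X, P, atoms):
    """Return E[X | G] as a list indexed by outcome, where G is the
    sigma-algebra generated by the partition `atoms` of a finite sample space."""
    Y = [Fraction(0)] * len(X)
    for atom in atoms:
        mass = sum(P[w] for w in atom)
        # On a null atom any value is admissible; 0 is an arbitrary choice.
        avg = sum(X[w] * P[w] for w in atom) / mass if mass else Fraction(0)
        for w in atom:
            Y[w] = avg  # constant on the atom, hence G-measurable
    return Y

def integral(f, P, event):
    """Integral of f over `event` with respect to P."""
    return sum(f[w] * P[w] for w in event)

def g_events(atoms):
    """Every event in G: all unions of atoms, including the empty set."""
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            yield [w for atom in chosen for w in atom]

# Fair die: outcome w in {0, ..., 5} shows face w + 1.
P = [Fraction(1, 6)] * 6
X = [Fraction(w + 1) for w in range(6)]
atoms = [[0, 2, 4], [1, 3, 5]]  # odd faces, even faces

Y = cond_exp(X, P, atoms)
# Equation (1): Y and X have the same integral over every G-event.
identity_holds = all(
    integral(Y, P, G) == integral(X, P, G) for G in g_events(atoms)
)
print(Y, identity_holds)
```

Here the only information is the parity of the face, so the estimate is the average face value within each parity class.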
Theorem 2
For every $X\in L^1(P)$ a conditional expectation $E[X\mid\mathcal{G}]$ exists and is unique up to
$P$-almost-everywhere equality.
Proof
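Take first $X\ge 0$. Define a measure on $(\Omega,\mathcal{G})$ by $\nu(G)=\int_G X\,dP$. It is
finite, with $\nu(\Omega)=E[X]<\infty$, and absolutely continuous with respect to the
restriction $P|_{\mathcal{G}}$, since $G\in\mathcal{G}$ with $P(G)=0$ forces $\int_G X\,dP=0$. The
Radon-Nikodym theorem applied on $(\Omega,\mathcal{G},P|_{\mathcal{G}})$ supplies a
$\mathcal{G}$-measurable $Y\ge 0$ with $\nu(G)=\int_G Y\,dP$ for all $G\in\mathcal{G}$, which is exactly
Equation (1); taking $G=\Omega$ gives $E[Y]=E[X]<\infty$, so $Y\in L^1$. For general $X$ set $E[X\mid\mathcal{G}]=E[X^+\mid\mathcal{G}]-E[X^-\mid\mathcal{G}]$, both
terms integrable since $X^\pm\le|X|\in L^1$; Equation (1) for $X$ follows by subtracting the identities for $X^+$ and $X^-$. For uniqueness, if $Y_1,Y_2$ both satisfy
Definition 1, the set $G=\{Y_1>Y_2\}$ lies in $\mathcal{G}$ and gives $\int_G (Y_1-Y_2)\,dP=0$ with an
integrand strictly positive on $G$, so $P(G)=0$. By symmetry $P(Y_2>Y_1)=0$, hence $Y_1=Y_2$ almost surely.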
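As a worked instance of the Radon-Nikodym step, consider the special case (an assumption made only for illustration, not needed in the proof) where $\mathcal{G}$ is generated by a countable partition $\{A_i\}$ of $\Omega$ with every $P(A_i)>0$, and $X\ge 0$. Every $G\in\mathcal{G}$ is then a disjoint union of atoms, and the density of $\nu$ with respect to $P$ on $\mathcal{G}$ can be written down explicitly and checked against Equation (1):

```latex
Y \;=\; \sum_i \frac{\nu(A_i)}{P(A_i)}\,\mathbf{1}_{A_i}
  \;=\; \sum_i \frac{1}{P(A_i)}\Bigl(\int_{A_i} X\,dP\Bigr)\mathbf{1}_{A_i},
\qquad
\int_G Y\,dP \;=\; \sum_{i\in I}\frac{\nu(A_i)}{P(A_i)}\,P(A_i) \;=\; \nu(G)
\quad\text{for } G=\bigsqcup_{i\in I}A_i .
```

In this case the conditional expectation is the atom-by-atom average of $X$, which is the elementary formula for conditioning on an event of positive probability.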
Proposition 3
For $X\in L^2(P)$ the conditional expectation $E[X\mid\mathcal{G}]$ is the orthogonal projection of
$X$ onto the closed subspace $L^2(\Omega,\mathcal{G},P)$; equivalently, it minimizes
$E[(X-Y)^2]$ over all $Y\in L^2(\Omega,\mathcal{G},P)$.
Proof
The space $L^2(\Omega,\mathcal{G},P)$ is complete, being $L^2$ of the measure space
$(\Omega,\mathcal{G},P|_{\mathcal{G}})$, hence a closed subspace of $L^2(P)$ (when $\mathcal{G}$ is not
$P$-complete one works with the completion $\bar{\mathcal{G}}$, and $L^2(\bar{\mathcal{G}})=L^2(\mathcal{G})$ inside
$L^2(P)$ since each $\bar{\mathcal{G}}$-measurable function agrees a.s. with a $\mathcal{G}$-measurable one). So
the projection $Y$ exists and is characterized by $X-Y\perp L^2(\mathcal{G})$, that is
$E[(X-Y)Z]=0$ for every $Z\in L^2(\mathcal{G})$. In particular, taking $Z=\mathbf{1}_G$ with $G\in\mathcal{G}$, which is bounded and hence in
$L^2$ on a finite measure space, recovers Equation (1). Since $Y\in L^2\subseteq L^1$, this gives $Y=E[X\mid\mathcal{G}]$ by
Theorem 2. The minimization statement follows because an
orthogonal projection minimizes the distance from $X$ to the subspace.
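The projection characterization can be tested numerically on a finite example. This is a sketch under the same illustrative assumption of a partition-generated sub-σ-algebra; the names `expect`, `g_measurable`, and `mse` are hypothetical helpers. It checks that the residual is orthogonal to the indicator of each atom, that the Pythagorean identity holds for a grid of competing measurable candidates, and that the conditional expectation has the smallest mean squared error on that grid.

```python
from fractions import Fraction
from itertools import product

P = [Fraction(1, 6)] * 6                    # fair die
X = [Fraction(w + 1) for w in range(6)]     # face value
atoms = [[0, 2, 4], [1, 3, 5]]              # odd faces, even faces

def expect(f):
    """Expectation of f under P."""
    return sum(fw * pw for fw, pw in zip(f, P))

def g_measurable(values):
    """Function on the sample space taking values[i] on atoms[i]."""
    f = [Fraction(0)] * len(P)
    for atom, v in zip(atoms, values):
        for w in atom:
            f[w] = Fraction(v)
    return f

def mse(Z):
    """Mean squared error E[(X - Z)^2]."""
    return expect([(x - z) ** 2 for x, z in zip(X, Z)])

# Conditional expectation: P-weighted average of X on each atom.
Y = g_measurable([
    sum(X[w] * P[w] for w in atom) / sum(P[w] for w in atom) for atom in atoms
])

# Orthogonality: E[(X - Y) 1_A] = 0 for each atom A.
residual = [x - y for x, y in zip(X, Y)]
inner_products = [
    expect([r if w in atom else 0 for w, r in enumerate(residual)])
    for atom in atoms
]

# Pythagoras: E[(X-Z)^2] = E[(X-Y)^2] + E[(Y-Z)^2] for G-measurable Z.
candidates = [g_measurable(v) for v in product(range(8), repeat=2)]
pythagoras = all(
    mse(Z) == mse(Y) + expect([(y - z) ** 2 for y, z in zip(Y, Z)])
    for Z in candidates
)
best = min(candidates, key=mse)
print(inner_products, pythagoras, best == Y)
```

The grid of integer-valued candidates is only a finite stand-in for the full subspace; the Pythagorean identity is what shows the minimum is attained at the projection.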
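Proposition 4
Let $X,X'\in L^1(P)$, let $a,b\in\mathbb{R}$, let $\mathcal{H}\subseteq\mathcal{G}$ be a sub-σ-algebra, and let $Z$ be bounded and $\mathcal{G}$-measurable. Then, almost surely:
(i) (linearity) $E[aX+bX'\mid\mathcal{G}]=aE[X\mid\mathcal{G}]+bE[X'\mid\mathcal{G}]$;
(ii) (tower property) $E[E[X\mid\mathcal{G}]\mid\mathcal{H}]=E[X\mid\mathcal{H}]$;
(iii) (pull-out rule) $E[ZX\mid\mathcal{G}]=ZE[X\mid\mathcal{G}]$;
(iv) (monotonicity) if $X\ge 0$ almost surely, then $E[X\mid\mathcal{G}]\ge 0$ almost surely.
Proof
Statement (i) holds because the right side is $\mathcal{G}$-measurable and integrates correctly over
each $G\in\mathcal{G}$ by linearity of the integral, so Theorem 2 identifies it. For
(ii), the inner variable $W=E[X\mid\mathcal{G}]$ lies in $L^1$, so $E[W\mid\mathcal{H}]$ is defined.
The right side $E[X\mid\mathcal{H}]$ is $\mathcal{H}$-measurable, and for
$H\in\mathcal{H}\subseteq\mathcal{G}$, since $H\in\mathcal{G}$ as well,
$\int_H E[X\mid\mathcal{H}]\,dP=\int_H X\,dP=\int_H W\,dP$ by Equation (1) applied
at each level. By the uniqueness part of Theorem 2, $E[X\mid\mathcal{H}]$ coincides with
$E[W\mid\mathcal{H}]$. For (iii), the claim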
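holds for $Z=\mathbf{1}_{G_0}$ with $G_0\in\mathcal{G}$ because for $G\in\mathcal{G}$,
$$\int_G \mathbf{1}_{G_0}E[X\mid\mathcal{G}]\,dP=\int_{G\cap G_0}E[X\mid\mathcal{G}]\,dP=\int_{G\cap G_0}X\,dP=\int_G \mathbf{1}_{G_0}X\,dP,$$
and it extends to simple $Z$ by linearity (i) and to general bounded $\mathcal{G}$-measurable $Z$ by dominated convergence. In detail,
with $|Z|\le M$ pick simple $\mathcal{G}$-measurable $Z_n$ with $|Z_n|\le M$ and $Z_n\to Z$
pointwise; then $|Z_nX|\le M|X|\in L^1$ and $Z_nX\to ZX$ almost surely, so the
ordinary dominated convergence theorem gives $\|Z_nX-ZX\|_1\to 0$, and since conditional
expectation is an $L^1$-contraction (a consequence of (i) and (iv)) we get
$\|E[Z_nX\mid\mathcal{G}]-E[ZX\mid\mathcal{G}]\|_1\le\|Z_nX-ZX\|_1\to 0$. Hence
$E[Z_nX\mid\mathcal{G}]\to E[ZX\mid\mathcal{G}]$ in $L^1$, so along a subsequence almost surely, while
$Z_nE[X\mid\mathcal{G}]\to ZE[X\mid\mathcal{G}]$ almost surely; equating limits gives the claim. For (iv), if
$X\ge 0$ a.s. set $G=\{E[X\mid\mathcal{G}]<0\}\in\mathcal{G}$; then $\int_G E[X\mid\mathcal{G}]\,dP=\int_G X\,dP\ge 0$
while the integrand on the left is strictly negative on $G$, forcing $P(G)=0$.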
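The tower property, the pull-out rule, and monotonicity can be verified exactly on a small example with nested partitions. This is a sketch assuming two partition-generated σ-algebras, one refining the other; the sample space, weights, and the helper `cond_exp` are hypothetical choices made for illustration.

```python
from fractions import Fraction

def cond_exp(X, P, atoms):
    """E[X | sigma(atoms)] on a finite sample space, as a list of values."""
    Y = [Fraction(0)] * len(X)
    for atom in atoms:
        mass = sum(P[w] for w in atom)
        avg = sum(X[w] * P[w] for w in atom) / mass if mass else Fraction(0)
        for w in atom:
            Y[w] = avg
    return Y

# Eight outcomes with non-uniform weights 1/36, 2/36, ..., 8/36.
P = [Fraction(k, 36) for k in range(1, 9)]
X = [Fraction(w * w) for w in range(8)]

G = [[0, 1], [2, 3], [4, 5], [6, 7]]   # finer partition
H = [[0, 1, 2, 3], [4, 5, 6, 7]]       # coarser partition, so sigma(H) is inside sigma(G)

# Z is bounded and constant on each atom of G, hence G-measurable.
Z = [Fraction(v) for v in (2, 2, -1, -1, 0, 0, 5, 5)]

# Tower property: E[E[X | G] | H] = E[X | H].
tower_lhs = cond_exp(cond_exp(X, P, G), P, H)
tower_rhs = cond_exp(X, P, H)

# Pull-out rule: E[ZX | G] = Z E[X | G].
pull_lhs = cond_exp([z * x for z, x in zip(Z, X)], P, G)
pull_rhs = [z * y for z, y in zip(Z, cond_exp(X, P, G))]

# Monotonicity: X >= 0 implies E[X | G] >= 0.
nonnegative = all(y >= 0 for y in cond_exp(X, P, G))

print(tower_lhs == tower_rhs, pull_lhs == pull_rhs, nonnegative)
```

Rational arithmetic keeps both sides exact, so the identities are tested with equality and not up to a floating-point tolerance.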
Theorem 5
(Conditional Jensen.) If $\varphi:\mathbb{R}\to\mathbb{R}$ is convex and $X,\varphi(X)\in L^1(P)$, then
almost surely $\varphi(E[X\mid\mathcal{G}])\le E[\varphi(X)\mid\mathcal{G}]$.
Proof
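At each rational $q$ a convex $\varphi$ has a subgradient $s_q$, giving the affine minorant
$\ell_q(x)=\varphi(q)+s_q(x-q)\le\varphi(x)$; continuity of $\varphi$ and density of $\mathbb{Q}$ then
yield $\sup_q\ell_q(x)=\varphi(x)$ for every $x\in\mathbb{R}$, a supremum over a countable family. For
each $q$, monotonicity (iv) and linearity (i) from Proposition 4, together with
$E[b\mid\mathcal{G}]=b$ for a constant $b$ (immediate from Theorem 2, since a constant
is $\mathcal{G}$-measurable and reproduces its own averages), give
$$E[\varphi(X)\mid\mathcal{G}]\ge E[\ell_q(X)\mid\mathcal{G}]=s_qE[X\mid\mathcal{G}]+(\varphi(q)-s_qq)=\ell_q(E[X\mid\mathcal{G}])$$
almost surely. Each inequality holds off a null set $N_q$, so all of them hold simultaneously off the null set $\bigcup_q N_q$, and taking the supremum over $q$ yields
$E[\varphi(X)\mid\mathcal{G}]\ge\sup_q\ell_q(E[X\mid\mathcal{G}])=\varphi(E[X\mid\mathcal{G}])$ almost surely [1].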
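The inequality, and the affine case where it becomes an equality, can be observed on a finite example. This is a sketch assuming a partition-generated sub-σ-algebra; the sample values, the choice of convex functions, and the helpers `cond_exp` and `jensen_gap` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def cond_exp(X, P, atoms):
    """E[X | sigma(atoms)] on a finite sample space, as a list of values."""
    Y = [Fraction(0)] * len(X)
    for atom in atoms:
        mass = sum(P[w] for w in atom)
        avg = sum(X[w] * P[w] for w in atom) / mass if mass else Fraction(0)
        for w in atom:
            Y[w] = avg
    return Y

P = [Fraction(1, 6)] * 6
X = [Fraction(v) for v in (-3, -1, 0, 1, 2, 4)]
atoms = [[0, 3, 5], [1, 2, 4]]  # each atom mixes signs of X

def jensen_gap(phi):
    """Pointwise gap E[phi(X) | G] - phi(E[X | G]); Jensen says it is >= 0."""
    lhs = [phi(y) for y in cond_exp(X, P, atoms)]
    rhs = cond_exp([phi(x) for x in X], P, atoms)
    return [r - l for l, r in zip(lhs, rhs)]

convex_functions = {
    "square": lambda x: x * x,
    "abs": abs,
    "positive part": lambda x: max(x, Fraction(0)),
    "affine": lambda x: 2 * x + 1,  # affine functions give equality
}
gaps = {name: jensen_gap(phi) for name, phi in convex_functions.items()}

jensen_holds = all(g >= 0 for gap in gaps.values() for g in gap)
print(jensen_holds, gaps["affine"])
```

The zero gap for the affine function mirrors the step in the proof where conditional expectation commutes with each affine minorant; strict convexity is what opens a positive gap.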
Conditional expectation is therefore both a density, by Theorem 2, and a
projection, by Proposition 3, and the averaging identity
Equation (1) is the common root of every rule above.
[1] D. Williams, Probability with Martingales. Cambridge University Press, 1991.