We order the modes of convergence of random variables, prove the Markov and Chebyshev inequalities and the first Borel-Cantelli lemma, then prove a strong law of large numbers under a fourth-moment hypothesis by summing fourth-moment bounds. The central limit theorem follows from a characteristic-function expansion together with the Lévy continuity theorem.
Sample averages obey two laws of a different character. One is almost sure and fixes the
limit at the mean, and one is in distribution and fixes the fluctuations as Gaussian. Both
rest on tail bounds that convert a moment into a probability.
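A sequence $(X_n)$ of random variables converges to $X$
$$
\begin{aligned}
&\text{(i) almost surely, if } P(X_n \to X) = 1;\\
&\text{(ii) in probability, if } P(|X_n - X| > \varepsilon) \to 0 \text{ for each } \varepsilon > 0;\\
&\text{(iii) in } L^p, \text{ if } E[|X_n - X|^p] \to 0;\\
&\text{(iv) in distribution, if } E[g(X_n)] \to E[g(X)] \text{ for every bounded continuous } g.
\end{aligned}
\tag{1}
$$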
Almost sure convergence and convergence in $L^p$ each imply convergence in probability,
which in turn implies convergence in distribution; no other implication holds in general
[1].
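One standard way to see that convergence in probability does not imply almost sure convergence is the "typewriter" sequence of indicators of dyadic intervals sweeping across $[0,1)$ under the uniform law. The sketch below is illustrative only and is not part of the original argument; the names `typewriter` and `hit_probability` are hypothetical. It shows $P(X_n = 1) \to 0$ while a fixed sample point is hit once in every dyadic block, so $X_n(\omega)$ does not converge.

```python
def typewriter(n: int, omega: float) -> int:
    """X_n(omega) = 1 if omega lies in the n-th dyadic interval, else 0.

    Writing n = 2**k + j with 0 <= j < 2**k, the n-th interval is
    [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1
    j = n - 2**k
    return int(j / 2**k <= omega < (j + 1) / 2**k)


def hit_probability(n: int) -> float:
    """P(X_n = 1) under the uniform law on [0, 1): the interval length 2**-k."""
    return 2.0 ** -(n.bit_length() - 1)


omega = 0.3  # an arbitrary sample point, chosen for illustration
# Indices n < 2**10 at which X_n(omega) = 1: one per dyadic block k = 0..9.
hits = [n for n in range(1, 2**10) if typewriter(n, omega) == 1]
print(len(hits), hit_probability(2**9))
```

The hit probabilities shrink geometrically along blocks, which gives convergence to 0 in probability, yet the list of hits keeps growing with the range of `n`, so the sequence takes the value 1 infinitely often at every sample point.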
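Lemma 2
(Markov and Chebyshev.) For a nonnegative random variable $Z$ and $a > 0$, $P(Z \ge a) \le E[Z]/a$. Consequently, for any $X \in L^2$ and $\varepsilon > 0$,
$$
P(|X - EX| \ge \varepsilon) \le \frac{\operatorname{Var}(X)}{\varepsilon^2}. \tag{2}
$$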
Proof
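Since $a\,\mathbf{1}_{\{Z \ge a\}} \le Z$ pointwise, taking expectations gives $a\,P(Z \ge a) \le E[Z]$. Applying this to $Z = (X - EX)^2$ and $a = \varepsilon^2$
gives $P(|X - EX| \ge \varepsilon) = P((X - EX)^2 \ge \varepsilon^2) \le E[(X - EX)^2]/\varepsilon^2$, which is Equation (2).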
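As a quick numerical illustration of the Chebyshev bound, the following sketch compares the tail frequency of a sample with the bound $\operatorname{Var}/\varepsilon^2$. The helper name `chebyshev_check` and the choice of Exponential(1) data are assumptions made for the example. Because the inequality holds for any law, it holds in particular for the empirical distribution of the sample (with the sample mean and the variance normalized by the sample size), so the comparison is not subject to sampling noise.

```python
import numpy as np


def chebyshev_check(eps: float, samples: np.ndarray) -> tuple[float, float]:
    """Return (tail frequency, Chebyshev bound) for the empirical law of samples."""
    centered = samples - samples.mean()
    tail = float(np.mean(np.abs(centered) >= eps))
    # np.var uses the 1/len normalization, matching the empirical distribution.
    bound = float(samples.var() / eps**2)
    return tail, bound


rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=100_000)
for eps in (1.0, 2.0, 3.0):
    tail, bound = chebyshev_check(eps, x)
    print(f"eps={eps}: tail={tail:.4f}, bound={bound:.4f}")
```

The bound is typically far from tight; it trades sharpness for requiring nothing beyond a finite second moment.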
Lemma 3
(First Borel-Cantelli.) If $\sum_n P(A_n) < \infty$, then $P(A_n \text{ infinitely often}) = 0$.
Proof
Let $N = \sum_n \mathbf{1}_{A_n}$ count the events that occur. Monotone convergence gives
$E[N] = \sum_n P(A_n) < \infty$, so $N < \infty$ almost surely, which says exactly that only finitely
many $A_n$ occur.
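The counting argument in the proof can be checked by simulation. The sketch below assumes, purely for illustration, independent events with $P(A_n) = 1/n^2$ truncated at a finite `n_max` (the lemma itself needs no independence); `occurrence_counts` is a hypothetical helper. It compares the average number of occurring events with the partial sum $\sum_{n \le n_{\max}} P(A_n)$.

```python
import numpy as np


def occurrence_counts(n_max: int, paths: int, rng: np.random.Generator) -> np.ndarray:
    """Count how many of A_1, ..., A_{n_max} occur on each path, with P(A_n) = 1/n^2."""
    probs = 1.0 / np.arange(1, n_max + 1) ** 2
    # Each row is one path; column n-1 records whether A_n occurred.
    occurs = rng.random((paths, n_max)) < probs
    return occurs.sum(axis=1)


rng = np.random.default_rng(2)
counts = occurrence_counts(n_max=500, paths=5_000, rng=rng)
expected = float(np.sum(1.0 / np.arange(1, 501) ** 2))
print(counts.mean(), expected, counts.max())
```

The sample mean of the counts should sit close to the partial sum, and the largest count stays small, in line with $N < \infty$ almost surely.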
Theorem 4
Let $X_1, X_2, \dots$ be i.i.d. with $E[X_1] = \mu$ and $E[X_1^4] < \infty$. Then $S_n/n \to \mu$
almost surely, where $S_n = \sum_{i=1}^n X_i$.
Proof
Since $E[X_1^4] < \infty$, power-mean monotonicity gives $E[|X_1|^k] < \infty$ for all
$k \le 4$; in particular $E[X_1^2] < \infty$, so $(E[X_1^2])^2 < \infty$. Replacing $X_i$
by $X_i - \mu$, assume $\mu = 0$; the centered fourth moment stays finite, since
$E|X_1 - \mu|^4 \le 8(E|X_1|^4 + \mu^4) < \infty$ by the $c_r$-inequality. Expanding
$S_n^4 = \sum_{i,j,k,l} X_i X_j X_k X_l$ and taking expectations, independence and $E[X_i] = 0$
annihilate every term with an index appearing exactly once. Only the $n$ terms $E[X_i^4]$
and the $3n(n-1)$ terms $E[X_i^2]E[X_j^2]$ with $i \ne j$ survive, so
$$
E[S_n^4] = n\,E[X_1^4] + 3n(n-1)\,(E[X_1^2])^2 \le C n^2 \tag{3}
$$
for a constant $C$ independent of $n$. Hence $E[(S_n/n)^4] \le C/n^2$. Let $T_N = \sum_{n \le N} (S_n/n)^4$; the
$T_N$ are nonnegative and increase to $T = \sum_n (S_n/n)^4$, so monotone convergence gives
$E[T] = \lim_N \sum_{n \le N} E[(S_n/n)^4] = \sum_n E[(S_n/n)^4] \le C \sum_n n^{-2} < \infty$. A random
variable with finite expectation is finite almost surely, hence $T = \sum_n (S_n/n)^4 < \infty$ almost
surely. The terms of a convergent series vanish, giving $S_n/n \to 0$ almost surely.
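The fourth-moment count in Equation (3) can be verified exactly in a small case. The sketch below assumes symmetric $\pm 1$ signs (Rademacher variables), for which $E[X_1^2] = E[X_1^4] = 1$, and enumerates all $2^n$ outcomes; the function names are hypothetical and the choice of law is only an illustration.

```python
from itertools import product


def fourth_moment_rademacher(n: int) -> float:
    """Exact E[S_n^4] for n i.i.d. uniform +/-1 signs, by enumerating all 2^n outcomes."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2**n


def fourth_moment_formula(n: int, m4: float = 1.0, m2: float = 1.0) -> float:
    """Right-hand side of (3): n E[X^4] + 3 n (n - 1) (E[X^2])^2."""
    return n * m4 + 3 * n * (n - 1) * m2**2


for n in range(1, 11):
    exact = fourth_moment_rademacher(n)
    print(n, exact, fourth_moment_formula(n), exact / n**4)
```

In this case the formula reads $3n^2 - 2n$, so $E[(S_n/n)^4] \le 3/n^2$, which is the summable bound the proof relies on.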
Write $\varphi_X(t) = E[e^{itX}]$ for the characteristic function of $X$. It determines the law, is
uniformly continuous, and factorizes over independent sums, with $\varphi_{X+Y} = \varphi_X \varphi_Y$ for
independent $X, Y$.
Theorem 5
Let $X_1, X_2, \dots$ be i.i.d. with $E[X_1] = 0$ and $\operatorname{Var}(X_1) = \sigma^2 \in (0, \infty)$. Then
$S_n/(\sigma\sqrt{n})$ converges in distribution to the standard normal.
Proof
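Normalize to $\sigma = 1$ and set $\varphi = \varphi_{X_1}$. From the bound
$|e^{isx} - (1 + isx - \tfrac{1}{2}s^2x^2)| \le \min(|sx|^3, |sx|^2)$, taking expectations
gives $|\varphi(s) - (1 + is\,EX_1 - \tfrac{1}{2}s^2\,EX_1^2)| \le E[\min(|sX_1|^3, |sX_1|^2)] = o(s^2)$,
the last step by dominated convergence (the integrand divided by $s^2$ is bounded by
$X_1^2 \in L^1$ and tends to $0$ pointwise as $s \to 0$). With $EX_1 = 0$ and $EX_1^2 = 1$ this is the
expansion $\varphi(s) = 1 - \tfrac{1}{2}s^2 + o(s^2)$ as $s \to 0$, requiring only $|X_1|^2 \in L^1$.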
Independence and identical distribution give
$$
\varphi_{S_n/\sqrt{n}}(t) = \varphi(t/\sqrt{n})^n = \Bigl(1 - \frac{t^2}{2n} + o(n^{-1})\Bigr)^n, \tag{4}
$$
and for complex $z_n \to z$ one has $(1 + z_n/n)^n \to e^z$, since
$n\log(1 + z_n/n) = z_n + O(|z_n|^2/n) \to z$ on the principal branch once $|z_n|/n < 1$. With
$z_n = -t^2/2 + o(1) \to -t^2/2$ this yields $\varphi_{S_n/\sqrt{n}}(t) \to e^{-t^2/2}$ for every $t$. The limit is the characteristic
function of the standard normal and is continuous at the origin, so the Lévy continuity
theorem upgrades pointwise convergence of characteristic functions to convergence in
distribution [1], [2].
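The pointwise limit in Equation (4) can be observed directly for a concrete law. The sketch below assumes centered Exponential(1) summands, $X = Y - 1$ with $Y$ Exponential(1), whose characteristic function is $e^{-is}/(1 - is)$; the function names are hypothetical. It evaluates the gap between $\varphi(t/\sqrt{n})^n$ and $e^{-t^2/2}$ at a fixed $t$.

```python
import numpy as np


def phi_centered_exp(s: float) -> complex:
    """Characteristic function of Y - 1 with Y ~ Exponential(1): exp(-is) / (1 - is)."""
    return np.exp(-1j * s) / (1.0 - 1j * s)


def cf_gap(t: float, n: int) -> float:
    """|phi(t / sqrt(n))^n - exp(-t^2 / 2)|, the distance to the Gaussian limit."""
    lhs = phi_centered_exp(t / np.sqrt(n)) ** n
    return float(abs(lhs - np.exp(-(t**2) / 2)))


gaps = [cf_gap(1.0, n) for n in (10, 100, 1_000, 10_000)]
print(gaps)
```

For a skewed law such as this one the gap should shrink roughly like $n^{-1/2}$, driven by the third cumulant; a symmetric law would typically converge faster.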
The theorem is indifferent to the summand law. Standardized means of skewed
Exponential(1) increments approach the standard normal, and the code below
measures the residual excess kurtosis as a finite-n diagnostic.
```python
import numpy as np
from numpy.random import Generator


def standardized_means(n: int, paths: int, rng: Generator) -> np.ndarray:
    """Standardized sample means of i.i.d. Exponential(1) increments.

    Args:
        n: Number of summands per trajectory.
        paths: Number of independent trajectories.
        rng: Seeded generator for reproducibility.

    Returns:
        The array sqrt(n) * (mean - 1) per trajectory, converging in
        distribution to the standard normal as n grows.
    """
    # Exponential(1) has mean 1 and variance 1, so no division by sigma is needed.
    samples = rng.exponential(1.0, size=(paths, n))
    return np.sqrt(n) * (samples.mean(axis=1) - 1.0)


rng = np.random.default_rng(0)
# Note: the sample array holds paths * n float64 values (about 3.2 GB at these sizes).
z = standardized_means(n=2_000, paths=200_000, rng=rng)
# Sample excess kurtosis: fourth central moment over squared variance, minus 3.
excess_kurtosis = float(((z - z.mean()) ** 4).mean() / z.var() ** 2 - 3.0)
```
The Chebyshev bound (2) and the strong law (Theorem 4) pin the
average $S_n/n$ to its mean, while Theorem 5 resolves the residual fluctuation, which is of order
$\sqrt{n}$ for the centered sum and hence of order $1/\sqrt{n}$ for the average.
[1] R. Durrett, Probability: Theory and Examples, 5th ed. Cambridge University Press, 2019.
[2] D. Williams, Probability with Martingales. Cambridge University Press, 1991.