Probability Spaces and Random Variables

Probability is the measure theory of a space whose total mass is one. Every notion of the subject (a random variable, its distribution, its expectation, its variance) is a measure-theoretic object read through that normalisation, and the payoff is that the Lebesgue integral and the construction of measures carry over wholesale. This post fixes the language (the probability space and the random variable), identifies the law of a random variable as a measure on the line, and proves the elementary inequalities on which the limit theorems run [1], [2].

#The probability space

Definition 1

A probability space is a measure space $(\Omega,\mathcal F,\P)$ with $\P(\Omega)=1$. The set $\Omega$ is the sample space, the sigma-algebra $\mathcal F$ is the collection of events, and the measure $\P$ is the probability.

Everything proved for general measures applies. Monotonicity gives $\P(A)\le\P(B)$ for $A\subseteq B$, finite additivity and $\P(\Omega)=1$ give the complement rule $\P(A^c)=1-\P(A)$, and continuity of measures gives $\P(A_n)\to\P(A)$ whenever the events $A_n$ increase or decrease to $A$, which justifies passing probabilities through monotone limits. Countable subadditivity, $\P(\bigcup_n A_n)\le\sum_n\P(A_n)$, is the union bound.
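As a concrete check, here is a minimal sketch of a finite probability space, assuming a fair six-sided die with the uniform measure. The names `omega`, `prob`, `even`, `low`, and `at_most_two` are illustrative choices, not notation from the text above, and exact rational arithmetic is used so the identities hold without rounding.

```python
from fractions import Fraction

# Sample space: the six faces of a die; every subset counts as an event.
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure: P(A) = |A| / |Omega|."""
    return Fraction(len(event & omega), len(omega))

even = frozenset({2, 4, 6})
low = frozenset({1, 2, 3})
at_most_two = frozenset({1, 2})

total_mass = prob(omega)                                     # normalisation
complement_ok = prob(omega - even) == 1 - prob(even)         # complement rule
monotone_ok = prob(at_most_two) <= prob(low)                 # A subset of B
union_bound_ok = prob(even | low) <= prob(even) + prob(low)  # subadditivity
```

The union bound is strict here because `even` and `low` overlap in the face 2, which the sum on the right counts twice.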

#Random variables and their laws

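Definition 2

A random variable is a measurable function $X:\Omega\to\R$, meaning $\{X\le x\}\in\mathcal F$ for every $x$; since the half-lines $(-\infty,x]$ generate the Borel sigma-algebra, this is equivalent to $X^{-1}(B)\in\mathcal F$ for every Borel set $B$. Its law or distribution is the pushforward measure $\P_X(B)=\P(X^{-1}(B))=\P(X\in B)$ on the Borel sets of $\R$.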

Proposition 3

The law $\P_X$ is a probability measure on $(\R,\mathcal B)$.

Proof

Preimage commutes with all set operations, so $X^{-1}(\emptyset)=\emptyset$ gives $\P_X(\emptyset)=0$, and for disjoint Borel sets $B_n$ the preimages $X^{-1}(B_n)$ are disjoint events, so countable additivity of $\P$ transfers, $\P_X(\bigcup_n B_n)=\P(\bigcup_n X^{-1}(B_n))=\sum_n\P(X^{-1}(B_n))=\sum_n\P_X(B_n)$. Finally $\P_X(\R)=\P(\Omega)=1$.

The law lives on the line, independent of $\Omega$, and is determined by its values on the half-lines.
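The pushforward can be computed by hand on a finite space. The sketch below assumes two fair dice as the sample space and takes $X$ to be their total; `omega`, `mass`, `law`, and `law_of` are hypothetical names introduced for the illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces, each outcome carrying mass 1/36.
omega = list(product(range(1, 7), repeat=2))
mass = Fraction(1, len(omega))

def X(w):
    """The random variable: total shown by the two dice."""
    return w[0] + w[1]

# Pushforward: the law gives each value x the mass of the preimage {X = x}.
law = defaultdict(Fraction)
for w in omega:
    law[X(w)] += mass

def law_of(B):
    """P_X(B) = P(X in B) for a set B of real numbers."""
    return sum((p for x, p in law.items() if x in B), Fraction(0))
```

Once `law` is built, `omega` is no longer needed to answer any question about the values of $X$: `law_of({7})` is the probability of rolling a seven, and additivity over disjoint sets is inherited from the measure on the pairs.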

Definition 4

The distribution function of $X$ is $F(x)=\P(X\le x)=\P_X((-\infty,x])$.

Proposition 5

A distribution function is nondecreasing and right-continuous with $\lim_{x\to-\infty}F(x)=0$ and $\lim_{x\to\infty}F(x)=1$. Conversely every such $F$ is the distribution function of a unique law.

Proof

Monotonicity of $\P_X$ on the nested half-lines gives $F$ nondecreasing. Right-continuity is continuity of $\P_X$ from above: $F(x+1/n)=\P_X((-\infty,x+1/n])\to\P_X((-\infty,x])=F(x)$ as the sets decrease to $(-\infty,x]$, and monotonicity of $F$ extends the limit from the sequence $x+1/n$ to every approach from the right. The two limits are $\P_X(\emptyset)=0$ and $\P_X(\R)=1$ by continuity along $(-\infty,-n]\downarrow\emptyset$ and $(-\infty,n]\uparrow\R$, where finite measure (every set has mass at most $1$) permits continuity from above.

For the converse, $F$ assigns each half-open interval the mass $\mu((a,b])=F(b)-F(a)$, a nonnegative set function on the half-open intervals. Finite additivity is immediate by telescoping, and right-continuity of $F$ upgrades this to countable additivity, so $\mu$ is a premeasure on the semiring of half-open intervals. Concretely, if $(a,b]=\bigsqcup_{n\ge1}(a_n,b_n]$, fix $\epsilon>0$ and $0<\delta<b-a$, and choose $\delta_n>0$ by right-continuity at $b_n$ so that $F(b_n+\delta_n)-F(b_n)<\epsilon 2^{-n}$. The open intervals $(a_n,b_n+\delta_n)$ cover the compact interval $[a+\delta,b]$, so a finite subcover with finite additivity and monotonicity gives $F(b)-F(a+\delta)\le\sum_n(F(b_n+\delta_n)-F(a_n))\le\sum_n(F(b_n)-F(a_n))+\epsilon$. Letting $\delta\to0$ (right-continuity at $a$) and $\epsilon\to0$ yields $\mu((a,b])\le\sum_n\mu((a_n,b_n])$, while finite additivity and monotonicity applied to the first $N$ intervals, followed by $N\to\infty$, give the reverse inequality. The Carathéodory extension then carries the premeasure to a Borel measure, of total mass $\lim_n(F(n)-F(-n))=1$ by the two limits of $F$. It is unique because the intervals are an intersection-closed generating system and the measures are finite (total mass $1$), so the pi-system uniqueness theorem applies, exactly as for Lebesgue measure.

So a random variable can be specified by a distribution function alone, without naming the probability space, and two random variables with the same law are interchangeable for any question about their values.
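A small discrete case makes the correspondence between a law and its distribution function tangible. The sketch below assumes the law of the number of heads in two fair coin tosses; `law`, `F`, and `interval_mass` are illustrative names, and `interval_mass` mirrors the formula $\mu((a,b])=F(b)-F(a)$ from the proof.

```python
from fractions import Fraction

# A discrete law on the line: number of heads in two fair coin tosses.
law = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = P_X((-inf, x])."""
    return sum((p for v, p in law.items() if v <= x), Fraction(0))

def interval_mass(a, b):
    """Mass of the half-open interval (a, b], recovered from F alone."""
    return F(b) - F(a)

# A step just to the left and just to the right of the atom at 1.
eps = Fraction(1, 1000)
left_of_jump = F(1 - eps)
right_of_jump = F(1 + eps)
```

The function is a staircase: it is flat between atoms, jumps by the point mass at each atom, and takes the upper value at the jump itself, which is right-continuity. Here `F(1)` agrees with `right_of_jump` and exceeds `left_of_jump` by exactly `law[1]`.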

#Expectation

The expectation is the integral against $\P$, and it inherits linearity, monotonicity, and the convergence theorems from integration.

Definition 6

The expectation of an integrable random variable is $\E[X]=\int_\Omega X\,d\P$. Its variance, when $X$ is square-integrable, is $\Var(X)=\E[(X-\E X)^2]=\E[X^2]-(\E X)^2$.

Computing $\E[X]$ seems to require the space $\Omega$, but the law suffices, because integrating a function of $X$ over $\Omega$ reduces to integrating that function against the pushforward measure $\P_X$ on the line.

Theorem 7

For any Borel function $g$ with $g(X)$ integrable, $\E[g(X)]=\int_\R g\,d\P_X$. In particular, for integrable $X$, $\E[X]=\int_\R x\,d\P_X(x)=\int_\R x\,dF(x)$, the last as a Stieltjes integral.

Proof

The identity $\int_\Omega\mathbf 1_B(X)\,d\P=\P(X\in B)=\int_\R\mathbf 1_B\,d\P_X$ is the definition of the law, so the claim holds for indicators $g=\mathbf 1_B$. Linearity extends it to nonnegative simple functions, the monotone convergence theorem extends it to nonnegative measurable $g$ through an increasing approximation, and splitting $g=g^+-g^-$ extends it to integrable $g$.

This is why a distribution alone determines every moment and every expectation of a function of $X$.
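On a finite space both sides of Theorem 7 are finite sums, so the identity can be checked directly. The sketch below assumes two fair dice, takes $X$ to be their total and $g(x)=x^2$; the names `on_omega` and `on_line` are hypothetical labels for the two sides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces, each outcome carrying mass 1/36.
omega = list(product(range(1, 7), repeat=2))
mass = Fraction(1, len(omega))

def X(w):
    """Total shown by the two dice."""
    return w[0] + w[1]

def g(x):
    """The Borel function applied to X."""
    return x * x

# Left side: integrate g(X) over the sample space, outcome by outcome.
on_omega = sum(g(X(w)) * mass for w in omega)

# Right side: build the law of X, then integrate g against it on the line.
counts = Counter(X(w) for w in omega)
law = {x: Fraction(n, len(omega)) for x, n in counts.items()}
on_line = sum(g(x) * p for x, p in law.items())
```

The first sum has 36 terms and the second has 11, one per attainable total. Grouping outcomes by the value of $X$ is exactly the step from indicators to simple functions in the proof.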

#The elementary inequalities

Three inequalities underpin the limit theorems that follow. The first bounds the tail of a nonnegative variable by its mean.

Theorem 8 (Markov's inequality)

For a nonnegative random variable $X$ and $a>0$, $\P(X\ge a)\le\E[X]/a$.

Proof

The pointwise bound $a\,\mathbf 1_{\{X\ge a\}}\le X$ holds on all of $\Omega$: where $X\ge a$ the left side equals $a\le X$, and elsewhere it is $0\le X$ because $X$ is nonnegative. Taking expectations and using monotonicity, $a\,\P(X\ge a)\le\E[X]$, and dividing by $a$ gives the claim.
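A quick exact check of the bound, as a sketch on two small discrete laws chosen for illustration. The helper names `mean`, `tail`, and `markov_bound` are assumptions of this example; the second law is concentrated on $\{0,a\}$ and shows that the inequality can hold with equality.

```python
from fractions import Fraction

def mean(law):
    """E[X] for a discrete law given as {value: probability}."""
    return sum(x * p for x, p in law.items())

def tail(law, a):
    """P(X >= a)."""
    return sum((p for x, p in law.items() if x >= a), Fraction(0))

def markov_bound(law, a):
    """The right side of the inequality, E[X] / a."""
    return mean(law) / a

# A nonnegative variable with mean 5/4.
law = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
checks = [(a, tail(law, a), markov_bound(law, a)) for a in (1, 2, 4)]

# Equality case: all the mass sits at 0 or at the level a = 4.
two_point = {0: Fraction(3, 4), 4: Fraction(1, 4)}
```

Equality for `two_point` reflects the proof: the pointwise bound $a\,\mathbf 1_{\{X\ge a\}}\le X$ is tight exactly when $X$ takes only the values $0$ and $a$.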

Corollary 9 (Chebyshev's inequality)

For square-integrable $X$ with mean $\mu$ and any $a>0$, $\P(\abs{X-\mu}\ge a)\le\Var(X)/a^2$.

Proof

Apply Theorem 8 to the nonnegative variable $(X-\mu)^2$ at level $a^2$, giving $\P(\abs{X-\mu}\ge a)=\P((X-\mu)^2\ge a^2)\le\E[(X-\mu)^2]/a^2=\Var(X)/a^2$.
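The same kind of exact check works for the corollary. This sketch assumes a single fair die; `expect`, `deviation_prob`, and `chebyshev_bound` are illustrative names, and the two variance formulas from Definition 6 are computed side by side.

```python
from fractions import Fraction

# Law of a fair six-sided die.
law = {face: Fraction(1, 6) for face in range(1, 7)}

def expect(g):
    """E[g(X)] computed against the law."""
    return sum(g(x) * p for x, p in law.items())

mu = expect(lambda x: x)
var = expect(lambda x: (x - mu) ** 2)        # E[(X - mu)^2]
var_alt = expect(lambda x: x * x) - mu ** 2  # E[X^2] - (E X)^2

def deviation_prob(a):
    """P(|X - mu| >= a)."""
    return sum((p for x, p in law.items() if abs(x - mu) >= a), Fraction(0))

def chebyshev_bound(a):
    """The right side of the inequality, Var(X) / a^2."""
    return var / a ** 2
```

At $a=2$ only the faces 1 and 6 deviate far enough, so the true probability is $1/3$ against a bound of $35/48$. The bound is far from tight for a single die, which is typical: its value lies in holding for every square-integrable law.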

Chebyshev is the engine of the weak law of large numbers, where it sends the probability of deviation to zero as variances average down. The third inequality controls convex transformations.

Theorem 10 (Jensen's inequality)

If $\varphi:\R\to\R$ is convex and $X$ and $\varphi(X)$ are integrable, then $\varphi(\E X)\le\E[\varphi(X)]$.

Proof

A convex function has a supporting line at $m=\E[X]$: there is a slope $c$ (any subgradient of $\varphi$ at $m$) with $\varphi(x)\ge\varphi(m)+c(x-m)$ for all $x$. Substituting $X$ and taking expectations, the left side becomes $\E[\varphi(X)]$ and the right side becomes $\E[\varphi(m)+c(X-m)]=\varphi(m)+c(\E[X]-m)=\varphi(\E X)$, since $\E[X]-m=0$. Hence $\varphi(\E X)\le\E[\varphi(X)]$.
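The supporting-line argument can be watched in action on a three-point law. The sketch below is an illustration under assumed data: the law, the helper `jensen_gap`, and the choice of the convex functions $x^2$ and $\abs x$ are all introduced here and are not part of the text above.

```python
from fractions import Fraction

# A three-point law with mean 5/4.
law = {-1: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)}

def expect(g):
    """E[g(X)] computed against the law."""
    return sum(g(x) * p for x, p in law.items())

m = expect(lambda x: x)

def jensen_gap(phi):
    """E[phi(X)] - phi(E X), nonnegative when phi is convex."""
    return expect(phi) - phi(m)

def square(x):
    return x * x

gap_square = jensen_gap(square)
gap_abs = jensen_gap(abs)

# The supporting line of the square at m has slope c = 2m; check that it
# lies below the function at every point of the support.
c = 2 * m
line_below = all(square(x) >= square(m) + c * (x - m) for x in law)
```

For the square, the gap `gap_square` is $\E[X^2]-(\E X)^2$, which is the variance, so this single computation also shows why variance is nonnegative.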

Jensen's inequality is why $\abs{\E X}\le\E\abs X$ (take $\varphi(x)=\abs x$), why variance is nonnegative (take $\varphi(x)=x^2$), and why the $L^p$ norms $\E[\abs X^p]^{1/p}$ increase in $p$, the monotonicity that orders the spaces of random variables.

These tools assemble the basic picture. A random variable is a measurable function whose averages are integrals, and the rest of probability studies how those integrals behave under independence, limits, and conditioning. The square-integrable random variables in particular form the Hilbert space $L^2(\Omega,\mathcal F,\P)$ with inner product $\E[XY]$, the geometry in which covariance is the inner product of centred variables, correlation is the cosine of an angle, and conditional expectation is a projection.

[1] R. Durrett, Probability: Theory and Examples, 5th ed., Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.

[2] D. Williams, Probability with Martingales. Cambridge University Press, 1991.

cite

@misc{probability-spaces,
  author = {Zac Kienzle},
  title  = {Probability Spaces and Random Variables},
  year   = {2026},
  month  = {05},
  url    = {https://zackienzle.com/blog/probability-spaces}
}
