Probability is the measure theory of a space whose total mass is one. Every notion of the subject (a random variable, its distribution, its expectation, its variance) is a measure-theoretic object read through that normalisation, and the payoff is that the Lebesgue integral and the construction of measures carry over wholesale. This post fixes the language, the probability space and the random variable, identifies the law of a random variable as a measure on the line, and proves the elementary inequalities that the limit theorems run on [1], [2].
#The probability space
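A probability space is a measure space $(\Omega, \mathcal{F}, P)$ with $P(\Omega) = 1$. The set $\Omega$ is the sample space, the sigma-algebra $\mathcal{F}$ is the collection of events, and the measure $P$ is the probability.
Everything proved for general measures applies. Monotonicity gives $P(A) \le P(B)$ for $A \subseteq B$, finite additivity and $P(\Omega) = 1$ give the complement rule $P(A^c) = 1 - P(A)$, and continuity of measures gives $P(A_n) \to P(A)$ along monotone sequences of events $A_n \uparrow A$ or $A_n \downarrow A$, which justifies passing probabilities through monotone limits of events. Countable subadditivity, $P\left(\bigcup_n A_n\right) \le \sum_n P(A_n)$, is the union bound.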
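As a sanity check on these rules, here is a minimal sketch on a finite space, where every subset is an event and the probability is a normalised count. The fair die and the names `omega`, `prob`, and `powerset` are illustrative assumptions, not part of the development above.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability of an event, given as a subset of omega."""
    return Fraction(len(event & omega), len(omega))

def powerset(s):
    """All subsets of s; on a finite space this is the full sigma-algebra."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

events = powerset(omega)

# Normalisation: total mass one.
total_mass = prob(omega)

# Complement rule P(A^c) = 1 - P(A), checked for every event.
complement_rule_holds = all(prob(omega - a) == 1 - prob(a) for a in events)

# Monotonicity: A a subset of B implies P(A) <= P(B).
monotone = all(prob(a) <= prob(b) for a in events for b in events if a <= b)

# Union bound P(A u B) <= P(A) + P(B), checked for every pair of events.
union_bound_holds = all(
    prob(a | b) <= prob(a) + prob(b) for a in events for b in events
)
```

Exact rational arithmetic via `Fraction` keeps the comparisons free of rounding, so the checks are genuine equalities and inequalities rather than approximate ones.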
#Random variables and their laws
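A random variable is a measurable function $X : \Omega \to \mathbb{R}$, meaning $X^{-1}(B) \in \mathcal{F}$ for every Borel set $B \subseteq \mathbb{R}$. Its law or distribution is the pushforward measure $\mu_X(B) = P(X^{-1}(B)) = P(X \in B)$ on the Borel sets of $\mathbb{R}$.
The law $\mu_X$ is a probability measure on $\mathbb{R}$.
Preimage commutes with all set operations, so $X^{-1}(\emptyset) = \emptyset$ gives $\mu_X(\emptyset) = 0$, and for disjoint Borel sets $B_n$ the preimages $X^{-1}(B_n)$ are disjoint events, so countable additivity of $P$ transfers, $\mu_X\left(\bigcup_n B_n\right) = P\left(\bigcup_n X^{-1}(B_n)\right) = \sum_n P(X^{-1}(B_n)) = \sum_n \mu_X(B_n)$. Finally $\mu_X(\mathbb{R}) = P(X^{-1}(\mathbb{R})) = P(\Omega) = 1$.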
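The pushforward can be made concrete on a finite space. The sketch below assumes two fair dice as the sample space and takes the random variable to be their total; the names `omega`, `P`, `X`, `law`, and `pmf` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered pairs from two fair dice, uniform P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, len(omega)) for w in omega}

def X(w):
    """The random variable: total shown by the two dice."""
    return w[0] + w[1]

def law(B):
    """Pushforward: the P-mass of the preimage of B under X."""
    return sum(P[w] for w in omega if X(w) in B)

# Point masses of the law; they describe it fully because X takes
# only finitely many values.
pmf = defaultdict(Fraction)
for w in omega:
    pmf[X(w)] += P[w]
```

Here `law({7})` sums the six outcomes whose total is seven, and additivity over disjoint sets of values follows from the preimages being disjoint sets of outcomes, which is the argument of the proof in miniature.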
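The law lives on the line, independent of $\Omega$, and is determined by its values on the half-lines $(-\infty, x]$.
The distribution function of $X$ is $F_X(x) = P(X \le x) = \mu_X((-\infty, x])$.
A distribution function $F$ is nondecreasing and right-continuous with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$. Conversely every such $F$ is the distribution function of a unique law.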
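Monotonicity of $\mu_X$ on the nested half-lines gives $F$ nondecreasing. Right-continuity is continuity of $\mu_X$ from above, as the sets $(-\infty, x + 1/n]$ decrease to $(-\infty, x]$, and the two limits are $0$ and $1$ by continuity along $(-\infty, -n] \downarrow \emptyset$ and $(-\infty, n] \uparrow \mathbb{R}$, where finite measure (every set has mass at most $1$) permits continuity from above.
For the converse, $\mu((a, b]) = F(b) - F(a)$ assigns each half-open interval $(a, b]$ the mass $F(b) - F(a)$, a nonnegative set function on the half-open intervals. Finite additivity is immediate by telescoping, and right-continuity of $F$ upgrades this to countable additivity, so $\mu$ is a premeasure on the semiring of half-open intervals. Concretely, if $(a, b] = \bigsqcup_n (a_n, b_n]$, fix $\varepsilon > 0$ and $\eta \in (0, b - a)$, and choose $\delta_n > 0$ by right-continuity at $b_n$ so that $F(b_n + \delta_n) \le F(b_n) + \varepsilon 2^{-n}$; the open intervals $(a_n, b_n + \delta_n)$ cover the compact $[a + \eta, b]$, a finite subcover with finite additivity and monotonicity gives $F(b) - F(a + \eta) \le \sum_n \left(F(b_n + \delta_n) - F(a_n)\right) \le \sum_n \left(F(b_n) - F(a_n)\right) + \varepsilon$, and letting $\eta \downarrow 0$ (right-continuity at $a$) and $\varepsilon \downarrow 0$ yields one inequality, $F(b) - F(a) \le \sum_n \left(F(b_n) - F(a_n)\right)$, while finite additivity applied to the first $N$ intervals, followed by $N \to \infty$, gives the reverse. The Caratheodory extension then carries the premeasure to a Borel measure, unique because the intervals are an intersection-closed generating system and the measures are finite (total mass $1$), so the pi-system uniqueness theorem applies, exactly as for Lebesgue measure.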
So a random variable can be specified by a distribution function alone, without naming the probability space, and two random variables with the same law are interchangeable for any question about their values.
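One way to see a distribution function producing a random variable without a named probability space is the generalised inverse (quantile) construction, a standard technique that the text above does not itself use. The sketch assumes a hypothetical three-atom law, and a midpoint grid on $(0, 1)$ stands in for a uniform variable; `atoms`, `cumulative`, `quantile`, and `empirical_F` are illustrative names.

```python
from bisect import bisect_right
from fractions import Fraction

# Hypothetical law with three atoms, described only through its
# distribution function: jumps at 0, 1, 3 up to heights 1/4, 3/4, 1.
atoms = [0, 1, 3]
cumulative = [Fraction(1, 4), Fraction(3, 4), Fraction(1)]

def F(x):
    """Right-continuous step function F(x) = mass of (-inf, x]."""
    i = bisect_right(atoms, x)  # number of atoms <= x
    return cumulative[i - 1] if i > 0 else Fraction(0)

def quantile(u):
    """Generalised inverse: the smallest x with F(x) >= u, for 0 < u <= 1."""
    for x, c in zip(atoms, cumulative):
        if c >= u:
            return x
    raise ValueError("u must lie in (0, 1]")

# Midpoint grid on (0, 1), standing in for a uniform variable U.
N = 1000
grid = [Fraction(2 * k + 1, 2 * N) for k in range(N)]
samples = [quantile(u) for u in grid]

def empirical_F(x):
    """Fraction of grid points u with quantile(u) <= x."""
    return Fraction(sum(1 for s in samples if s <= x), N)
```

On this grid `empirical_F` agrees with `F` exactly, because the jump heights of `F` are multiples of `1/N`; for a general distribution function the agreement would only be approximate, improving as the grid is refined.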
#Expectation
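The expectation is the integral against $P$, and it inherits linearity, monotonicity, and the convergence theorems from integration.
The expectation of an integrable random variable $X$ is $\mathbb{E}[X] = \int_\Omega X \, dP$. Its variance, when $X$ is square-integrable, is $\operatorname{Var}(X) = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$.
Computing $\mathbb{E}[X]$ seems to require the space $\Omega$, but the law suffices, because integrating a pushforward reduces to integrating against the pushed measure.
For any Borel function $g : \mathbb{R} \to \mathbb{R}$ with $g(X)$ integrable, $\mathbb{E}[g(X)] = \int_{\mathbb{R}} g \, d\mu_X$. In particular $\mathbb{E}[X] = \int_{\mathbb{R}} x \, d\mu_X(x)$ and $\mathbb{E}[X] = \int_{\mathbb{R}} x \, dF_X(x)$ as a Stieltjes integral.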
The identity $\mathbb{E}[\mathbf{1}_B(X)] = P(X \in B) = \mu_X(B) = \int_{\mathbb{R}} \mathbf{1}_B \, d\mu_X$ is the definition of the law, so the claim holds for indicators $g = \mathbf{1}_B$. Linearity extends it to nonnegative simple functions, the monotone convergence theorem extends it to nonnegative measurable $g$ through an increasing approximation, and splitting $g = g^+ - g^-$ extends it to integrable $g$.
This is why a distribution alone determines every moment and every expectation of a function of $X$.
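The change of variables can be checked exactly on a finite space by computing the same expectation twice, once as a sum over outcomes and once as a sum against the law. The setup below is an assumption made for illustration: two fair dice, the random variable `X` as their maximum, and a test function `g`.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair dice with uniform probability.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

def X(w):
    """The random variable: the larger of the two dice."""
    return max(w)

def g(x):
    """An arbitrary test function of the value of X."""
    return (x - 3) ** 2

# Left side: integrate g(X) over the sample space.
lhs = sum(g(X(w)) * p for w in omega)

# Right side: integrate g against the law, using only its point masses.
counts = Counter(X(w) for w in omega)
law = {x: Fraction(c, len(omega)) for x, c in counts.items()}
rhs = sum(g(x) * m for x, m in law.items())

# Moments computed from the law alone.
mean = sum(x * m for x, m in law.items())
second_moment = sum(x ** 2 * m for x, m in law.items())
variance = sum((x - mean) ** 2 * m for x, m in law.items())
```

After `law` is built, the sample space is never consulted again: the mean, the second moment, and the variance all come from the six point masses.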
#The elementary inequalities
Three inequalities underpin the limit theorems that follow. The first bounds the tail of a nonnegative variable by its mean.
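For a nonnegative random variable $X$ and $a > 0$, $P(X \ge a) \le \mathbb{E}[X] / a$.
The pointwise bound $a \mathbf{1}_{\{X \ge a\}} \le X$ holds because the indicator is supported where $X \ge a$, and $X \ge 0$ elsewhere. Taking expectations and using monotonicity, $a P(X \ge a) \le \mathbb{E}[X]$, and dividing by $a$ gives the claim.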
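A short exact check of this tail bound follows, on a hypothetical law with four point masses, together with a two-point law showing that the bound cannot be improved in general. The laws and the helper names `mean` and `tail` are assumptions made for the sketch.

```python
from fractions import Fraction

# Hypothetical nonnegative law: point masses at 0, 1, 2 and 10.
law = {
    0: Fraction(1, 2),
    1: Fraction(1, 4),
    2: Fraction(1, 5),
    10: Fraction(1, 20),
}

def mean(law):
    """Expectation computed from the point masses."""
    return sum(x * m for x, m in law.items())

def tail(law, a):
    """Tail probability P(X >= a)."""
    return sum(m for x, m in law.items() if x >= a)

# Check P(X >= a) <= E[X] / a on a grid of levels a = 1/2, 1, ..., 12.
levels = [Fraction(k, 2) for k in range(1, 25)]
markov_holds = all(tail(law, a) <= mean(law) / a for a in levels)

# Sharpness: a two-point law on {0, a} attains equality at level a.
a = 4
two_point = {0: Fraction(3, 4), a: Fraction(1, 4)}
is_sharp = tail(two_point, a) == mean(two_point) / a
```

The two-point law is the equality case of the pointwise bound in the proof: there $X$ equals $a$ times the indicator of $\{X \ge a\}$ exactly.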
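For square-integrable $X$ with mean $m = \mathbb{E}[X]$ and any $t > 0$, $P(|X - m| \ge t) \le \operatorname{Var}(X) / t^2$.
Apply Theorem 8 (the tail bound for nonnegative variables above) to the nonnegative variable $(X - m)^2$ at level $t^2$, giving $P(|X - m| \ge t) = P\left((X - m)^2 \ge t^2\right) \le \mathbb{E}\left[(X - m)^2\right] / t^2 = \operatorname{Var}(X) / t^2$.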
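To see the bound at work, the sketch below compares the exact deviation probability of a sample mean with its Chebyshev bound. It assumes $n$ independent fair coin flips, so that the number of heads is binomial and the sample mean has variance $1/(4n)$; independence is not developed in this post, so treat that variance formula as an outside assumption.

```python
from fractions import Fraction
from math import comb

def deviation_probability(n, t):
    """Exact P(|S_n/n - 1/2| >= t), S_n the number of heads in n fair flips."""
    half = Fraction(1, 2)
    return sum(
        Fraction(comb(n, k), 2 ** n)
        for k in range(n + 1)
        if abs(Fraction(k, n) - half) >= t
    )

def chebyshev_bound(n, t):
    """Var(S_n/n) / t^2, assuming Var(S_n/n) = 1/(4n) for independent flips."""
    return Fraction(1, 4 * n) / t ** 2

# Deviation level t = 1/10 and a few sample sizes.
t = Fraction(1, 10)
rows = [
    (n, deviation_probability(n, t), chebyshev_bound(n, t))
    for n in (10, 50, 200, 800)
]
```

For small $n$ the bound exceeds one and says nothing, but it decays like $1/n$, which is all that is needed to force the deviation probability to zero.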
Chebyshev is the engine of the weak law of large numbers, where it sends the probability of deviation to zero as variances average down. The third inequality controls convex transformations.
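If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $X$ and $\varphi(X)$ are integrable, then $\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]$.
A convex function has a supporting line at $m = \mathbb{E}[X]$, a slope $c$ with $\varphi(x) \ge \varphi(m) + c(x - m)$ for all $x$, given by any subgradient at $m$. Substituting $x = X$ and taking expectations, the right side is $\varphi(m)$, since $\mathbb{E}[X - m] = 0$, while the left side is $\mathbb{E}[\varphi(X)]$, giving $\mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X])$.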
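The supporting-line argument can be checked exactly on a small example. The sketch assumes a hypothetical four-atom law and a handful of convex functions; `expect`, `convex_functions`, and `jensen_gaps` are illustrative names.

```python
from fractions import Fraction

# Hypothetical law with four atoms, including a negative value.
law = {
    -2: Fraction(1, 4),
    0: Fraction(1, 4),
    1: Fraction(1, 3),
    5: Fraction(1, 6),
}

def expect(g):
    """Expectation of g(X), computed against the point masses of the law."""
    return sum(g(x) * mass for x, mass in law.items())

m = expect(lambda x: x)  # the mean E[X]

convex_functions = {
    "square": lambda x: x * x,
    "abs": abs,
    "positive part": lambda x: max(x, 0),
    "fourth power": lambda x: x ** 4,
}

# Jensen gap E[phi(X)] - phi(E[X]); the theorem says it is nonnegative.
jensen_gaps = {
    name: expect(phi) - phi(m) for name, phi in convex_functions.items()
}

# Supporting line for phi(x) = x^2 at m: the slope c = 2m is the derivative.
c = 2 * m
supporting_line_holds = all(x * x >= m * m + c * (x - m) for x in law)
```

For the square function the Jensen gap is exactly the variance of the law, which is one way to read the nonnegativity of variance.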
Jensen's inequality is why $|\mathbb{E}[X]| \le \mathbb{E}|X|$, why variance is nonnegative, and why the norms $\|X\|_p = (\mathbb{E}|X|^p)^{1/p}$ increase in $p$, the monotonicity that orders the $L^p$ spaces of random variables. These tools assemble the basic picture. A random variable is a measurable function whose averages are integrals, and the rest of probability studies how those integrals behave under independence, limits, and conditioning. The square-integrable random variables in particular form the Hilbert space $L^2(\Omega, \mathcal{F}, P)$ with inner product $\langle X, Y \rangle = \mathbb{E}[XY]$, the geometry in which covariance is an angle and conditional expectation is a projection.