19 May 2026 · 8 min read · updated 13 June 2026

Independence

Independence is the single hypothesis that lets random quantities be combined, and measure-theoretically it is the statement that a joint law factors into a product. We define independence of events, sigma-algebras, and random variables, prove it is equivalent to the joint law being the product of the marginals, prove independent integrable variables have factoring expectations through Fubini, prove the two Borel-Cantelli lemmas governing whether infinitely many events occur, and prove the Kolmogorov zero-one law that a tail event of an independent sequence has probability zero or one.

Independence is what lets randomness compound. Two experiments that do not inform each other multiply their probabilities, and the entire apparatus of sums of random variables, limit theorems, and product constructions rests on that one factorisation. Reading it through measure theory connects independence directly to the Fubini theorem and makes the factorisation of expectations a calculation rather than an axiom. This post builds independence from events up to the Kolmogorov zero-one law, on the probability space of the previous post [1], [2].

#Independence of events and sigma-algebras

Definition 1

Events $A_1,\dots,A_n$ are independent when $\mathbb P(A_{i_1}\cap\cdots\cap A_{i_k})=\mathbb P(A_{i_1})\cdots\mathbb P(A_{i_k})$ for every subcollection. Sub-sigma-algebras $\mathcal G_1,\dots,\mathcal G_n$ of $\mathcal F$ are independent when $\mathbb P(A_1\cap\cdots\cap A_n)=\mathbb P(A_1)\cdots\mathbb P(A_n)$ for all choices $A_i\in\mathcal G_i$. Random variables $X_1,\dots,X_n$ are independent when the generated sigma-algebras $\sigma(X_1),\dots,\sigma(X_n)$ are. An arbitrary family (of events, sigma-algebras, or random variables) is independent when every finite subfamily is, so that $\mathbb P(A_{i_1}\cap\cdots\cap A_{i_k})=\mathbb P(A_{i_1})\cdots\mathbb P(A_{i_k})$ for all finite $\{i_1,\dots,i_k\}$ and $A_{i_j}\in\mathcal G_{i_j}$.
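The phrase "for every subcollection" carries real weight: the product rule on pairs does not imply it on triples. The sketch below checks this exactly on two fair coin flips. The sample space, the three events, and the helper names `prob` and `independent` are illustrative choices, not part of the definition.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin flips, each outcome has probability 1/4.
OMEGA = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform law."""
    return Fraction(len(event), len(OMEGA))

def independent(events):
    """Check the product rule on every subcollection of size >= 2."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            inter = set(OMEGA).intersection(*sub)
            rhs = Fraction(1)
            for e in sub:
                rhs *= prob(e)
            if prob(inter) != rhs:
                return False
    return True

A = {w for w in OMEGA if w[0] == "H"}   # first flip is heads
B = {w for w in OMEGA if w[1] == "H"}   # second flip is heads
C = {w for w in OMEGA if w[0] == w[1]}  # the two flips agree

pairwise = all(independent([E, F]) for E, F in combinations([A, B, C], 2))
mutual = independent([A, B, C])
print(pairwise, mutual)
```

Each pair of events passes the product rule, but the triple fails because $A\cap B\cap C$ has probability $1/4$ rather than $1/8$.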

Checking independence against every event of each sigma-algebra is unwieldy. It is enough to check a generating system closed under intersection.

Lemma 2

If $\mathcal G$ is independent of a collection $\mathcal P$ that is closed under finite intersection, then $\mathcal G$ is independent of $\sigma(\mathcal P)$. Hence independence of sigma-algebras need only be checked on intersection-closed generating systems.

Proof

Fix $G\in\mathcal G$ with $\mathbb P(G)>0$, the case $\mathbb P(G)=0$ being trivial. The collection $\mathcal D=\{B\in\mathcal F:\mathbb P(G\cap B)=\mathbb P(G)\mathbb P(B)\}$ contains $\mathcal P$ by hypothesis and contains $\Omega$. It is closed under proper differences, since $\mathbb P(G\cap(B\setminus B'))=\mathbb P(G\cap B)-\mathbb P(G\cap B')=\mathbb P(G)(\mathbb P(B)-\mathbb P(B'))$ for $B'\subseteq B$, and under increasing limits by continuity of measures. So $\mathcal D$ is a Dynkin system containing the pi-system $\mathcal P$, hence contains $\sigma(\mathcal P)$ by the Dynkin theorem. Thus every $G\in\mathcal G$ is independent of every set in $\sigma(\mathcal P)$.
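A small finite check suggests why the lemma asks for closure under intersection. The sketch below uses a hypothetical four-point uniform space and generates $\sigma(\mathcal P)$ by brute-force closure under complement and union, which is enough on a finite set. The event `G`, the two collections, and all helper names are assumptions made for illustration.

```python
from fractions import Fraction

OMEGA = frozenset(range(4))  # four equally likely outcomes (illustrative)

def prob(event):
    """Probability of an event under the uniform law on OMEGA."""
    return Fraction(len(event), len(OMEGA))

def generated_sigma_algebra(collection):
    """Close a collection of subsets of OMEGA under complement and union."""
    sets = {frozenset(), OMEGA} | {frozenset(s) for s in collection}
    while True:
        new = {OMEGA - s for s in sets} | {s | t for s in sets for t in sets}
        if new <= sets:
            return sets
        sets |= new

def independent_of(G, collection):
    """True when P(G ∩ B) = P(G) P(B) for every B in the collection."""
    return all(prob(G & B) == prob(G) * prob(B) for B in collection)

G = frozenset({0, 3})
pi_system = [frozenset({0, 1})]                  # closed under intersection
not_pi = [frozenset({0, 1}), frozenset({0, 2})]  # the intersection {0} is missing

pi_ok = (independent_of(G, pi_system)
         and independent_of(G, generated_sigma_algebra(pi_system)))
base_ok = independent_of(G, not_pi)
lifted_ok = independent_of(G, generated_sigma_algebra(not_pi))
print(pi_ok, base_ok, lifted_ok)
```

With the intersection-closed collection, independence survives the passage to the generated sigma-algebra. With the second collection, `G` is independent of both generators but not of their intersection $\{0\}$, so independence fails on the generated sigma-algebra.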

#Independence and the product law

For random variables, independence is exactly the factorisation of the joint law.

Theorem 3

Random variables $X$ and $Y$ are independent if and only if their joint law is the product of the marginals, $\mathbb P_{(X,Y)}=\mathbb P_X\otimes\mathbb P_Y$, equivalently $\mathbb P(X\le x,Y\le y)=F_X(x)F_Y(y)$ for all $x,y$.

Proof

The sigma-algebra $\sigma(X)$ is generated by the intersection-closed system $\mathcal P_X=\{\{X\le x\}\}$, and likewise $\sigma(Y)$ by $\mathcal P_Y=\{\{Y\le y\}\}$. The rectangle identity $\mathbb P(X\le x,Y\le y)=\mathbb P(X\le x)\mathbb P(Y\le y)=F_X(x)F_Y(y)$ for all $x,y$ says $\mathcal P_X$ is independent of $\mathcal P_Y$; Lemma 2 with $\mathcal G=\mathcal P_X$, $\mathcal P=\mathcal P_Y$ gives $\mathcal P_X$ independent of $\sigma(Y)$, and a second application with $\mathcal G=\sigma(Y)$, $\mathcal P=\mathcal P_X$ gives $\sigma(X)$ independent of $\sigma(Y)$. So this identity is equivalent to the independence of $X$ and $Y$. It also says the joint law and the product measure $\mathbb P_X\otimes\mathbb P_Y$ agree on the rectangles $(-\infty,x]\times(-\infty,y]$, an intersection-closed generating system of the Borel sets of the plane. Both are probability measures of total mass $1$, and the rectangles $R_n=(-\infty,n]\times(-\infty,n]$ lie in the system with $R_n\uparrow\mathbb R^2$, so two finite measures agreeing on this exhausting generator agree on the generated sigma-algebra by Dynkin uniqueness. Hence they agree everywhere, $\mathbb P_{(X,Y)}=\mathbb P_X\otimes\mathbb P_Y$. Conversely, if $\mathbb P_{(X,Y)}=\mathbb P_X\otimes\mathbb P_Y$, evaluating both sides on the rectangle $(-\infty,x]\times(-\infty,y]$ gives $\mathbb P(X\le x,Y\le y)=F_X(x)F_Y(y)$, which is independence.
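For finitely supported variables the distribution-function criterion can be checked exhaustively. The sketch below does so on two fair dice, an illustrative choice, comparing the pair of dice with the dependent pair formed by one die and the sum. Since the distribution functions are step functions here, testing the attained values is enough. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Two fair dice on the uniform 36-point space (an illustrative choice).
OMEGA = list(product(range(1, 7), repeat=2))

def prob(pred):
    """Probability that the predicate holds under the uniform law."""
    return Fraction(sum(1 for w in OMEGA if pred(w)), len(OMEGA))

def cdf_factorises(X, Y):
    """Check P(X<=x, Y<=y) = F_X(x) F_Y(y) at every pair of attained values."""
    xs = sorted({X(w) for w in OMEGA})
    ys = sorted({Y(w) for w in OMEGA})
    return all(
        prob(lambda w: X(w) <= x and Y(w) <= y)
        == prob(lambda w: X(w) <= x) * prob(lambda w: Y(w) <= y)
        for x in xs
        for y in ys
    )

def first(w):
    return w[0]

def second(w):
    return w[1]

def total(w):
    return w[0] + w[1]

dice_factorise = cdf_factorises(first, second)  # the two dice
sum_factorises = cdf_factorises(first, total)   # a die and the sum
print(dice_factorise, sum_factorises)
```

The two dice satisfy the identity at every grid point. The die and the sum already fail at $x=1$, $y=2$, where the joint probability is $1/36$ but the product of the marginals is $1/216$.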

The product law turns the factorisation of expectations into an application of Fubini.

Theorem 4

If $X$ and $Y$ are independent and integrable, then $XY$ is integrable and $\mathbb E[XY]=\mathbb E[X]\,\mathbb E[Y]$.

Proof

By the change of variables on the pair $(X,Y)$ and the product law, $\mathbb E[\lvert XY\rvert]=\int_{\mathbb R^2}\lvert xy\rvert\,d\mathbb P_{(X,Y)}=\int_{\mathbb R^2}\lvert xy\rvert\,d(\mathbb P_X\otimes\mathbb P_Y)$, which the Tonelli theorem factors as $\big(\int\lvert x\rvert\,d\mathbb P_X\big)\big(\int\lvert y\rvert\,d\mathbb P_Y\big)=\mathbb E\lvert X\rvert\,\mathbb E\lvert Y\rvert<\infty$, so $XY$ is integrable. The Fubini theorem then factors the signed integral the same way, $\mathbb E[XY]=\int_{\mathbb R^2}xy\,d(\mathbb P_X\otimes\mathbb P_Y)=\mathbb E[X]\,\mathbb E[Y]$.

Factoring expectations makes the variance of a sum of independent square-integrable variables additive, since the cross terms $\mathbb E[(X_i-\mathbb E X_i)(X_j-\mathbb E X_j)]$ vanish for $i\ne j$. This additivity drives the law of large numbers.
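Both the factorisation of expectations and the additivity of variance can be confirmed exactly on a small example. The sketch below again assumes two fair dice on a uniform 36-point space. The names `expect` and `var` are illustrative helpers.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))  # two fair dice, uniform law

def expect(f):
    """Exact expectation of f over the uniform 36-point space."""
    return sum(Fraction(f(w)) for w in OMEGA) / len(OMEGA)

def var(f):
    """Exact variance of f, computed as E[(f - E f)^2]."""
    m = expect(f)
    return expect(lambda w: (f(w) - m) ** 2)

def X(w):
    return w[0]

def Y(w):
    return w[1]

e_xy = expect(lambda w: X(w) * Y(w))
var_sum = var(lambda w: X(w) + Y(w))
# A dependent pair for contrast: the "sum" X + X has variance 4 Var(X).
var_double = var(lambda w: X(w) + X(w))

print(e_xy, expect(X) * expect(Y))
print(var_sum, var(X) + var(Y))
print(var_double, var(X) + var(X))
```

For the independent dice the two sides agree in each of the first two lines. The third line shows additivity failing for the dependent pair $(X,X)$, where the cross term is $\operatorname{Var}(X)$ rather than zero.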

#The Borel-Cantelli lemmas

For a sequence of events, the set on which infinitely many occur is the limit superior $\limsup_n A_n=\bigcap_N\bigcup_{n\ge N}A_n$. Two lemmas decide its probability from the sum $\sum_n\mathbb P(A_n)$.

Theorem 5

If $\sum_n\mathbb P(A_n)<\infty$, then $\mathbb P(\limsup_n A_n)=0$.

Proof

For every $N$, monotonicity and countable subadditivity give $\mathbb P(\limsup_n A_n)\le\mathbb P(\bigcup_{n\ge N}A_n)\le\sum_{n\ge N}\mathbb P(A_n)$. The right side is the tail of a convergent series, so it tends to $0$ as $N\to\infty$, forcing $\mathbb P(\limsup_n A_n)=0$.
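To see the tail bound shrink, take the hypothetical probabilities $\mathbb P(A_n)=1/n^2$, a summable choice made only for illustration. Since $1/n^2\le 1/(n-1)-1/n$ telescopes, the full tail from $N$ is at most $1/(N-1)$. The sketch below computes truncated tails exactly and compares them with that bound. The truncation point `M` is an arbitrary assumption, so the computed sums understate the true tails slightly.

```python
from fractions import Fraction

def p(n):
    """Hypothetical event probabilities P(A_n) = 1/n^2, a summable choice."""
    return Fraction(1, n * n)

def partial_tail(N, M):
    """Exact sum of P(A_n) for N <= n <= M."""
    return sum((p(n) for n in range(N, M + 1)), Fraction(0))

M = 2000  # truncation point, chosen arbitrarily for the illustration
tails = {N: partial_tail(N, M) for N in (2, 10, 100, 1000)}
for N, t in tails.items():
    # The union bound controls P(limsup A_n) by the tail, itself at most 1/(N-1).
    print(N, float(t), float(Fraction(1, N - 1)))
```

The printed tails decrease toward zero as $N$ grows, which is exactly what forces the limit superior to be null.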

Theorem 6

If the events $A_n$ are independent and $\sum_n\mathbb P(A_n)=\infty$, then $\mathbb P(\limsup_n A_n)=1$.

Proof

It suffices to show $\mathbb P(\bigcup_{n\ge N}A_n)=1$ for every $N$, equivalently $\mathbb P(\bigcap_{n\ge N}A_n^c)=0$. Each singleton $\{A_n\}$ is an intersection-closed system generating $\sigma(A_n)=\{\emptyset,A_n,A_n^c,\Omega\}$, so Lemma 2 applied coordinatewise lifts the independence of the $A_n$ to independence of the sigma-algebras $\sigma(A_n)$, and since $A_n^c\in\sigma(A_n)$ the complements $\{A_n^c\}$ are independent. With the bound $1-p\le e^{-p}$,

$$\mathbb P\Big(\bigcap_{n=N}^M A_n^c\Big)=\prod_{n=N}^M(1-\mathbb P(A_n))\le\exp\Big(-\sum_{n=N}^M\mathbb P(A_n)\Big). \tag{1}$$

As $M\to\infty$ the exponent diverges because $\sum_n\mathbb P(A_n)=\infty$, so the left side, which decreases to $\mathbb P(\bigcap_{n\ge N}A_n^c)$ by continuity from above, is $0$. Hence $\mathbb P(\bigcup_{n\ge N}A_n)=1$ for all $N$, and intersecting over $N$ gives $\mathbb P(\limsup_n A_n)=1$.
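The product in (1) can be followed exactly in a divergent case. Assuming independent events with the hypothetical probabilities $\mathbb P(A_n)=1/n$, the product telescopes to $(N-1)/M$, which tends to $0$ as $M\to\infty$. The sketch below compares it with the exponential bound. The function names and the starting index `N = 5` are illustrative.

```python
from fractions import Fraction
from math import exp

def no_occurrence(N, M):
    """P(no A_n occurs for N <= n <= M), independent A_n with P(A_n) = 1/n."""
    out = Fraction(1)
    for n in range(N, M + 1):
        out *= 1 - Fraction(1, n)
    return out

def exp_bound(N, M):
    """Right side of (1): exp(-sum of P(A_n) for N <= n <= M)."""
    return exp(-sum(1.0 / n for n in range(N, M + 1)))

N = 5
for M in (10, 100, 1000):
    # The exact product equals (N - 1) / M and sits below the exponential bound.
    print(M, no_occurrence(N, M), exp_bound(N, M))
```

Both columns shrink toward zero as `M` grows, so the probability that none of the events from index `N` onward occurs is zero.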

The two lemmas are a sharp dichotomy for independent events. The probability that infinitely many occur is $0$ or $1$ according as the sum of probabilities converges or diverges, with nothing in between.

#The Kolmogorov zero-one law

That dichotomy is an instance of a structural fact. Any event determined by the tail of an independent sequence, unaffected by any finite initial segment, has probability zero or one.

Theorem 7

Let $X_1,X_2,\dots$ be independent random variables and $\mathcal T=\bigcap_n\sigma(X_n,X_{n+1},\dots)$ the tail sigma-algebra. Every $A\in\mathcal T$ has $\mathbb P(A)\in\{0,1\}$.

Proof

Fix $m$; we show the head block $\sigma(X_1,\dots,X_m)$ and the tail block $\sigma(X_{m+1},X_{m+2},\dots)$ are independent. The finite intersections $\mathcal P_1=\{B_1\cap\cdots\cap B_m:B_i\in\sigma(X_i)\}$ are closed under intersection (overlapping ranges merge, with absent factors set to $\Omega$) and contain each $\sigma(X_i)$, $i\le m$, so they generate $\sigma(X_1,\dots,X_m)$; likewise $\mathcal P_2=\{B_{m+1}\cap\cdots\cap B_{m+k}:k\ge1,\,B_j\in\sigma(X_j)\}$ is intersection-closed and contains each $\sigma(X_j)$, $j>m$, so it generates $\sigma(X_{m+1},X_{m+2},\dots)$. The finite-dimensional independence of the sequence factors $\mathbb P$ across any one set drawn from each system, so $\mathcal P_1$ and $\mathcal P_2$ are independent. Applying Lemma 2 with $\mathcal G=\mathcal P_1$, $\mathcal P=\mathcal P_2$ gives $\mathcal P_1$ independent of $\sigma(\mathcal P_2)$, and a second application with $\mathcal G=\sigma(\mathcal P_2)$, $\mathcal P=\mathcal P_1$ lifts this to independence of $\sigma(X_1,\dots,X_m)$ and $\sigma(X_{m+1},X_{m+2},\dots)$. Since $\mathcal T\subseteq\sigma(X_{m+1},X_{m+2},\dots)$, it is independent of $\sigma(X_1,\dots,X_m)$ for every $m$, hence of their union over $m$, an intersection-closed system generating $\sigma(X_1,X_2,\dots)$. By Lemma 2 again, $\mathcal T$ is independent of $\sigma(X_1,X_2,\dots)$. But $\mathcal T\subseteq\sigma(X_1,X_2,\dots)$. So $\mathcal T$ is independent of itself. For $A\in\mathcal T$ this means $\mathbb P(A)=\mathbb P(A\cap A)=\mathbb P(A)\mathbb P(A)$, so $\mathbb P(A)=\mathbb P(A)^2$, whose only solutions are $0$ and $1$.

The zero-one law says the asymptotic behaviour of an independent sequence is never genuinely random. Events like the convergence of $\sum_n X_n$, or $\limsup_n X_n$ exceeding a threshold, depend on no finite block of the sequence and so are tail events, settled with certainty one way or the other. Independence is therefore both the hypothesis that lets variables combine and the source of the rigidity that makes their limits deterministic, the structure on which the law of large numbers and the convergence of random series are built.

[1] R. Durrett, Probability: Theory and Examples, 5th ed. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.
[2] O. Kallenberg, Foundations of Modern Probability, 3rd ed. Springer, 2021.

Part 2 of 9 in Probability

cite
@misc{independence,
  author = {Zac Kienzle},
  title  = {Independence},
  year   = {2026},
  month  = {05},
  url    = {https://zackienzle.com/blog/independence}
}