03 June 2026 · 8 min read · updated 13 June 2026

Ito's Formula

Differentiating a function of Brownian motion produces an extra term that ordinary calculus does not, because the quadratic variation of Brownian motion is not negligible. We prove Ito's formula for a twice continuously differentiable function of a Brownian motion by a second-order Taylor expansion whose squared increments converge to the quadratic variation, extend it to Ito processes and to time-dependent functions, derive the integration-by-parts rule from it, and solve geometric Brownian motion. This is the differential calculus the stochastic models of finance are written in.

  • stochastic-processes
  • stochastic-calculus
  • brownian-motion
On this page
  • Ito's formula for Brownian motion
  • The general formula
  • Integration by parts and geometric Brownian motion


The chain rule for a smooth function of a smooth path comes from a first-order Taylor expansion, because the second-order term is of order $(dt)^2$ and vanishes. For a function of Brownian motion the second-order term does not vanish, because the quadratic variation makes the squared increment of order $dt$ rather than $(dt)^2$, and the surviving term is the signature of stochastic calculus. Ito's formula is the resulting chain rule. This post proves it from the second-order Taylor expansion, then derives the integration-by-parts rule and solves geometric Brownian motion [1], [2]. Here $W$ is a standard Brownian motion.

#Ito's formula for Brownian motion

Theorem 1

Let $f$ be twice continuously differentiable with bounded first and second derivatives. Then for every $t$, almost surely,

$$f(W_t)=f(W_0)+\int_0^t f'(W_s)\,dW_s+\frac12\int_0^t f''(W_s)\,ds. \tag{1}$$

Proof

Fix $t$ and a sequence of partitions $0=t_0<\cdots<t_m=t$ with mesh tending to $0$, and write $\Delta_i=W_{t_i}-W_{t_{i-1}}$. Telescoping and the second-order Taylor expansion with Lagrange remainder give, for each interval, a point $\xi_i$ between $W_{t_{i-1}}$ and $W_{t_i}$ with

$$f(W_t)-f(W_0)=\sum_i\Big(f'(W_{t_{i-1}})\Delta_i+\tfrac12 f''(W_{t_{i-1}})\Delta_i^2\Big)+\tfrac12\sum_i\big(f''(\xi_i)-f''(W_{t_{i-1}})\big)\Delta_i^2, \tag{2}$$

splitting the second-order term into its value at the left endpoint and a remainder.

The first-order sum. The integrand $f'(W_s)$ is adapted and continuous, so the left-endpoint sums $\sum_i f'(W_{t_{i-1}})\Delta_i$ converge in $L^2$ to the stochastic integral $\int_0^t f'(W_s)\,dW_s$, the simple integrands $f'(W_{t_{i-1}})\mathbf 1_{(t_{i-1},t_i]}$ approximating $f'(W_\cdot)$ in the integral norm.

The second-order sum. Compare $\sum_i f''(W_{t_{i-1}})\Delta_i^2$ with the Riemann sum $\sum_i f''(W_{t_{i-1}})(t_i-t_{i-1})$, which converges to $\int_0^t f''(W_s)\,ds$ because $f''(W_\cdot)$ is continuous. The difference is $\sum_i f''(W_{t_{i-1}})\big(\Delta_i^2-(t_i-t_{i-1})\big)$, and its second moment is, since the increments $\Delta_i^2-(t_i-t_{i-1})$ are mean zero and independent of the past,

$$\mathbb{E}\Big[\Big(\sum_i f''(W_{t_{i-1}})\big(\Delta_i^2-(t_i-t_{i-1})\big)\Big)^2\Big]=\sum_i\mathbb{E}\big[f''(W_{t_{i-1}})^2\big]\,2(t_i-t_{i-1})^2\le 2\lVert f''\rVert_\infty^2\,t\cdot\text{mesh}, \tag{3}$$

the cross terms vanishing by the same mean-zero-and-independence property. So the difference tends to $0$ in $L^2$, and $\sum_i f''(W_{t_{i-1}})\Delta_i^2\to\int_0^t f''(W_s)\,ds$.
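The second-moment estimate in Equation (3) can be probed numerically. The following is a minimal Monte Carlo sketch, assuming NumPy, a uniform partition, and the simplest case $f''\equiv 1$; the names `t`, `m`, `n_paths` and the seed are illustrative choices, not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
t, m, n_paths = 1.0, 200, 5_000  # horizon, partition size, number of paths (assumed values)
dt = t / m                       # uniform partition, so mesh = dt

# Brownian increments over each partition interval, one row per path
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, m))

# Sum of squared increments along each path (the case f'' = 1)
sum_sq = np.sum(dW**2, axis=1)

sample_mean = sum_sq.mean()      # should be close to t
sample_var = sum_sq.var()        # should be close to 2 * sum(dt^2) = 2 * t * mesh
predicted_var = 2.0 * t * dt

print(f"mean of sum of squares: {sample_mean:.4f} (target {t})")
print(f"variance: {sample_var:.5f} (predicted {predicted_var:.5f})")
```

Up to Monte Carlo error, the sample mean should sit near $t$ and the sample variance near $2t\cdot\text{mesh}$, which is the bound in Equation (3) attained with equality when $f''\equiv 1$.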

The remainder. On each path, $W$ is continuous on $[0,t]$ and so has compact range, on which $f''$ is uniformly continuous, and $\lvert\xi_i-W_{t_{i-1}}\rvert\le\lvert\Delta_i\rvert$ tends to $0$ uniformly in $i$ as the mesh shrinks. Hence $\max_i\lvert f''(\xi_i)-f''(W_{t_{i-1}})\rvert\to 0$ pathwise. Since $\sum_i\Delta_i^2\to t$ in $L^2$, there is a subsequence of partitions along which $\sum_i\Delta_i^2\to t$ almost surely and therefore stays bounded, so the remainder is at most $\tfrac12\max_i\lvert f''(\xi_i)-f''(W_{t_{i-1}})\rvert\sum_i\Delta_i^2\to 0$ almost surely along that subsequence.

Each piece converges in probability, so along a subsequence of partitions all converge almost surely, and passing to the limit in Equation (2) yields Equation (1) almost surely at the fixed $t$. Applying this countably often establishes the identity on a fixed countable dense set of $t$ off a single null set. The indefinite Ito integral $M_t=\int_0^t f'(W_s)\,dW_s$ has a continuous modification, the standard property of the stochastic integral of an $L^2$ adapted integrand, and $\int_0^t f''(W_s)\,ds$ is continuous in $t$ pathwise, so both sides are continuous in $t$ and the identity extends from the dense set to all $t$ off that null set.

The differential shorthand for Equation (1) is $df(W_t)=f'(W_t)\,dW_t+\tfrac12 f''(W_t)\,dt$, the extra half-times-second-derivative term being the entire content of the formula.
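Equation (1) can be checked on a single simulated path. The sketch below assumes NumPy and takes $f=\cos$, which has bounded derivatives; the function name `ito_formula_residual`, the grid size, and the seed are hypothetical choices made for illustration. It forms the left-endpoint sums from the proof and compares them with $f(W_t)-f(W_0)$.

```python
import numpy as np

def ito_formula_residual(f, df, d2f, t=1.0, m=100_000, seed=0):
    """Return (residual with the Ito term, gap without it) on one simulated path."""
    rng = np.random.default_rng(seed)
    dt = t / m
    dW = rng.normal(0.0, np.sqrt(dt), size=m)
    W = np.concatenate(([0.0], np.cumsum(dW)))    # W_0 = 0, then W at each grid point
    left = W[:-1]                                 # left endpoints W_{t_{i-1}}

    stochastic_integral = np.sum(df(left) * dW)   # approximates int f'(W) dW
    ito_correction = 0.5 * np.sum(d2f(left)) * dt # approximates (1/2) int f''(W) ds
    increment = f(W[-1]) - f(W[0])

    with_correction = increment - (stochastic_integral + ito_correction)
    without_correction = increment - stochastic_integral
    return with_correction, without_correction

residual, naive_gap = ito_formula_residual(np.cos, lambda x: -np.sin(x), lambda x: -np.cos(x))
print(f"residual with Ito term:  {residual:.5f}")
print(f"gap without Ito term:    {naive_gap:.5f}")
```

The residual with the correction should shrink as `m` grows, while the gap without it approximates $\tfrac12\int_0^t f''(W_s)\,ds$ and does not vanish.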

#The general formula

The same expansion applies to an Ito process, a process of the form

$$X_t=X_0+\int_0^t b_s\,ds+\int_0^t\sigma_s\,dW_s, \tag{4}$$

with progressively measurable integrands $b\in\mathcal L^1_{\mathrm{loc}}$ and $\sigma\in\mathcal L^2_{\mathrm{loc}}$, that is $\int_0^t\lvert b_s\rvert\,ds<\infty$ and $\int_0^t\sigma_s^2\,ds<\infty$ almost surely for every $t$, so that both integrals exist. Its quadratic variation is $\langle X\rangle_t=\int_0^t\sigma_s^2\,ds$, since only the stochastic part contributes. Writing $dX_t=b_t\,dt+\sigma_t\,dW_t$, the squared increment obeys the multiplication rule $(dX_t)^2=\sigma_t^2\,dt$, which abbreviates $d\langle X\rangle_t=\sigma_t^2\,dt$, the only second-order quantity that appears. This holds because the drift $\int b\,ds$ has finite variation, hence zero quadratic variation and zero covariation with the martingale part, while $\big\langle\int\sigma\,dW\big\rangle_t=\int_0^t\sigma_s^2\,ds$; the symbolic covariation identities $dW\,dW=dt$, $dW\,dt=0$, $dt\,dt=0$ are the mnemonic for this computation.

Theorem 2

For an Ito process $X$ and a function $f(t,x)$ with $f$, $\partial_t f$, $\partial_x f$, and $\partial_{xx} f$ continuous,

$$f(t,X_t)=f(0,X_0)+\int_0^t\Big(\partial_t f+b_s\,\partial_x f+\tfrac12\sigma_s^2\,\partial_{xx}f\Big)\,ds+\int_0^t\sigma_s\,\partial_x f\,dW_s, \tag{5}$$

the partial derivatives evaluated at $(s,X_s)$.

Proof

Split each step as $f(t_i,X_{t_i})-f(t_{i-1},X_{t_{i-1}})=[f(t_i,X_{t_i})-f(t_{i-1},X_{t_i})]+[f(t_{i-1},X_{t_i})-f(t_{i-1},X_{t_{i-1}})]$. A first-order expansion in time of the first bracket contributes $\partial_t f\,\Delta t_i$ by the mean value theorem and continuity of $\partial_t f$, and a second-order expansion in space of the second bracket contributes $\partial_x f\,\Delta X_i$ and $\tfrac12\partial_{xx}f\,\Delta X_i^2$. This uses only the assumed $\partial_t f,\partial_x f,\partial_{xx}f$. The increment $\Delta X_i$ splits into its drift part, which contributes $\int b\,\partial_x f\,ds$, and its diffusion part, which contributes $\int\sigma\,\partial_x f\,dW$.

For the second-order space term, write $\Delta X_i=b_{t_{i-1}}\Delta t_i+\sigma_{t_{i-1}}\Delta W_i$ up to the integral errors, so

$$\Delta X_i^2=\sigma_{t_{i-1}}^2\Delta W_i^2+2b_{t_{i-1}}\sigma_{t_{i-1}}\Delta t_i\,\Delta W_i+b_{t_{i-1}}^2(\Delta t_i)^2. \tag{6}$$

The first piece converges against $\partial_{xx}f$ to $\int\sigma_s^2\,\partial_{xx}f\,ds$ by replacing $\Delta W_i^2$ with $\Delta t_i$ at $L^2$ cost $2\sum_i\mathbb{E}\big[(\sigma^2\partial_{xx}f)^2\big](\Delta t_i)^2\to0$ as in Equation (3) followed by Riemann convergence; the cross piece is bounded by $C(\max_i\lvert\Delta W_i\rvert)\sum_i\Delta t_i\to0$ by continuity of $W$, and the last by $C\,\text{mesh}\sum_i\Delta t_i\to0$. The time term sums to the Riemann integral $\int\partial_t f\,ds$. Collecting the $ds$ and $dW$ terms gives Equation (5).

The unbounded-derivative case follows by stopping $X$ when it leaves a large interval $[-N,N]$, on which the continuous functions $f,\partial_t f,\partial_x f,\partial_{xx}f$ are bounded over the compact $[0,t]\times[-N,N]$, so the bounded-derivative estimates apply to the stopped process $X^N$; then letting $N\to\infty$ by continuity of paths removes the boundedness assumption.
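As a worked instance of Equation (5), consider $f(t,x)=x^2$, for which $\partial_t f=0$, $\partial_x f=2x$, and $\partial_{xx}f=2$. Substituting gives

$$X_t^2=X_0^2+\int_0^t\big(2X_sb_s+\sigma_s^2\big)\,ds+\int_0^t 2X_s\sigma_s\,dW_s,$$

or, in differential form, $d(X_t^2)=2X_t\,dX_t+d\langle X\rangle_t$. For the Ito process $W$ itself ($b=0$, $\sigma=1$, $W_0=0$) this reduces to $W_t^2=2\int_0^t W_s\,dW_s+t$, so the Ito integral $\int_0^t W_s\,dW_s$ equals $\tfrac12(W_t^2-t)$ rather than the $\tfrac12W_t^2$ that ordinary calculus would suggest.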

#Integration by parts and geometric Brownian motion

Applying the formula to a product gives the stochastic analogue of the Leibniz rule, with the extra covariation term.

Corollary 3

For Ito processes $X$ and $Y$, $d(X_tY_t)=X_t\,dY_t+Y_t\,dX_t+d\langle X,Y\rangle_t$, where $d\langle X,Y\rangle_t$ is the covariation differential.

Proof

Polarise, using only the one-dimensional Theorem 2 applied to $u\mapsto u^2$ (with bounded-derivative localisation) along the single Ito processes $X+Y$, $X$, and $Y$. This gives $d((X+Y)^2)=2(X+Y)\,d(X+Y)+d\langle X+Y\rangle$, $d(X^2)=2X\,dX+d\langle X\rangle$, and $d(Y^2)=2Y\,dY+d\langle Y\rangle$, while $\langle X+Y\rangle=\langle X\rangle+2\langle X,Y\rangle+\langle Y\rangle$. Writing $XY=\tfrac12\big((X+Y)^2-X^2-Y^2\big)$ and subtracting,

$$d(XY)=\tfrac12\big[2(X+Y)\,d(X+Y)-2X\,dX-2Y\,dY\big]+\tfrac12\big[d\langle X+Y\rangle-d\langle X\rangle-d\langle Y\rangle\big]=X\,dY+Y\,dX+d\langle X,Y\rangle, \tag{7}$$

which is the stated rule.
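The product rule has an exact discrete counterpart, $X_iY_i-X_{i-1}Y_{i-1}=X_{i-1}\Delta Y_i+Y_{i-1}\Delta X_i+\Delta X_i\,\Delta Y_i$, whose last term sums to the covariation. The sketch below illustrates this, assuming NumPy and two Ito processes with constant coefficients driven by the same Brownian motion; the coefficient values, starting points, and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
t, m = 1.0, 200_000
dt = t / m
dW = rng.normal(0.0, np.sqrt(dt), size=m)

# Assumed constant coefficients: dX = b_x dt + s_x dW, dY = b_y dt + s_y dW
b_x, s_x = 0.3, 1.0
b_y, s_y = -0.2, 0.5
dX = b_x * dt + s_x * dW
dY = b_y * dt + s_y * dW
X = 1.0 + np.concatenate(([0.0], np.cumsum(dX)))  # X_0 = 1
Y = 2.0 + np.concatenate(([0.0], np.cumsum(dY)))  # Y_0 = 2

product_increment = X[-1] * Y[-1] - X[0] * Y[0]
int_X_dY = np.sum(X[:-1] * dY)   # left-endpoint sum for int X dY
int_Y_dX = np.sum(Y[:-1] * dX)   # left-endpoint sum for int Y dX
covariation = np.sum(dX * dY)    # approximates <X,Y>_t = s_x * s_y * t

print(f"d(XY) total:            {product_increment:.6f}")
print(f"X dY + Y dX + d<X,Y>:   {int_X_dY + int_Y_dX + covariation:.6f}")
print(f"covariation estimate:   {covariation:.4f} (target {s_x * s_y * t})")
```

The first two printed numbers agree up to floating-point rounding because the discrete identity telescopes exactly; only the identification of $\sum_i\Delta X_i\,\Delta Y_i$ with $\int_0^t\sigma^X_s\sigma^Y_s\,ds$ requires a limit.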

Corollary 4

The geometric Brownian motion $X_t=X_0\exp\big((\mu-\tfrac12\sigma^2)t+\sigma W_t\big)$ solves $dX_t=\mu X_t\,dt+\sigma X_t\,dW_t$.

Proof

Apply Theorem 2 with the Ito process taken to be $W$ itself (so $b_s=0$ and $\sigma_s=1$ in Equation (4), and $\langle W\rangle_t=t$) and with $f(t,x)=X_0\exp((\mu-\tfrac12\sigma^2)t+\sigma x)$, so that $X_t=f(t,W_t)$; here $\mu$ and $\sigma$ are the constant parameters of the geometric Brownian motion, not the integrands of Equation (4). Note that $\partial_t f=(\mu-\tfrac12\sigma^2)f$, $\partial_x f=\sigma f$, and $\partial_{xx}f=\sigma^2 f$. Substituting, the $ds$ coefficient is $(\mu-\tfrac12\sigma^2)f+\tfrac12\sigma^2 f=\mu f$ and the $dW$ coefficient is $\sigma f$, so $dX_t=\mu X_t\,dt+\sigma X_t\,dW_t$. The drift correction $-\tfrac12\sigma^2$ in the exponent is exactly the Ito term, the reason the expected growth rate $\mu$ exceeds the median growth rate.
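The corollary can be illustrated by comparing the closed form with a discretisation of the stochastic differential equation on the same Brownian increments. This is a minimal sketch assuming NumPy and an Euler-Maruyama scheme, which is a technique brought in here for comparison and not part of the proof; the helper names `gbm_exact` and `gbm_euler` and all parameter values are hypothetical.

```python
import numpy as np

def gbm_exact(x0, mu, sigma, t, w_t):
    """Closed form X_t = x0 * exp((mu - sigma^2 / 2) t + sigma W_t)."""
    return x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w_t)

def gbm_euler(x0, mu, sigma, dt, dW):
    """Euler-Maruyama terminal values: X_{k+1} = X_k * (1 + mu dt + sigma dW_k)."""
    return x0 * np.prod(1.0 + mu * dt + sigma * dW, axis=1)

rng = np.random.default_rng(2)
x0, mu, sigma, T = 1.0, 0.1, 0.4, 1.0   # assumed parameters
n_paths, m = 10_000, 200
dt = T / m
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, m))

exact_T = gbm_exact(x0, mu, sigma, T, dW.sum(axis=1))  # W_T is the sum of increments
euler_T = gbm_euler(x0, mu, sigma, dt, dW)

strong_error = np.mean(np.abs(exact_T - euler_T))      # pathwise discrepancy
sample_mean = exact_T.mean()                           # compare with x0 * exp(mu T)
sample_median = np.median(exact_T)                     # compare with x0 * exp((mu - sigma^2/2) T)

print(f"mean |exact - Euler|: {strong_error:.4f}")
print(f"sample mean:   {sample_mean:.4f} vs {x0 * np.exp(mu * T):.4f}")
print(f"sample median: {sample_median:.4f} vs {x0 * np.exp((mu - 0.5 * sigma**2) * T):.4f}")
```

The gap between the sample mean and the sample median reflects the $-\tfrac12\sigma^2$ correction: the mean grows at rate $\mu$ while the median grows at rate $\mu-\tfrac12\sigma^2$.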

Ito's formula is the computational heart of stochastic calculus, turning every smooth transformation of an Ito process into another Ito process with an explicit drift and diffusion. The half-times-second-derivative correction is where the quadratic variation enters every calculation. It is the reason an option's value satisfies a second-order partial differential equation, and the reason the log of a geometric Brownian motion drifts at rate $\mu-\tfrac12\sigma^2$, below the growth rate $\mu$ of its mean. It is the tool through which the stochastic differential equations of the next post are solved and verified.

[1] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 6th ed. Springer, 2003.
[2] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed. Springer, 1991.

Part 5 of 8 in Stochastic Calculus

← previous: The Stochastic Integral · next →: Change of Measure and Girsanov's Theorem

related

  • Stochastic Differential Equations
  • The Ornstein-Uhlenbeck Process
  • Quadratic Variation

referenced by (3)

  • Quadratic Variation
  • Stochastic Differential Equations
  • The Black-Scholes Equation
cite
@misc{itos-formula,
  author = {Zac Kienzle},
  title  = {Ito's Formula},
  year   = {2026},
  month  = {06},
  url    = {https://zackienzle.com/blog/itos-formula}
}