Notes on Stochastic Processes (Joe Chang)

Ch. 1 – Markov Chains

1.1 Specifying and Simulating a Markov Chain

to specify a Markov chain, we need to know its state space $S$, its initial distribution $\pi_0$ (the distribution of $X_0$, i.e., $\pi_0(i) = \mathbb{P}\{X_0 = i\}$), and its probability transition matrix $P$, with entries $P(i,j) = \mathbb{P}\{X_{n+1} = j \mid X_n = i\}$. to simulate: draw $X_0 \sim \pi_0$, then repeatedly draw $X_{n+1}$ from the row $P(X_n, \cdot)$.
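To make this concrete, here is a minimal simulation sketch in Python (assuming numpy; the two-state $P$ and the function name are illustrative, not from the notes):

```python
import numpy as np

def simulate_chain(pi0, P, n_steps, rng=None):
    """Simulate X_0, ..., X_{n_steps}: draw X_0 ~ pi0, then each
    X_{n+1} from row P(X_n, .) of the transition matrix."""
    rng = rng or np.random.default_rng()
    states = np.arange(len(pi0))
    x = rng.choice(states, p=pi0)          # X_0 ~ pi0
    path = [x]
    for _ in range(n_steps):
        x = rng.choice(states, p=P[x])     # X_{n+1} ~ P(X_n, .)
        path.append(x)
    return np.array(path)

# illustrative two-state chain
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi0 = np.array([1.0, 0.0])
print(simulate_chain(pi0, P, 10))
```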

1.2 The Markov Property

a process $X_0, X_1, \ldots$ satisfies the Markov property if

$$\mathbb{P}\{X_{n+1} = i_{n+1} \mid X_n = i_n, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} = \mathbb{P}\{X_{n+1} = i_{n+1} \mid X_n = i_n\}$$

for all $n$ and all $i_0, \ldots, i_{n+1} \in S$.

note: the Markov property can be generalized to order-$r$ dependence (i.e., dependent on the last $r$ states); such a process can be converted into an ordinary Markov chain by enlarging the state space to $r$-tuples $(X_{n-r+1}, \ldots, X_n)$.

1.3 Matrices

recall $\pi_0$ from 1.1. let $\pi_n$ denote the distribution of the chain at time $n$ analogously: $\pi_n(i) = \mathbb{P}\{X_n = i\}$. consider both as row vectors.

suppose that the state space is finite: $S = \{1, \ldots, N\}$. by the law of total probability,

$$\pi_{n+1}(j) = \mathbb{P}\{X_{n+1} = j\} = \sum_{i=1}^{N}\mathbb{P}\{X_n = i\}\,\mathbb{P}\{X_{n+1} = j \mid X_n = i\} = \sum_{i=1}^{N} \pi_n(i)P(i,j),$$

that is, $\pi_{n+1} = \pi_n P$. iterating gives $\pi_n = \pi_0 P^n$.
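A quick numerical check of $\pi_{n+1} = \pi_n P$ and $\pi_n = \pi_0 P^n$, as a sketch assuming numpy (the two-state $P$ is illustrative):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([1.0, 0.0])            # pi_0

for n in range(1, 6):
    pi = pi @ P                      # pi_n = pi_{n-1} P
    print(n, pi)

# equivalently, pi_5 = pi_0 P^5
print(np.array([1.0, 0.0]) @ np.linalg.matrix_power(P, 5))
```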


1.4 Basic Limit Theorem of Markov Chains

THEOREM (Basic Limit Theorem): if an irreducible, aperiodic Markov chain has a stationary distribution $\pi$ (see 1.5), then for every initial distribution $\pi_0$, $\pi_n(i) \rightarrow \pi(i)$ as $n \rightarrow \infty$ for all states $i$. (proof in 1.8.)

1.5 Stationary Distribution

a distribution $\pi$ is stationary if it satisfies $\pi = \pi P$, i.e.,

$$\pi(j) = \sum_{i \in S} \pi(i) P(i,j)$$

for all $j \in S$.

a Markov chain might have no stationary distribution, one stationary distribution, or infinitely many stationary distributions.
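When a finite chain does have a stationary distribution, one way to find it numerically is to solve the singular system $\pi(P - I) = 0$ together with the normalization $\sum_i \pi(i) = 1$. A sketch assuming numpy (the matrix and function name are illustrative); an eigenvector routine would work equally well:

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1: transpose to
    (P^T - I) pi^T = 0, then swap one equation for the normalization."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0                   # replace last row: sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(stationary_distribution(P))    # ~ [0.8333, 0.1667]
```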

for subsets $A, B$ of the state space, define the probability flux from set $A$ into $B$ as

$$\text{flux}(A, B) = \sum_{i \in A} \sum_{j \in B} \pi(i) P(i,j).$$

for a stationary $\pi$, the flux out of a set balances the flux into it: $\text{flux}(A, A^c) = \text{flux}(A^c, A)$.
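A tiny numerical check of this flux balance for the illustrative two-state chain above (assuming numpy; `flux` is a hypothetical helper):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])            # stationary for this P

def flux(A, B):
    """flux(A, B) = sum over i in A, j in B of pi(i) P(i, j)."""
    return sum(pi[i] * P[i, j] for i in A for j in B)

# for stationary pi, flux out of a set equals flux into it
print(flux({0}, {1}), flux({1}, {0}))   # both 1/12
```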

1.6 Irreducibility, Periodicity, Recurrence

Use $\mathbb{P}_i(A)$ as shorthand for $\mathbb{P}\{A \mid X_0 = i\}$, and similarly for $\mathbb{E}_i$.

Accessibility: for two states $i, j$, we say that $j$ is accessible from $i$ if it is possible for the chain ever to visit state $j$ when the chain starts in state $i$:

$$\mathbb{P}_i\Big\{\bigcup_{n=0}^{\infty} \{X_n = j\}\Big\} > 0.$$

Equivalently,

$$\sum_{n=0}^{\infty}P^n(i, j) = \sum_{n=0}^{\infty}\mathbb{P}_i\{X_n = j\} > 0.$$

Communication: we say $i$ communicates with $j$ if $i$ is accessible from $j$ and $j$ is accessible from $i$.

Irreducibility: the Markov chain is irreducible if all pairs of states communicate.

The relation “communicates with” is an equivalence relation; hence, the state space $S$ can be partitioned into “communicating classes,” or simply “classes.”

The Basic Limit Theorem requires irreducibility and aperiodicity (see 1.4). Trivial examples show why: the two-state chain that alternates deterministically between its states has stationary distribution $(1/2, 1/2)$ but period $2$, so $\pi_n$ oscillates instead of converging; and the chain with $P = I$ is reducible, with $\pi_n = \pi_0$ for every $\pi_0$, so no single limit can hold for all initial distributions.

Period: Given a Markov chain $\{X_0, X_1, \ldots\}$, define the period of a state $i$ to be the greatest common divisor

$$d_i = \gcd\{n : P^n(i, i) > 0\}.$$

THEOREM: if the states $i$ and $j$ communicate, then $d_i = d_j$.

The period of a state is a “class property.” In particular, all states in an irreducible Markov chain have the same period. Thus, we can speak of the period of a Markov chain if the Markov chain is irreducible.

An irreducible Markov chain is aperiodic if its period is $1$, and periodic otherwise. (A sufficient but not necessary condition for an irreducible chain to be aperiodic is that there exists a state $i$ such that $P(i,i) > 0$.)
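The period of a state can be estimated directly from the definition by scanning powers of $P$; a sketch assuming numpy (`period` is a hypothetical helper, and scanning only up to a finite `n_max` approximates the gcd over all $n$):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, n_max=50):
    """d_i = gcd{ n : P^n(i, i) > 0 }, scanning n = 1, ..., n_max."""
    ns, Pn = [], np.eye(P.shape[0])
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            ns.append(n)
    return reduce(gcd, ns, 0)

# deterministic 2-cycle: the chain alternates, so d_0 = 2
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(period(P, 0))                  # 2
```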

One more concept is needed for the Basic Limit Theorem: recurrence. We will define recurrence, then show that it is a class property. In particular, in an irreducible Markov chain, either all states are recurrent or all states are transient.

The idea of recurrence: a state $i$ is recurrent if, starting from $i$ at time 0, the chain is sure to return to $i$ eventually. More precisely, define the first hitting time $T_i$ of the state $i$ by

$$T_i = \inf\{n > 0 : X_n = i\}.$$

Recurrence: the state $i$ is recurrent if $\mathbb{P}_i\{T_i < \infty\} = 1$. If $i$ is not recurrent, it is called transient.

(note that accessibility could equivalently be defined: for distinct states $i \neq j$, $j$ is accessible from $i$ iff $\mathbb{P}_i\{T_j < \infty\} > 0$.)

THEOREM: Let $i$ be a recurrent state, and suppose that $j$ is accessible from $i$. Then all of the following hold:

1. $\mathbb{P}_i\{T_j < \infty\} = 1$;
2. $\mathbb{P}_j\{T_i < \infty\} = 1$;
3. the state $j$ is recurrent.

We use the notation $N_i$ for the total number of visits of the Markov chain to the state $i$:

$$N_i = \sum_{n=0}^{\infty} I\{X_n = i\}.$$

THEOREM: The state $i$ is recurrent iff $\mathbb{E}_i(N_i) = \infty$.

COROLLARY: If $j$ is transient, then $\lim_{n \rightarrow \infty} P^n(i,j) = 0$ for all states $i$.

Relating recurrence to stationary distributions:

PROP: Suppose a Markov chain has a stationary distribution $\pi$. If the state $j$ is transient, then $\pi(j) = 0$.

COROLLARY: If an irreducible Markov chain has a stationary distribution, then the chain is recurrent.

Note that the converse of the above is not true: there are irreducible, recurrent Markov chains that have no stationary distribution. For example, the simple symmetric random walk on the integers in one dimension is irreducible and recurrent but has no stationary distribution. By recurrence we have $\mathbb{P}_0\{T_0 < \infty\} = 1$, but also $\mathbb{E}_0(T_0) = \infty$. This kind of recurrence is called null recurrence: a state $i$ is null recurrent if it is recurrent and $\mathbb{E}_i(T_i) = \infty$. A recurrent state with $\mathbb{E}_i(T_i) < \infty$ is called positive recurrent.

Positive recurrence is also a class property: if a chain is irreducible, the chain is either transient, null recurrent, or positive recurrent. In fact, an irreducible chain has a stationary distribution iff it is positive recurrent.
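The null recurrence of the simple symmetric walk shows up in simulation: every excursion from 0 does return, but the return times are so heavy-tailed that the sample mean never settles. A rough sketch assuming numpy (the truncation at `max_steps` is an artifact of simulating an infinite-mean quantity):

```python
import numpy as np

rng = np.random.default_rng(0)

def return_time(max_steps=10**5):
    """First return time T_0 of the simple symmetric walk from 0,
    truncated at max_steps."""
    steps = rng.choice((-1, 1), size=max_steps)
    walk = np.cumsum(steps)           # positions after 1, 2, ... steps
    hits = np.nonzero(walk == 0)[0]
    return hits[0] + 1 if hits.size else max_steps

samples = [return_time() for _ in range(1000)]
# recurrence: the walk does return; null recurrence: E_0(T_0) = infinity,
# so the sample mean keeps growing with more/longer samples
print(np.mean(samples), np.max(samples))
```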

1.7 Coupling

Example of the coupling technique: consider a random graph on a given finite set of nodes, in which each pair of nodes is joined by an edge independently with probability $p$. We could simulate such a graph as follows: for each pair of nodes $i, j$, generate a random number $U_{ij} \sim U[0,1]$, and join nodes $i$ and $j$ with an edge if $U_{ij} \leq p$.

How do we show that the probability of the resulting graph being connected is nondecreasing in $p$, i.e., show that for $p_1 < p_2$,

$$\mathbb{P}_{p_1}\{\text{graph connected}\} \leq \mathbb{P}_{p_2}\{\text{graph connected}\}.$$

We could try to find an explicit formula for this probability in terms of $p$, but that would be messy and inefficient. How do we formalize the intuition that the claim is obvious?

An idea: show that the corresponding events are ordered, using the fact that if $A \subset B$ then $\mathbb{P}(A) \leq \mathbb{P}(B)$.

Let’s make two events by constructing two random graphs $G_1, G_2$ on the same set of nodes: $G_1$ has each possible edge present with probability $p_1$, and $G_2$ has each edge present with probability $p_2$. We can do this using two sets of $U[0,1]$ random variables, $\{U_{ij}\}$ and $\{V_{ij}\}$, for the first and second graph respectively. Is it true that

$$\{G_1 \text{ connected}\} \subset \{G_2 \text{ connected}\}?$$

No: since the two sets of random variables are generated independently, $G_1$ can contain edges that $G_2$ lacks.

A change: use the same random numbers for each graph. Then

$$\{G_1 \text{ connected}\} \subset \{G_2 \text{ connected}\}$$

becomes true: since $p_1 < p_2$, $U_{ij} \leq p_1$ implies $U_{ij} \leq p_2$, so every edge of $G_1$ is also an edge of $G_2$, and whenever $G_1$ is connected, $G_2$ must be as well. This establishes the monotonicity of the connectivity probability.

Conclusion: what characterizes a coupling argument? Generally, we show that the same set of random variables can be used to construct two different objects about which we want to make a probabilistic statement.
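A sketch of the coupled construction in Python (assuming numpy; `connected` is a hypothetical union-find helper, and $n$, $p_1$, $p_2$ are arbitrary). The assertion $G_1 \subseteq G_2$ holds on every run because both graphs read the same uniforms:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def connected(edges, n):
    """Connectivity check via union-find on nodes 0..n-1."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(n)}) == 1

n, p1, p2 = 8, 0.2, 0.4
U = {e: rng.uniform() for e in combinations(range(n), 2)}  # shared uniforms
G1 = [e for e, u in U.items() if u <= p1]
G2 = [e for e, u in U.items() if u <= p2]
assert set(G1) <= set(G2)            # the coupling: G1 is always a subgraph
print(connected(G1, n), connected(G2, n))
```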

1.8 Proof of Basic Limit Theorem

The Basic Limit Theorem says that if an irreducible, aperiodic Markov chain has a stationary distribution $\pi$, then for each initial distribution $\pi_0$, as $n \rightarrow \infty$ we have $\pi_n(i) \rightarrow \pi(i)$ for all states $i$.

(Note the wording “a stationary distribution”: taking the BLT as true implies that an irreducible, aperiodic Markov chain cannot have two different stationary distributions, since $\pi_n$ cannot converge to two different limits.)

Equivalently, let’s define a distance between probability distributions, called “total variation distance:”

DEFINITION: Let $\lambda$ and $\mu$ be two probability distributions on the set $S$. Then the total variation distance $\|\lambda - \mu\|$ is defined by

$$\|\lambda - \mu\| = \sup_{A \subset S}[\lambda(A) - \mu(A)].$$

PROP: The total variation distance may also be expressed in the alternative forms

$$\|\lambda - \mu\| = \sup_{A \subset S}[\lambda(A) - \mu(A)] = \frac{1}{2}\sum_{i \in S}|\lambda(i) - \mu(i)| = 1 - \sum_{i \in S}\min\{\lambda(i), \mu(i)\}.$$

In this language, the BLT says that $\|\pi_n - \pi\| \rightarrow 0$.
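The half-$L^1$ form makes total variation easy to compute. A small numpy sketch watching $\|\pi_n - \pi\|$ shrink for the illustrative two-state chain (whose stationary distribution is $(5/6, 1/6)$):

```python
import numpy as np

def tv_distance(lam, mu):
    """Total variation distance as half the L1 distance."""
    return 0.5 * np.abs(lam - mu).sum()

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])            # stationary distribution
pi_n = np.array([1.0, 0.0])          # pi_0
for n in range(10):
    print(n, tv_distance(pi_n, pi))
    pi_n = pi_n @ P                  # advance to pi_{n+1}
```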

We now introduce the coupling method. Let $Y_0, Y_1, \ldots$ be a Markov chain with the same probability transition matrix as $X_0, X_1, \ldots$, but let $Y_0$ have the initial distribution $\pi$ while $X_0$ has the initial distribution $\pi_0$. Note that $\{Y_n\}$ is a stationary Markov chain: $Y_n$ has distribution $\pi$ for all $n$. Let the $Y$ chain be independent of the $X$ chain.

We want to show that, for large nn, the probabilistic behavior of XnX_n is close to that of YnY_n.

Define the coupling time $T$ to be the first time at which $X_n = Y_n$:

$$T = \inf\{n : X_n = Y_n\}.$$

LEMMA: For all $n$ we have

$$\|\pi_n - \pi\| \leq \mathbb{P}\{T > n\}.$$

Hence we need only show that $\mathbb{P}\{T > n\} \rightarrow 0$, or equivalently, that $\mathbb{P}\{T < \infty\} = 1$.
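The lemma can be checked by Monte Carlo: run independent $X$ and $Y$ chains (with $Y_0 \sim \pi$), record the coupling time $T$, and compare $\mathbb{P}\{T > n\}$ with the exact $\|\pi_n - \pi\|$ from the sketch above. Again assuming numpy and the illustrative two-state chain:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])            # stationary distribution
states = np.arange(2)

def coupling_time():
    """T = inf{ n : X_n = Y_n }, with X_0 = 0 fixed and Y_0 ~ pi."""
    x, y = 0, rng.choice(states, p=pi)
    t = 0
    while x != y:
        x = rng.choice(states, p=P[x])   # the chains move independently
        y = rng.choice(states, p=P[y])
        t += 1
    return t

T = np.array([coupling_time() for _ in range(10_000)])
for n in range(4):
    print(n, (T > n).mean())             # upper-bounds ||pi_n - pi||
```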

Consider the bivariate chain $\{Z_n = (X_n, Y_n) : n \geq 0\}$. $Z_0, Z_1, \ldots$ is clearly a Markov chain on the state space $S \times S$. Since the $X$ and $Y$ chains are independent, the probability transition matrix $P_Z$ of the chain $Z$ can be written

$$P_Z((i_x, i_y), (j_x, j_y)) = P(i_x, j_x)\,P(i_y, j_y).$$

$Z$ has stationary distribution

$$\pi_Z((i_x, i_y)) = \pi(i_x)\,\pi(i_y).$$

We want to show $\mathbb{P}\{T < \infty\} = 1$. In terms of the $Z$ chain, this means showing that, with probability one, the $Z$ chain hits the “diagonal” $\{(j,j) : j \in S\}$ of $S \times S$ in finite time. To do this, it is sufficient to show that the $Z$ chain is irreducible and recurrent. (this is where aperiodicity of the original chain is used: without it, the $Z$ chain need not be irreducible.)

1.9 SLLN for Markov Chains

Ch. 2 – Markov Chains: Examples & Applications

Ch. 3 – MRFs and HMMs

Ch. 4 – Martingales

Ch. 5 – Brownian Motion

Ch. 6 – Diffusions and Stochastic Calculus

Ch. 7 – Likelihood Ratios

Ch. 8 – Extremes and Poisson Clumping