Notes for CSE 599: Counting and Sampling.
What is sampling: assume there is a giant space \(\Omega\) (usually finite) and a weight function \(w: \Omega \rightarrow \mathbb{R}_{+}\), with unknown partition function \(Z = \sum_{x \in \Omega} w(x)\). We want to generate a sample \(x\) with probability \(\frac{w(x)}{Z}\).
Definition of \(\#P\): a function \(f \in \#P\) if and only if it counts the accepting paths of an NTM for a problem in \(NP\), or equivalently counts the witnesses. So #CircuitSAT is #P-complete. Note that #R can be #P-complete even when deciding R is easy (e.g. counting perfect matchings).
Counting and sampling deals with some #P problems.
Definition of multiplicative error:
real value \(p\) and estimate \(\tilde{p}\) with multiplicative error \(1 + \epsilon\) means that
\(\frac{p}{1+\epsilon} \leq \tilde{p} \leq (1+\epsilon)p,\)
or equally
\(\frac{1}{1+\epsilon} \leq \frac{\tilde{p}}{p} \leq 1+\epsilon.\)
\(\tilde{p}\) with additive error \(\epsilon\) means that \(|\tilde{p} - p| \leq \epsilon\).
Equivalence of Counting and Sampling
The two basic notions are the FPRAS and the FPAUS.
FPRAS
Fully polynomial randomized approximation scheme.
Definition: Given a set \(\Omega\), a weight function \(w: \Omega \rightarrow \mathbb{R}_{+}\) and the partition function \(Z = \sum_{x \in \Omega} w(x)\), an FPRAS is an algorithm that, given error rate \(\epsilon\) and failure probability \(\delta\), returns \(\tilde{Z}\) s.t.
\(\mathbb{Pr}\left[\frac{Z}{1+\epsilon} \leq \tilde{Z} \leq (1+\epsilon)Z\right] \geq 1 - \delta.\)
The algorithm must run in \(Poly(n, \frac{1}{\epsilon}, \log \frac{1}{\delta})\) time.
FPAUS
Fully polynomial almost uniform sampler
Before giving the formal definition, we have to define total variation distance.
Total variation
Suppose \(\mu, \nu: \Omega \rightarrow \mathbb{R}_{+}\) are two probability distributions.
The total variation distance is defined as
\(\|\mu - \nu\|_{TV} = \frac{1}{2}\sum_{x \in \Omega}|\mu(x) - \nu(x)|,\)
or equally
\(\|\mu - \nu\|_{TV} = \max_{A \subseteq \Omega}|\mu(A) - \nu(A)|.\)
Proof:
For the second formula, we can see that picking \(A^{*} = \{x : \mu(x) > \nu(x)\}\) maximizes the RHS.
So
\(\max_{A \subseteq \Omega}|\mu(A) - \nu(A)| = \mu(A^{*}) - \nu(A^{*}) = \frac{1}{2}\sum_{x \in \Omega}|\mu(x) - \nu(x)|.\)
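As a sanity check, the two formulas for total variation can be compared numerically; here is a small Python sketch (the distributions are made-up examples):

```python
# Sketch: the two definitions of total variation distance agree.
from itertools import chain, combinations

def tv_half_l1(mu, nu):
    """(1/2) * sum_x |mu(x) - nu(x)|"""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def tv_max_event(mu, nu):
    """max over events A of |mu(A) - nu(A)|"""
    xs = list(mu)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A)) for A in subsets)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
assert abs(tv_half_l1(mu, nu) - tv_max_event(mu, nu)) < 1e-12
print(tv_half_l1(mu, nu))  # 0.3 for these distributions
```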
Definition of FPAUS:
fully polynomial almost uniform sampler
There is a finite space \(\Omega\), a weight function \(w: \Omega \rightarrow \mathbb{R}_{+}\), a partition function \(Z = \sum_{x \in \Omega}w(x)\), and \(\pi(x) = \frac{w(x)}{Z}\). A fully polynomial almost uniform sampler is an algorithm that, given error rate \(\delta\), generates a sample from a distribution \(\nu\) s.t.
\(\|\nu - \pi\|_{TV} \leq \delta,\)
running in \(Poly(n, \log \frac{1}{\delta})\) time.
For self-reducible problems, it can be proved that FPRAS and FPAUS are equivalent.
Matching counting
In this example we show that for the matching counting problem, FPRAS <=> FPAUS.
For FPAUS => FPRAS
Assume we have FPAUS of matching counting problem.
We mainly consider this decomposition: fix an edge ordering \(e_{1}, \cdots, e_{m}\), let \(G_{0} = G\) and \(G_{i} = G_{i-1} - e_{i}\), and write \(|M(G)| = |M(G_{m})|\prod_{i=1}^{m}\frac{1}{p_{i}}\), where \(p_{i} = \frac{|M(G_{i})|}{|M(G_{i-1})|}\) and \(|M(G_{m})| = 1\) (only the empty matching remains).
Denote the matching set of graph \(G\) as \(M(G)\)
Lemma 1
\(\frac{1}{2} \leq p_{i} \leq 1\), i.e. \(|M(G_{i-1})| \leq 2|M(G_{i})|\).
proof: consider a matching of \(G_{i-1}\) that is not in \(M(G_{i})\): it contains \(e_{i}\). Removing \(e_{i}\) gives a distinct matching in \(M(G_{i})\), and this map is injective. So
\(|M(G_{i-1})| \leq 2|M(G_{i})|.\)
\(\blacksquare\)
Lemma 2
Given an FPAUS for matchings with sampling error \(\delta\), i.e. a sampler producing \(x \sim \nu\) with
\(\|\nu - \pi\|_{TV} \leq \delta\), where \(\pi\) is uniform on \(M(G_{i-1})\),
then
\(|\mathbb{Pr}_{x \sim \nu}[x \in M(G_{i})] - p_{i}| \leq \delta.\)
proof:
Let \(A = \{x : x \in M(G_{i})\}\); \(A\) is a subset of \(M(G_{i-1})\), so the claim follows directly from the second definition of total variation.
\(\blacksquare\)
We have to use Lemma 2 to relate the sampler to the real probability, because the FPAUS only generates samples.
Theorem (Chernoff-Hoeffding bound): for \(X_{1}, X_{2}, \cdots, X_{n}\) i.i.d. with \(0 \leq X_{i} \leq 1\), define \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_{i}\). Then for any \(\alpha > 0\),
\(\mathbb{Pr}[|\bar{X} - \mathbb{E}[\bar{X}]| \geq \alpha] \leq 2e^{-2n\alpha^{2}}.\)
Lemma 2.5
With \(O(\frac{m^{2}}{\epsilon^{2}}\log \frac{m}{\delta})\) samples, we can approximate \(\bar{X}\) within additive error \(\frac{\epsilon}{20m}\) with failure probability \(\frac{\delta}{m}\).
Using Lemma 2 and Lemma 2.5, we can approximate \(p_{i}\) with additive error \(\frac{\epsilon}{10m}\) and failure probability \(\frac{\delta}{m}\).
Lemma 3
If we approximate \(p\) within multiplicative error \(1 + \frac{\epsilon}{4m}\) with failure probability \(\delta\), then we can approximate \(\frac{1}{p}\) within \(1 + \frac{\epsilon}{2m}\).
proof:
from \(\frac{p}{1 + \epsilon/(4m)} \leq \tilde{p}\) we get \(\frac{1}{\tilde{p}} \leq (1 + \frac{\epsilon}{4m})\frac{1}{p} \leq (1 + \frac{\epsilon}{2m})\frac{1}{p}\);
similarly for the other side.
Lemma 4
If we approximate \(p_{i}\) with additive error \(\epsilon\), then, since \(p_{i} \geq \frac{1}{2}\), we can approximate it with multiplicative error \(1 + 2\epsilon\).
proof:
\(|\tilde{p}_{i} - p_{i}| \leq \epsilon = 2\epsilon \cdot \frac{1}{2} \leq 2\epsilon p_{i}\).
\(\blacksquare\)
With Lemma 2, Lemma 2.5 and Lemma 4, we can approximate \(\frac{1}{p_{i}}\) with multiplicative error \(1 + \frac{\epsilon}{5m}\) and failure probability \(\frac{\delta}{m}\).
Using a union bound over the \(m\) factors, \(\prod_{i=1}^{m}\frac{1}{\tilde{p}_{i}}\) approximates \(|M(G)|\) within \((1 + \frac{\epsilon}{5m})^{m} \leq 1 + \epsilon\) with failure probability \(\delta\), so there is an FPRAS.
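The whole FPAUS => FPRAS pipeline can be sketched in Python; here a brute-force exact sampler stands in for the real FPAUS, and the graph and sample counts are illustrative choices, not from the lecture:

```python
# Sketch of the FPAUS => FPRAS reduction for counting matchings.
import random
from itertools import combinations

def matchings(edges):
    """All matchings (as frozensets of edges) of a graph given by an edge list."""
    result = [frozenset()]
    for r in range(1, len(edges) + 1):
        for sub in combinations(edges, r):
            verts = [v for e in sub for v in e]
            if len(verts) == len(set(verts)):  # no shared endpoint
                result.append(frozenset(sub))
    return result

def estimate_matchings(edges, samples=2000, rng=random.Random(0)):
    """Estimate |M(G)| as prod 1/p_i, with p_i = |M(G_i)| / |M(G_{i-1})|."""
    est = 1.0
    current = list(edges)
    for e in edges:
        space = matchings(current)          # stand-in FPAUS: exact uniform sampling
        smaller = set(matchings([f for f in current if f != e]))
        hits = sum(rng.choice(space) in smaller for _ in range(samples))
        est *= samples / hits               # multiply by 1 / p_i
        current = [f for f in current if f != e]
    return est  # M(empty graph) has only the empty matching, contributing factor 1

edges = [(0, 1), (1, 2), (2, 3)]  # a path graph: it has exactly 5 matchings
est = estimate_matchings(edges)
print(est)  # close to 5
```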
FPRAS => FPAUS
Suppose we have an exact counter; then we can derive an exact sampler by the conditional probability method.
Now we try to extend this to approximate counters.
Denote the \(i\)-th graph as \(G^{i}\); the probability of a sample \(x\) factors into conditional probabilities, one per edge \(e_{i} = (u_{i}, v_{i})\), each a ratio of matching counts \(M(\cdot)\).
Using FPRAS, we can approximate \(M(.)\) with multiplicative error \(1 + \epsilon\), and confidence interval \(\delta\).
So, we can approximate \(\frac{1}{\mathbb{Pr}[x]}\) with multiplicative error \((1 + \epsilon)^{n^{2}}\) and failure probability \(n^{2}\delta\) (union bound).
I thought the process could end here, but the lecture brings out rejection sampling; the point (I think) is to correct the residual bias left by the approximate counts.
Once \(M\) is constructed with (approximate) probability \(\tilde{\pi}(M)\), we accept it with probability
\(p_{accept}(M) = \frac{\alpha}{\tilde{\pi}(M)}\)
(for \(\alpha\) small enough that this is at most \(1\)). So every sample is output with the same overall probability.
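The rejection step can be illustrated in Python; the distribution below is a made-up stand-in for \(\tilde{\pi}\):

```python
# Minimal sketch of the rejection step: samples drawn with known (or
# approximately known) probabilities pi_tilde(x) are accepted with
# probability alpha / pi_tilde(x), which equalizes the output distribution.
import random
from collections import Counter

rng = random.Random(1)
pi_tilde = {"a": 0.5, "b": 0.3, "c": 0.2}
alpha = min(pi_tilde.values())           # ensures acceptance probabilities <= 1

def rejection_sample():
    while True:
        x = rng.choices(list(pi_tilde), weights=list(pi_tilde.values()))[0]
        if rng.random() < alpha / pi_tilde[x]:
            return x                     # accepted: every x equally likely

counts = Counter(rejection_sample() for _ in range(30000))
print(counts)  # roughly 10000 for each of "a", "b", "c"
```

Each \(x\) is output with probability proportional to \(\tilde{\pi}(x)\cdot\frac{\alpha}{\tilde{\pi}(x)} = \alpha\), i.e. uniformly.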
FPRAS for DNF counting
Assume there are \(n\) variables and \(m\) clauses.
Consider the most straightforward way: draw \(N\) samples \(X_{1}, X_{2}, \cdots, X_{N}\), where each sample is a uniform random assignment of the variables; set \(X_{i} = 1\) if the assignment satisfies the DNF.
So \(\mathbb{E}[\frac{1}{N}\sum_{i=1}^{N}X_{i}] = \frac{ANS}{2^{n}}\).
According to the Chernoff bound, if we want to estimate \(ANS\) with multiplicative error \(1 + \epsilon\) and failure probability \(\delta\),
we need to hold
\(N = \Omega(\frac{2^{n}}{\epsilon^{2}ANS}\log\frac{1}{\delta})\), but this \(N\) may be exponential in \(n\).
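A sketch of this naive estimator in Python (the formula is an illustrative example with 3 variables, where the dense satisfying set keeps the sample size small):

```python
# Naive Monte Carlo for DNF counting: sample uniform assignments and check
# satisfaction.  This only works when ANS / 2^n is not tiny, which is
# exactly the problem discussed above.
import random

# DNF as a list of clauses; each clause maps a variable index to its literal sign.
formula = [{0: True, 1: True}, {1: False, 2: True}]
n = 3  # true count for this formula is 4 (out of 2^3 = 8 assignments)

def satisfies(assign, formula):
    return any(all(assign[v] == val for v, val in cl.items()) for cl in formula)

rng = random.Random(0)
N = 20000
hits = sum(
    satisfies([rng.random() < 0.5 for _ in range(n)], formula) for _ in range(N)
)
estimate = (hits / N) * 2 ** n
print(estimate)  # close to 4
```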
Karp, Luby and Madras algorithm [KLM]
The motivation is to choose a small universe whose size we can compute, together with an estimator that has a good (polynomial) lower bound, in order to use the Chernoff bound.
This is a simple way to decrease the size of universe.
We sample from the union of ground set instead of all the assignments.
Let the ground set (satisfying assignments) of the \(i\)-th clause be \(S_{i}\).
We sample from the disjoint union, of total size \(\sum_{i=1}^{m}|S_{i}|\).
The estimator's mean is \(\frac{|\bigcup_{i=1}^{m}S_{i}|}{\sum_{i=1}^{m}|S_{i}|}\).
Obviously,
\(\frac{|\bigcup_{i=1}^{m}S_{i}|}{\sum_{i=1}^{m}|S_{i}|} \geq \frac{1}{m}.\)
This 'large' lower bound means that we can use Chernoff bound more effectively.
recall that \(\mathbb{Pr}[|\bar{X} - \mathbb{E}[\bar{X}]| > \alpha\mathbb{E}[\bar{X}]] < e^{-\frac{N\alpha^{2}\mathbb{E}[\bar{X}]}{3}}\). To get an FPRAS, since \(\mathbb{E}[\bar{X}] \geq \frac{1}{m}\), we only need
\(N = O(\frac{m}{\epsilon^{2}}\log\frac{1}{\delta}),\)
which achieves an FPRAS.
In the DNF counting problem it is easy to sample from the disjoint union: pick clause \(i\) with probability proportional to \(|S_{i}|\), then a uniform satisfying assignment of that clause; to avoid counting an assignment several times, count the sample only if \(i\) is the first clause it satisfies (checked by brute force).
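A Python sketch of the KLM estimator using the "first satisfying clause" indicator; the formula is again a small illustrative example:

```python
# Sketch of the Karp-Luby-Madras estimator: sample clause i with probability
# |S_i| / sum |S_j|, then a uniform satisfying assignment of clause i; the
# estimator is the indicator that i is the FIRST clause the assignment satisfies.
import random

formula = [{0: True}, {1: True}]   # x0 OR x1; union size over 3 vars is 6
n = 3

def clause_size(cl):
    return 2 ** (n - len(cl))            # free variables can be anything

def sample_from_clause(cl, rng):
    return [cl[v] if v in cl else rng.random() < 0.5 for v in range(n)]

def klm_estimate(formula, N=20000, rng=random.Random(0)):
    sizes = [clause_size(cl) for cl in formula]
    total = sum(sizes)
    hits = 0
    for _ in range(N):
        i = rng.choices(range(len(formula)), weights=sizes)[0]
        x = sample_from_clause(formula[i], rng)
        first = next(
            j for j, cl in enumerate(formula)
            if all(x[v] == val for v, val in cl.items())
        )
        hits += first == i
    return (hits / N) * total            # estimates |union of the S_i|

est_klm = klm_estimate(formula)
print(est_klm)  # close to 6
```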
Network Unreliability
Definition: Given a graph \(G\), each link \(e\) disappears independently with probability \(p_{e}\).
Here we assume \(p_{e} = p\) for all \(e \in E\). Now consider the probability that the surviving network is disconnected; call this quantity \(Fail(p)\).
Network unreliability problem can be trivially expressed as a DNF problem.
Let \(C_{1}, C_{2}, \cdots, C_{k}\) be \(k\) cuts of graph \(G\).
\(x_{e}\) is the indicator variable that \(e\) is broken.
Then \(G\) is disconnected iff
\(\bigvee_{j=1}^{k}\bigwedge_{e \in C_{j}}x_{e}.\)
Is it solved by the KLM algorithm?
Unfortunately this does not fit the FPRAS framework, because the number of cuts may be exponential, \(k = 2^{\Theta(n)}\).
Karger's algorithm
Motivation: split the problem into two cases, one where the failure probability is large and one where it is small, like the splitting tricks in calculus.
Let \(C\) be the size of the minimum cut of \(G\).
Case 1: \(q > \frac{1}{poly(n)}\), where \(q = Fail(p)\). We can use the Chernoff bound directly, because this assumption is exactly a lower bound on the mean.
Case 2: \(q < \frac{1}{poly(n)}\).
Theorem (Karger's theorem): for any graph \(G\) with \(n\) vertices and minimum cut \(C\), and for any \(\alpha \geq 1\), the number of cuts of size at most \(\alpha C\) is at most \(n^{2\alpha}\).
proof: via Karger's contraction algorithm.
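A minimal Python sketch of the contraction algorithm itself (the graph is a made-up example; this is an illustration, not the lecture's code):

```python
# Karger's contraction algorithm: repeatedly contract a uniformly random
# edge until two super-vertices remain; the surviving crossing edges form a
# cut, and each fixed minimum cut survives with probability >= 1/C(n,2).
import random

def contract_once(edges, n, rng):
    """edges: list of (u, v) pairs on vertices 0..n-1, treated as a multigraph."""
    labels = list(range(n))
    groups = n
    while groups > 2:
        u, v = rng.choice([e for e in edges if labels[e[0]] != labels[e[1]]])
        lu, lv = labels[u], labels[v]
        labels = [lu if lab == lv else lab for lab in labels]  # merge groups
        groups -= 1
    return [e for e in edges if labels[e[0]] != labels[e[1]]]  # the cut

# Illustrative graph: two triangles joined by a single bridge (min cut = 1).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
rng = random.Random(0)
best = min((contract_once(edges, 6, rng) for _ in range(200)), key=len)
print(len(best))  # with 200 repetitions, almost surely the bridge: size 1
```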
Lemma: Let \(C_{1}, C_{2}, \cdots, C_{r}\) be all the cuts of \(G\), sorted in increasing order of size.
For any \(\alpha \geq 1\), writing \(p^{C} = n^{-\beta}\) with \(\beta > 2\), we have
\(\mathbb{Pr}[\text{some } C_{i} \text{ with } i > n^{2\alpha} \text{ fails}] \leq \sum_{i > n^{2\alpha}} i^{-\beta/2}.\)
proof:
recall that the theorem above gives a relation between the index of a cut and its size:
let \(i = n^{2\alpha}\), i.e. \(\alpha = \frac{\log i}{2\log n}\); so \(|C_{i}| \geq \frac{\log i}{2\log n}C\).
By the union bound,
\(\mathbb{Pr}[\text{some } C_{i}, i > n^{2\alpha}, \text{ fails}] \leq \sum_{i > n^{2\alpha}} p^{|C_{i}|} \leq \sum_{i > n^{2\alpha}} (p^{C})^{\frac{\log i}{2\log n}} = \sum_{i > n^{2\alpha}} i^{-\beta/2}.\)
With this lemma, the large cuts contribute only a small failure probability.
Markov Chains
A Markov chain is a stochastic process on a state space \(\Omega\).
Markov property: \(\mathbb{Pr}[X_{t+1} = y \mid X_{0}, \cdots, X_{t}] = \mathbb{Pr}[X_{t+1} = y \mid X_{t}]\).
Markov kernel: \(K(x, y) = \mathbb{Pr}[X_{t+1} = y \mid X_{t} = x]\), with \(\sum_{y}K(x, y) = 1\) for every \(x\).
A Markov chain can also be represented as a weighted directed graph: if \(K(x, y) = c > 0\) then there is an edge of weight \(c\) from \(x\) to \(y\).
If we sample \(X_{0} \sim p\), then \(X_{t} \sim pK^{t}\) (viewing \(p\) as a row vector).
Corollary: \(K^{t}(x, \cdot)\) is the distribution of \(X_{t}\) started from \(X_{0} = x\).
Stationary Distribution
Every Markov chain has a stationary distribution (it may not be unique).
First of all, the Markov kernel must have eigenvalue \(1\):
it is easy to show that \(\mathrm{det}(K - I) = 0\), as the sum of every row of \(K\) is exactly \(1\); hence \(K^{T}\) also has eigenvalue \(1\), with some eigenvector \(v\).
Now let \(\pi_{i} = \frac{|v_{i}|}{\sum_{j}|v_{j}|}\). We show that \(\pi\) is a stationary distribution.
Since \(v_{i} = \sum_{j}v_{j}K(j, i)\), we have \(|v_{i}| \leq \sum_{j}|v_{j}|K(j, i)\).
This is a trivial inequality. But the next step is somehow magic: summing over \(i\) gives \(\sum_{i}|v_{i}| \leq \sum_{j}|v_{j}|\sum_{i}K(j, i) = \sum_{j}|v_{j}|\), which holds with equality.
So every inequality above must actually be tight, and it must hold that \(\pi_{i} = \sum_{j}\pi_{j}K(j, i)\).
NB: the lecture notes chose \(\mathrm{relu}(v)\) here, but I modified it a little bit.
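A quick numerical illustration: iterating a small (made-up) kernel converges to the stationary distribution:

```python
# Sketch: for a small hypothetical kernel K, iterate p -> pK and watch it
# converge to the stationary distribution pi satisfying pi K = pi.
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]    # rows sum to 1; this is a lazy walk on a 3-vertex path

def step(p, K):
    n = len(K)
    return [sum(p[i] * K[i][j] for i in range(n)) for j in range(n)]

p = [1.0, 0.0, 0.0]      # start deterministically at state 0
for _ in range(100):
    p = step(p, K)
print(p)  # close to [0.25, 0.5, 0.25]
```

For this reversible chain the limit matches the detailed-balance weights \(w = (1, 2, 1)\) normalized.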
Definition (Reversible Markov Chain): A Markov chain is reversible iff there exists a nonnegative weight function \(w: \Omega \rightarrow \mathbb{R}_{+}\) such that for every \(x, y\):
\(w(x)K(x, y) = w(y)K(y, x)\) (detailed balance).
It follows that \(\pi = \frac{w}{Z}\), with \(Z = \sum_{x}w(x)\), is a stationary distribution.
Mixing of Markov Chain
Definition (Irreducible Markov Chain): A Markov chain is irreducible iff for any states \(x, y\) there exists \(t\) such that \(K^{t}(x, y) > 0\).
Definition (aperiodic): A Markov chain is aperiodic if for all \(x, y\) we have \(\gcd\{t : K^{t}(x, y) > 0\} = 1\).
Lemma: Let \(K\) be an irreducible, aperiodic Markov chain. Then there exists \(t > 0\) such that for all \(x, y\), \(K^{t}(x, y) > 0\).
Actually I do not know how to formally prove this... (the standard proof uses the number-theoretic fact that a set of positive integers closed under addition with gcd \(1\) contains all sufficiently large integers).
Theorem (Fundamental Theorem of Markov Chains): Any irreducible and aperiodic Markov chain has a unique stationary distribution \(\pi\). Furthermore, for all \(x, y\), \(K^{t}(x, y) \rightarrow \pi(y)\)
as \(t\) goes to infinity. In particular, for any \(\epsilon > 0\) there exists \(t > 0\) such that \(\|K^{t}(x, \cdot) - \pi\|_{TV} < \epsilon\).
How to make an arbitrary irreducible Markov chain ergodic: add a self-loop of probability \(\frac{1}{2}\) at every state (the lazy chain).
Metropolis Rule
Given a finite state set and a weight function \(w: \Omega \rightarrow \mathbb{R}_{+}\). We would like to sample from the distribution \(\pi(x) = \frac{w(x)}{Z}\).
The Metropolis rule is a general tool for constructing such an ergodic Markov chain.
Neighborhood Structure
The first requirement is an undirected connected graph \(G = (\Omega, E)\); two states are adjacent iff they differ by some local change.
Proposal Distribution
At any vertex \(x\) we require a proposal distribution, \(p(x, .)\) satisfying the following properties:
 \(p(x,y) > 0\) only if \(y\) is a neighbor of \(x\).
 \(p(x,y) = p(y,x)\) for all \(x, y\)
 \(\sum_{y}p(x,y) = 1\)
Metropolis chain
 propose a move from \(x\) to \(y\) with probability \(p(x, y)\);
 accept the proposed move with probability \(\min\{1, \frac{\pi(y)}{\pi(x)}\}\); else reject and stay at \(x\).
NB: this is like the simulated annealing idea in optimization.
Lemma: the Metropolis chain is reversible with stationary distribution \(\pi\).
(How to prove that \(\pi\) is stationary? Check detailed balance: \(\pi(x)K(x, y) = \pi(x)p(x, y)\min\{1, \frac{\pi(y)}{\pi(x)}\} = p(x, y)\min\{\pi(x), \pi(y)\}\), which is symmetric in \(x\) and \(y\).)
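A Python sketch of the Metropolis chain on a made-up weighted path; the empirical frequencies should approach \(\pi \propto w\):

```python
# Metropolis chain on the path 0..4 with neighbors x-1, x+1 and target
# pi(x) proportional to w(x).  The weights are an illustrative example.
import random
from collections import Counter

w = [1.0, 2.0, 4.0, 2.0, 1.0]
n = len(w)
rng = random.Random(0)

def metropolis_step(x):
    y = x + rng.choice([-1, 1])          # symmetric proposal p(x, y)
    if not 0 <= y < n:
        return x                         # proposal leaves the path: stay put
    if rng.random() < min(1.0, w[y] / w[x]):
        return y                         # accept
    return x                             # reject: stay at x

x, counts = 0, Counter()
for _ in range(200000):
    x = metropolis_step(x)
    counts[x] += 1
freq = [counts[i] / 200000 for i in range(n)]
print(freq)  # should approach w / sum(w) = [0.1, 0.2, 0.4, 0.2, 0.1]
```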
Coupling
Definition (Coupling): Let \(\mu, \nu\) be probability distributions over \(\Omega\). A coupling of \(\mu, \nu\) is a probability distribution \(\pi\) on \(\Omega \times \Omega\) whose marginals are \(\mu\) and \(\nu\).
Lemma(Coupling Lemma): Let \(X \sim \mu\), \(Y \sim \nu\)
Then
 \(\mathbb{Pr}[X \neq Y] \geq \mu  \nu_{TV}\)
 There exists a coupling \(\pi\) between \(\mu\) and \(\nu\) such that \(\mathbb{Pr}[X \neq Y] = \mu  \nu_{TV}\)
proof:
A simple observation is that \(\mathbb{Pr}[X=Y=a] \leq \min\{\mu(a), \nu(a)\}\)
So we just need to "properly" set remaining probability to let the inequality be tight.
\(\blacksquare\)
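The construction in this proof can be sketched in Python: put mass \(\min\{\mu(a), \nu(a)\}\) on the diagonal and spread the leftover mass off the diagonal (the distributions are made-up examples):

```python
# Sketch of the coupling achieving Pr[X != Y] = ||mu - nu||_TV.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}

overlap = {a: min(mu[a], nu[a]) for a in mu}
tv = 1.0 - sum(overlap.values())         # total variation distance

# joint[(x, y)] = Pr[X = x, Y = y]: diagonal mass first, leftovers independently.
joint = {(a, a): overlap[a] for a in mu}
for x in mu:
    for y in nu:
        extra_x = mu[x] - overlap[x]     # mass of mu not matched at x
        extra_y = nu[y] - overlap[y]
        if tv > 0:
            joint[(x, y)] = joint.get((x, y), 0.0) + extra_x * extra_y / tv

# Check the marginal of X and the meeting probability.
marg_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in mu}
pr_neq = sum(p for (a, b), p in joint.items() if a != b)
print(tv, pr_neq)  # equal: the coupling lemma's inequality is tight here
```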
Mixing time
Terminology: let \(\Delta_{x}(t) = \|K^{t}(x, \cdot) - \pi\|_{TV}\) and \(\tau(\epsilon) = \min\{t : \max_{x}\Delta_{x}(t) \leq \epsilon\}\).
Definition (Mixing time): \(\tau_{mix} = \tau(\frac{1}{2e})\).
Lemma: \(\Delta_{x}(t + 1) \leq \Delta_{x}(t)\), i.e. the distance to stationarity is non-increasing.
proof:
This proof shows the power of coupling.
Let \(X_{0} = x\), \(Y_{0} \sim \pi\).
Suppose we have a coupling such that \(\mathbb{Pr}[X_{t} \neq Y_{t}] = \Delta_{x}(t)\) (such a coupling exists by the coupling lemma, since \(Y_{t} \sim \pi\)).
Now define \(X_{t+1}\), \(Y_{t+1}\) as below

If \(X_{t} = Y_{t}\), then set \(X_{t+1} = Y_{t+1}\)

else let the two chains walk independently
So \(\Delta_{x}(t+1) \leq \mathbb{Pr}[X_{t+1} \neq Y_{t+1}] \leq \mathbb{Pr}[X_{t}\neq Y_{t}] = \Delta_{x}(t)\), where the first inequality is the coupling lemma (note \(Y_{t+1} \sim \pi\)).
\(\blacksquare\)
General bound for mixing time
Lemma: if all transition probabilities are positive, then
\(\Delta_{x}(t) \leq (1 - |\Omega|(\min_{x, y}K(x, y))^{2})^{t}.\)
proof:
Pick the same coupling as above: once the chains meet they move together; otherwise they step independently. When \(X_{t} \neq Y_{t}\),
\(\mathbb{Pr}[X_{t+1} = Y_{t+1}] \geq \sum_{z}K(X_{t}, z)K(Y_{t}, z) \geq |\Omega|(\min_{x, y}K(x, y))^{2}.\)
So
\(\Delta_{x}(t) \leq \mathbb{Pr}[X_{t} \neq Y_{t}] \leq (1 - |\Omega|(\min_{x, y}K(x, y))^{2})^{t}.\)
Let \((1 - |\Omega|(\min_{x, y} K(x, y))^{2})^{t} \leq \frac{1}{2e}\) to read off the mixing time.
\(\blacksquare\)
The above also proves the Fundamental Theorem of Markov Chains (applied to a power \(K^{t_{0}}\) whose entries are all positive).
Theorem: For any Markov Chain, \(\tau(\epsilon) \leq O(\tau_{mix}\log\frac{1}{\epsilon})\)
Coloring
We try to assign one of \(q\) colors to each vertex of a graph with maximum degree \(\Delta\),
and we want to sample proper colorings.
Define the coupling as follows \((X, Y)\).
Randomly choose a vertex and a color, and try to recolor that vertex with that color in both \(X\) and \(Y\) (keeping the move only if it yields a proper coloring).
Also \(d(X_{0}, Y_{0}) \leq n\), where \(d(X, Y)\) counts the vertices on which the two colorings differ.
So
\(\mathbb{E}[d(X_{t}, Y_{t})] \leq n(1 - \frac{q - 4\Delta}{nq})^{t}.\)
So to achieve \(\frac{1}{n}\) error, we need \(O(nq\log n)\) steps.
Also we need \(q \geq 4\Delta + 1\) for the contraction to hold.
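A simulation sketch of this trivial coupling in Python, on a 5-cycle with \(q = 9 \geq 4\Delta + 1\) (the graph and parameters are illustrative choices):

```python
# Trivial coupling for the Metropolis coloring chain: both copies propose
# the same (vertex, color) pair, and each accepts only proper recolorings.
import random

n, q = 5, 9
nbrs = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # 5-cycle, Delta = 2
rng = random.Random(0)

def coupled_step(X, Y):
    v = rng.randrange(n)
    c = rng.randrange(q)
    for Z in (X, Y):
        if all(Z[u] != c for u in nbrs[v]):  # accept only proper recolorings
            Z[v] = c

X = [0, 1, 0, 1, 2]                      # two proper colorings of the cycle
Y = [3, 4, 3, 4, 5]
for t in range(1, 100000):
    coupled_step(X, Y)
    if X == Y:
        break
print(t)  # coalescence time for this run
```

Once the chains coalesce they stay together, so the coupling lemma bounds the total variation at time \(t\) by the probability of not having coalesced.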
Path coupling
Path coupling is defined on a premetric: one verifies the contraction property only for adjacent pairs, and it extends to the whole \(\Omega\).
Definition (premetric): a premetric on \(\Omega\) is a connected graph on \(\Omega\) with positive edge weights such that for every edge \(uv\), the weight of \(uv\) equals the shortest-path distance \(d(u, v)\).
Definition (path coupling): let \((X', Y')\) be a coupling defined for adjacent pairs of a premetric graph such that for any adjacent \((x, y)\),
\(\mathbb{E}[d(X', Y')] \leq (1 - \alpha)d(x, y).\)
Then there is a coupling under which every pair of states satisfies the same inequality.
insight: the coupling only needs to contract locally, and a uniformly weighted graph naturally satisfies the premetric condition.
proof:
For any pair of states \((s, t)\),
choose an arbitrary shortest path \((s = u_{0}, u_{1}, u_{2}, \cdots, u_{k}, u_{k+1} = t)\).
Naturally extend the coupling along the path to a coupling of the chains \((U_{0}, U_{1}, U_{2}, \cdots, U_{k+1})\); then \(\mathbb{E}[d(X', Y')] \leq \sum_{i}\mathbb{E}[d(U_{i}', U_{i+1}')] \leq (1 - \alpha)\sum_{i}d(u_{i}, u_{i+1}) = (1 - \alpha)d(s, t)\).
\(\blacksquare\)
Coloring with Path coupling theorem
Theorem: If \(q \geq 2\Delta + 1\), then the Metropolis rule mixes in time \(O(n\log n)\).
First we need to modify the graph to let it be a premetric graph in order to use path coupling theorem.
Consider the metric \(d(X, Y)\) counting the number of vertices on which the colorings \(X\) and \(Y\) differ. Sometimes we cannot find a path of proper recolorings between two proper colorings, so we need to allow improper colorings as states, so that there are \(q^{n}\) vertices in the graph in total.
We can use the Metropolis rule to make the stationary probability of improper colorings \(0\) in \(\pi\). Also, if we bound the total variation on the enlarged space, we still bound the "real" total variation.
Then we need to design the path coupling procedure. By path coupling, we only need to consider \(\mathbb{E}[d(X', Y') \mid x, y]\) for adjacent \(x, y\).
Assume \(x\) and \(y\) differ exactly at one vertex \(u\), with colors \(c_{x}\) and \(c_{y}\) there.
In general, we choose a vertex \(v\) and a color \(c\) randomly, then try to change \(v\)'s color into \(c\).
This is not the complete procedure, but we can try to analyze it. We denote \(N(u) \cup \{u\} = N^{*}(u)\).
If \(v\) is not in \(N^{*}(u)\), then \(d(X', Y') = d(x, y)\).
If \(v = u\), then \(\mathbb{E}[d(X', Y')] \leq \frac{\Delta}{q}d(x, y) + (1  \frac{\Delta}{q})(d(x, y)  1)\).
If \(v \in N^{*}(u)\) but \(v \neq u\): if \(c = c_{x}\) or \(c = c_{y}\) then \(d(X', Y') \leq d(x, y) + 1\); otherwise \(d(X', Y') = d(x, y)\).
Summing all the inequalities above, we get
\(\mathbb{E}[d(X', Y')] \leq d(x, y) - \frac{1}{n}(1 - \frac{\Delta}{q}) + \frac{\Delta}{n}\cdot\frac{2}{q} = d(x, y) - \frac{q - 3\Delta}{nq},\)
which contracts when \(q \geq 3\Delta + 1\).
To improve this bound, when \(c = c_{x}\) or \(c = c_{y}\) at a neighbor \(v\) of \(u\), we can pair the proposals up (propose \(c_{x}\) at \(v\) in one copy exactly when proposing \(c_{y}\) at \(v\) in the other) to achieve
\(\mathbb{E}[d(X', Y')] \leq d(x, y) - \frac{1}{n}(1 - \frac{\Delta}{q}) + \frac{\Delta}{n}\cdot\frac{1}{q} = d(x, y) - \frac{q - 2\Delta}{nq},\)
which only needs \(q \geq 2\Delta + 1\).
Coloring with Heat bath chain
The main idea in the path-coupling analysis of the Metropolis rule was to construct a coupling in which the probability of moving to distance \(d(x, y) + 1\) is low.
One trivial way is to increase \(q\). Beyond that, path coupling considers only adjacent pairs of states, which decreases the "controversy".
Now let's focus on heat bath chain.
Instead of the Metropolis rule, which randomly picks a vertex and a color and then tries to change the configuration, the heat bath chain randomly chooses a vertex and then samples its color from the proper colors.
(Obviously it may break the premetric requirement)
Here is a simple way: randomly choose a vertex \(u\), and denote the set of proper colors for \(u\) under configuration \(X\) by \(A(X, u)\).
Then for configurations \(X, Y\): with probability \(\frac{|A(X, u) \cap A(Y, u)|}{\max(|A(X, u)|, |A(Y, u)|)}\) we choose the same color in both copies; the remaining events can be assigned properly.
Then we analyze this coupling,
summing all the resulting equalities and inequalities.
If \(q \geq 2\Delta + 1\), the expected distance contracts by a factor \(1 - \Omega(\frac{1}{n})\) per step.
So for heat bath, \(q \geq 2\Delta + 1\) and the trivial coupling method already lead to \(\Theta(n\log n)\) mixing time.
Triangle-free graphs
Now imagine that the given graph is triangle-free. Then \(|A(X, u)|\) can be much larger than \(q - \Delta\), which leads to a better result.
Now we analyze \(|A(X, u)|\).
For a triangle-free graph, the neighbors of \(u\) form an independent set, so (conditioned on the colors outside \(N^{*}(u)\)) their colors are independent: \(\mathbb{E}[\prod_{v \sim u}(1 - X_{v, c})] = \prod_{v \sim u}(1 - \mathbb{E}[X_{v, c}])\), where \(X_{v, c}\) indicates that \(v\) has color \(c\).
We denote the information (the colors) on \(V \setminus N^{*}(u)\) by \(\mathcal{F}\).
Recall the McDiarmid's inequality.
Let the \(X_{v, c}\)'s be the variables (NB: there are actually \(d(u)\) of them), and \(f(\cdot) = |A(X, u)|\).
\(f\) is a 1-Lipschitz function of these variables.
So, by McDiarmid's inequality,
\(\mathbb{Pr}[|A(X, u)| < \mathbb{E}[|A(X, u)| \mid \mathcal{F}] - t] \leq e^{-\frac{2t^{2}}{d(u)}}.\)
By the union bound, the bound holds for all vertices \(u\) simultaneously with high probability.
\(\blacksquare\)
Mixing time using eigenvalues
Here we consider reversible Markov Chains.
For any functions \(f, g\), we consider the inner-product space
\(\langle f, g\rangle_{\pi} = \sum_{x}f(x)g(x)\pi(x).\)
It follows from reversibility that \(K\) is self-adjoint with respect to \(\langle\cdot, \cdot\rangle_{\pi}\).
By the spectral theorem, \(K\) has \(|\Omega|\) real eigenvalues \(1 = \lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{|\Omega|}\).
By stochasticity, \(|\lambda_{i}| \leq 1\) for all \(i\).
Some results about the characteristics of reversible Markov chains:
irreducible \(\Leftrightarrow\) \(\lambda_{2} < 1\)
aperiodic \(\Leftrightarrow\) \(\lambda_{|\Omega|} > -1\)
Why eigenvalues are "\(l_{2}\) properties" about the chain?
Definition (\(l_{p}\) distance): \(\|\frac{\mu}{\pi} - 1\|_{p, \pi}\), where \(\|f\|_{p, \pi} = (\sum_{x}|f(x)|^{p}\pi(x))^{1/p}\).
Compare with \(\|\cdot\|_{TV}\):
\(\|\mu - \pi\|_{TV} = \frac{1}{2}\|\frac{\mu}{\pi} - 1\|_{1, \pi} \leq \frac{1}{2}\|\frac{\mu}{\pi} - 1\|_{2, \pi}.\)
Is this a general rule? (Yes: \(\|f\|_{p, \pi} \leq \|f\|_{q, \pi}\) for \(p \leq q\), by Jensen's inequality.)
Theorem: Let \(\lambda^{*} = \max\{\lambda_{2}, |\lambda_{|\Omega|}|\}\). Then
\(2\|K^{t}(x, \cdot) - \pi\|_{TV} \leq \left\|\frac{K^{t}(x, \cdot)}{\pi(\cdot)} - 1\right\|_{2, \pi} \leq \frac{(\lambda^{*})^{t}}{\sqrt{\pi(x)}}.\)
Consider the eigenfunctions \(\psi_{1}, \cdots, \psi_{|\Omega|}\) of \(K\), orthonormal in \(\langle\cdot, \cdot\rangle_{\pi}\).
Then expand \(\frac{K^{t}(x, \cdot)}{\pi(\cdot)} - 1\) in this eigenbasis.
Notice: \(\boldsymbol{1} = \psi_{1}\), so the expansion has no \(\psi_{1}\) component.
Such a magic
So we should select a starting vertex \(x\) with high \(\pi(x)\) as a warm start.
Path technology
Dirichlet Form
For two functions \(f, g\), define
\(\mathcal{E}(f, g) = \langle f, (I - K)g\rangle_{\pi}\)
(easy to see that \(I - K\) is self-adjoint). Equivalently,
\(\mathcal{E}(f, g) = \frac{1}{2}\sum_{x, y} (f(x) - f(y))(g(x) - g(y))\pi(x)K(x, y).\)
NB: we actually just need to sum over the edges, because if \((x, y) \notin E\) then \(K(x, y) = 0\).
Dirichlet form of \(f\)
\(\mathcal{E}(f, f) = \frac{1}{2}\sum_{x, y} (f(x)  f(y))^2 \pi(x) K(x, y)\)
Variance of \(f\): \(\mathrm{Var}_{\pi}(f) = \mathbb{E}_{\pi}[f^{2}] - \mathbb{E}_{\pi}[f]^{2}\).
Definition: the Poincaré constant is \(\alpha = \min_{f \text{ nonconstant}}\frac{\mathcal{E}(f, f)}{\mathrm{Var}_{\pi}(f)}\).
If \(K\) is reversible, then the Poincaré constant is \(1 - \lambda_{2}\).
For a lazy chain with Poincaré constant \(\alpha\), \(\tau_{mix} \leq O(\frac{1}{\alpha}\log\frac{1}{\min_{x}\pi(x)})\).
So if we can lower bound the Poincaré constant, we can bound the mixing time.
Consider the following multicommodity flow setup (for a reversible Markov chain):
for each pair \(x, y \in \Omega\), choose a path \(P_{x, y}\) from \(x\) to \(y\) in the transition graph, and define \(f(e) = \sum_{(x, y): e \in P_{x, y}}\pi(x)\pi(y)\) and \(Q(e) = \pi(u)K(u, v)\) for \(e = (u, v)\).
Define the congestion of an edge \(e\) as \(\frac{f(e)}{Q(e)}\).
Theorem: for any reversible Markov chain and any choice of paths \(P_{x, y}\),
\(\alpha \geq \frac{1}{\rho L}, \quad \rho = \max_{e}\frac{f(e)}{Q(e)}, \quad L = \max_{x, y}|P_{x, y}|.\)
Actually, \(|P_{x, y}|\) can be bounded by the diameter of the graph, and hence by \(n\).
proof: For any function \(f\), expand \(\mathrm{Var}_{\pi}(f) = \frac{1}{2}\sum_{x, y}\pi(x)\pi(y)(f(x) - f(y))^{2}\), write \(f(x) - f(y)\) as a telescoping sum along \(P_{x, y}\), and apply Cauchy-Schwarz to compare with \(\mathcal{E}(f, f)\).
Fractional version
Define \(\mu_{x, y}(P)\) as the probability of choosing \(P\) as the path from \(x \rightarrow y\).
Then update the definition of \(f(e)\): \(f(e) = \sum_{x, y}\pi(x)\pi(y)\sum_{P \ni e}\mu_{x, y}(P)\).
Now calculate the Poincare constant
So the same theorem holds.
\(\blacksquare\)
There is an intrinsic gap of up to a \(\log\frac{1}{\pi_{min}}\) factor between this bound and the tight bound.
All or nothing theorem
Def of self-reducible problem:
An NP search problem is self-reducible if, for every instance, the set of its solutions can be partitioned into polynomially many subsets, each the solution set of a smaller instance of the same problem, and these instances can be generated in polynomial time.
Is this def the same as Sipser's?
Thm: For any self-reducible problem, if there exists a polynomial time counting algorithm that gives a \(poly(n)\) multiplicative approximation factor, then there is an FPRAS.
(Recall that for self-reducible problems, \(\text{FPRAS} \leftrightarrow \text{FPAUS}\).)
The basic plan is to prove that there is an FPAUS.
We use the polynomial approximation factor to build a Markov chain with fast mixing time.
The good situation is when every vertex (state) of the Markov chain represents a proper answer (an element to be counted),
or when the proportion of proper vertices is large enough, like \(\frac{1}{poly(n)}\).
Then we can derive the FPAUS, hence the FPRAS.
Here we will use DNF (a different instance from the counting problem in the notes above) as a demonstration.
Consider a tree from the root to the leaves, one step downward means that a variable is assigned.
Fix an order of the variables \(l_{1}, l_{2}, \cdots, l_{n}\); at level \(i\), assign \(l_{i}\) true or false.
Basically this is a decision tree, every node in the tree is a configuration.
Denote the number of satisfying completions of configuration \(x\) by \(N(x)\); the approximate counter with parameter \(\alpha\) returns an answer \(\hat{N}(x)\) with \(\frac{N(x)}{\alpha} \leq \hat{N}(x) \leq \alpha N(x)\).
For simplicity, we can assume \(\hat{N}(x)\) is deterministic (otherwise run it \(O(\log\frac{1}{\epsilon})\) times and take the median to boost).
For an edge \((u, v)\), w.l.o.g. \(u\) is the father of \(v\).
Define \(w(u, v) = \hat{N}(v)\). To make this a Markov chain: for a vertex \(x\) with father \(a\) and sons \(b\) and \(c\), define \(Z_{x} = \hat{N}(x) + \hat{N}(b) + \hat{N}(c)\), \(K(x, a) = \frac{\hat{N}(x)}{Z_{x}}\), \(K(x, b) = \frac{\hat{N}(b)}{Z_{x}}\) and \(K(x, c) = \frac{\hat{N}(c)}{Z_{x}}\).
Now consider random walk on this graph. Define \(Z = \sum_{x}Z_{x}\). It follows that \(\pi(x) = \frac{Z_{x}}{Z}\).
Assume \(x\) is the father of \(y\); then \(\pi(x)K(x, y) = \frac{Z_{x}}{Z}\cdot\frac{\hat{N}(y)}{Z_{x}} = \frac{\hat{N}(y)}{Z} = \pi(y)K(y, x)\), so the chain is reversible.
Each satisfying assignment is a leaf. All leaves are at the same level and each has only the single edge to its father, so all satisfying leaves have the same stationary probability up to \(\alpha^{2}\) factors.
Then we need to bound the sum of leaves' probability.
Lemma: the random walk is at a leaf with probability \(\Omega(\frac{1}{\alpha^{2}n})\).
proof:
Use \(L\) to denote the actual number of satisfying leaves.
Observe that the sum of the weights of edges from level \(i\) to level \(i + 1\) is at most \(\alpha L\) (and at least \(\frac{L}{\alpha}\)).
Because there are \(n\) levels, the sum of all \(Z_{x}\) is at most \(2\alpha Ln\) (each edge is counted by both of its endpoints).
The probability of being at a leaf is at least \(\frac{L/\alpha}{2\alpha Ln} = \frac{1}{2\alpha^{2}n}\).
\(\blacksquare\)
That's \(\frac{1}{poly(n)}\) for polynomial \(\alpha\).
Now we need to bound the mixing time.
Although in a tree there is a unique simple path between any two states, it does not seem easy to use the path coupling lemma here.
We choose the path technology here, taking the unique tree path as \(P_{x, y}\).
For a specific edge \(e = (u, v)\), \(u\) is the father of \(v\).
\(f(e) = (\sum_{x \in \mathrm{subtree}(v)} \pi(x))(1 - \sum_{x \in \mathrm{subtree}(v)} \pi(x)) \leq \sum_{x \in \mathrm{subtree}(v)} \pi(x) \leq \frac{2\alpha n N(v)}{Z}\)
\(Q(e) = \pi(u)K(u,v) = \frac{Z_u}{Z} \frac{\hat{N}(v)}{Z_{u}} = \frac{\hat{N}(v)}{Z}\)
\(\max_{e} \frac{f(e)}{Q(e)} \leq 2\alpha^2 n\)
\(\max_{x, y} P_{x, y} \leq 2n\)
By the path technology, the Poincaré constant is at least \(\frac{1}{2\alpha^{2}n \cdot 2n} = \frac{1}{4\alpha^{2}n^{2}}\).
Choose the root as the starting state; then \(\pi(root) = \frac{Z_{root}}{Z} \geq \frac{1}{2\alpha^{2}n}\).
So \(\tau_{mix} \leq O(\alpha^{2}n^{2}\log(\alpha n)) = poly(n)\).
So we can get an FPAUS.
Implementation
Random walk on hypercube.
First consider using the path coupling.
Obviously the Hamming metric satisfies the premetric constraints.
consider an edge \(e = (x, y)\), i.e. \(x\) and \(y\) differ in exactly one bit.
couple by randomly picking the same bit position in both copies and moving both to the same neighbor (the same new bit value).
So \(\mathbb{E}[d(X_{T}, Y_{T})] \leq n(1 - \frac{1}{n})^{T}\), and \(T = O(n\log n)\) suffices.
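A Python sketch of this hypercube coupling; coalescence is exactly coupon collecting over the \(n\) coordinates (the dimension and starting points are illustrative):

```python
# Coupling for the lazy walk on the hypercube {0,1}^n: both copies pick the
# same coordinate and the same new bit value, so a coordinate agrees forever
# once it has been updated at least once.
import random

n = 10
rng = random.Random(0)
X = [0] * n
Y = [1] * n                              # start at opposite corners: d = n

t = 0
while X != Y:
    i = rng.randrange(n)                 # same coordinate in both copies
    b = rng.randrange(2)                 # same new value in both copies
    X[i] = Y[i] = b
    t += 1
print(t)  # coupon collector: about n log n steps in expectation
```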