Another Tail of the Hypergeometric Distribution

In the analysis of algorithms we often encounter the hypergeometric distribution, which describes sampling without replacement. There are unfortunately a lot of different notations in use, as is nicely summed up in Matthew Skala’s survey. I will not be making the situation better: for this post I’ll use yet another notation still:

| | Chvátal | MathWorld | Wikipedia | This post |
|---|---|---|---|---|
| Balls in urn | N | n+m | N | n |
| Balls that count | M | n | K | pn |
| Balls that don’t count | N-M | m | N-K | (1-p)n |
| Balls you draw | n | N | n | sn |
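If you want to experiment with the numbers, scipy adds yet another convention to the table: its hypergeom is parameterised by the total number of balls, the number that count, and the number drawn, in that order. A minimal sketch in the notation of this post (assuming scipy is available; the concrete numbers are just for illustration):

```python
# scipy's convention, mapped to the notation of this post:
# hypergeom(total, counting, drawn) = hypergeom(n, pn, sn).
from scipy.stats import hypergeom

n, p, s = 1000, 0.3, 0.8
X = hypergeom(n, int(p * n), int(s * n))
print(X.mean())    # expected number of counting balls drawn: p*s*n = 240
```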

We usually approximate the sampling as sampling with replacement, which gives the binomial distribution. It therefore comes as no surprise that we can bound the tail of the hypergeometric just like the tail of the binomial. This has a sleek proof by Chvátal, or it follows simply from negative correlation and Hoeffding’s inequality.

\begin{aligned} \Pr[X\ge(p+t)s n] &\le \exp(-s n \text{D}(p+t||p))\\ &\le \exp(-2t^2sn) \end{aligned}

Here \text{D}(x||y)=x\log\frac{x}{y}+(1-x)\log\frac{1-x}{1-y} is the Kullback-Leibler divergence, and we’ve used that \text{D}(x||y)\ge2(x-y)^2.
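As a quick numeric sanity check (not part of any proof), we can compare the exact upper tail with the two bounds above; the sketch below assumes scipy and uses arbitrary illustrative numbers:

```python
# Numeric sanity check of the entropy and Hoeffding-style bounds
# (a sketch; assumes scipy; the concrete numbers are illustrative).
import numpy as np
from scipy.stats import hypergeom

def D(x, y):
    # Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(y)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

n, p, s, t = 10_000, 0.3, 0.1, 0.05
k = int((p + t) * s * n)                               # the threshold (p+t)sn
tail = hypergeom(n, int(p * n), int(s * n)).sf(k - 1)  # exact Pr[X >= (p+t)sn]

print(tail)                            # exact upper tail
print(np.exp(-s * n * D(p + t, p)))    # entropy bound
print(np.exp(-2 * t**2 * s * n))       # Hoeffding-style bound
```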

However, the binomial is only a good approximation to the hypergeometric when the fraction of sampled elements s is small. I’ve often wondered how much better we could actually do when the number of sampled balls is close to the total number. The answer is as follows:

\begin{aligned}
\Pr[X\le(p-t)sn] &\le \exp(-sn\text{D}(p-t||p)) &\le \exp(-2t^2sn)\\
\Pr[X\le(p-t)sn] &\le \exp(-(1-s)n\text{D}(p+\tfrac{ts}{1-s}||p)) &\le \exp(-\tfrac{s}{1-s}2t^2sn)\\
\Pr[X\ge(p+t)sn] &\le \exp(-sn\text{D}(p+t||p)) &\le \exp(-2t^2sn)\\
\Pr[X\ge(p+t)sn] &\le \exp(-(1-s)n\text{D}(p-\tfrac{ts}{1-s}||p)) &\le \exp(-\tfrac{s}{1-s}2t^2sn)
\end{aligned}

While the two ‘elegant’ versions of the bounds, in the third column, are somewhat weaker than the entropy bounds, they make the difference stand out clearly: for the hypergeometric distribution it is possible to gain a factor \frac{s}{1-s} in the exponent. As s goes to 1, this factor goes to \infty. This makes sense, since for s=1 the hypergeometric is deterministic. We also note that the new weak bounds are stronger as soon as s>1/2, whereas for the entropy bounds the values of p and t also play a role.
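To see the factor \frac{s}{1-s} at work, here is a tiny sketch comparing the exponents of the two weak bounds for a few values of s (the values of n and t are arbitrary):

```python
# The exponents of the two weak bounds, 2t^2*sn versus (s/(1-s))*2t^2*sn,
# for a few sampling fractions s (illustrative numbers only).
n, t = 10_000, 0.05
for s in (0.25, 0.5, 0.8, 0.95):
    classic = 2 * t**2 * s * n
    improved = s / (1 - s) * 2 * t**2 * s * n
    print(f"s={s:.2f}: exp(-{classic:.1f}) vs exp(-{improved:.1f})")
```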

We can show the above bounds by using symmetry to swap sn for (1-s)n before applying Chvátal’s bound to this other tail. The symmetries get a bit unwieldy in my notation, so I’ll state them in Chvátal’s original one.

Let X be the hypergeometric random variable counting how many of the n balls drawn from an urn of N balls, of which M count, are counting balls. Define the tail functions H_{-}(M,N,n,k) = \Pr[X\le k] and H_{+}(M,N,n,k) = \Pr[X\ge k]. By swapping the roles of drawn and un-drawn balls, as well as the roles of counting and non-counting balls, we get the following:

\begin{aligned} H_{-}(M,N,n,k) &= H_{+}(M,N,N-n,M-k)\\ H_{-}(M,N,n,k) &= H_{+}(N-M,N,n,n-k)\\ H_{+}(M,N,n,k) &= H_{+}(N-M,N,N-n,N-M-n+k) \end{aligned}
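These identities are easy to check numerically; the sketch below assumes scipy and simply wraps its cdf and sf in the H_{-} and H_{+} notation:

```python
# Numeric check of the three symmetries (a sketch; assumes scipy,
# whose hypergeom is parameterised as (total, counting, drawn)).
from scipy.stats import hypergeom

def H_minus(M, N, n, k):
    return hypergeom(N, M, n).cdf(k)       # Pr[X <= k]

def H_plus(M, N, n, k):
    return hypergeom(N, M, n).sf(k - 1)    # Pr[X >= k]

M, N, n, k = 30, 100, 40, 10               # arbitrary test values
assert abs(H_minus(M, N, n, k) - H_plus(M, N, N - n, M - k)) < 1e-9
assert abs(H_minus(M, N, n, k) - H_plus(N - M, N, n, n - k)) < 1e-9
assert abs(H_plus(M, N, n, k) - H_plus(N - M, N, N - n, N - M - n + k)) < 1e-9
```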

The arguments are mostly similar, so we’ll just do bound number two:

\begin{aligned}
\Pr[X\le(p-t)sn] &= H_{-}(pn,n,sn,(p-t)sn)\\
&= H_{+}(pn,n,n-sn,pn-(p-t)sn)\\
&= H_{+}(pn,n,(1-s)n,(p(1-s)+ts)n)\\
&= H_{+}(pn,n,(1-s)n,(p+\tfrac{ts}{1-s})(1-s)n)\\
&\le \exp(-(1-s)n\text{D}(p+\tfrac{ts}{1-s}||p))
\end{aligned}

Here the last step is Chvátal’s bound applied to a sample of (1-s)n balls.
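Plugging in concrete numbers (chosen so that pn, sn and (p-t)sn are integers) gives a quick check, again assuming scipy, that the two tails in the chain agree and sit below the claimed bound:

```python
# Tracing bound number two numerically (a sketch; assumes scipy).
import numpy as np
from scipy.stats import hypergeom

def D(x, y):
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

n, p, s, t = 1000, 0.3, 0.8, 0.05
k = int((p - t) * s * n)                                   # (p-t)sn = 200
lhs = hypergeom(n, int(p * n), int(s * n)).cdf(k)          # Pr[X <= (p-t)sn]
rhs = hypergeom(n, int(p * n), int((1 - s) * n)).sf(int(p * n) - k - 1)  # after the swap
bound = np.exp(-(1 - s) * n * D(p + t * s / (1 - s), p))

print(lhs, rhs, bound)    # lhs equals rhs, and both are below the bound
```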

Another way we might have gone about finding the bounds is to study the Stirling approximation of the main term \log\left({pn \choose k}{(1-p)n \choose sn-k}/{n \choose sn}\right) = \left(p\text{H}(\tfrac{k}{pn})+(1-p)\text{H}(\tfrac{sn-k}{(1-p)n})-\text{H}(s)\right)n + O(\log n), where \text{H} is the binary entropy function. As it turns out, taking the series at s=0 or s=1 gives exactly the bounds from the beginning of this post. While not quite a lower/reverse bound, this does give some idea of the sharpness of the bounds.
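As a quick check of this approximation, the sketch below compares the exact log of the main term (via lgamma) with the entropy expression for growing n and some arbitrary fractions:

```python
# Checking the entropy approximation of the main term as n grows
# (a sketch; exact log-binomials via lgamma, arbitrary fractions).
from math import lgamma, log

def logC(a, b):
    # log of the binomial coefficient "a choose b"
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def H(x):
    # binary entropy in nats
    return -x * log(x) - (1 - x) * log(1 - x)

p, s, q = 0.3, 0.6, 0.25                  # k = q*s*n counting balls drawn
for n in (10**3, 10**5, 10**7):
    k = q * s * n
    exact = logC(p * n, k) + logC((1 - p) * n, s * n - k) - logC(n, s * n)
    approx = (p * H(k / (p * n)) + (1 - p) * H((s * n - k) / ((1 - p) * n)) - H(s)) * n
    print(n, exact / n, approx / n)       # the two columns agree as n grows
```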