
Negative Hypergeometric

Problem Setup

Consider the following setup: there are $N$ balls, of which $G$ are red and $N - G$ are blue. Let $T$ be the number of balls we must draw without replacement to obtain $r$ red ones. What is the probability that $T$ equals $n$?

Our goal is to find $P(T = n)$.
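To make the setup concrete, here is a minimal simulation sketch of the experiment. The function name `draws_until_r_red` and the parameter values are illustrative assumptions, not part of the original article.

```python
import random

def draws_until_r_red(N, G, r, rng):
    """Shuffle N balls (G of them red) and return how many draws reach r red ones.

    Assumes 1 <= r <= G <= N.
    """
    balls = [True] * G + [False] * (N - G)  # True marks a red ball
    rng.shuffle(balls)                      # a random order models drawing without replacement
    red_seen = 0
    for position, is_red in enumerate(balls, start=1):
        red_seen += is_red
        if red_seen == r:
            return position
    raise ValueError("r must satisfy 1 <= r <= G")

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [draws_until_r_red(N=10, G=4, r=2, rng=rng) for _ in range(10_000)]

# Empirical estimate of P(T = n) for each observed n
estimate = {n: samples.count(n) / len(samples) for n in sorted(set(samples))}
```

Every sample lands between $r$ and $N - G + r$, which is the range of values $T$ can take.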

PMF

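Number of ways to get $r$ red balls among $n$ draws

Think of an outcome as a choice of which of the $N$ positions hold red balls. For $T = n$, position $n$ must hold a red ball, exactly $r-1$ of the first $n-1$ positions must hold red balls, and the remaining $G-r$ red balls must fall among the last $N-n$ positions. That gives $\binom{n-1}{r-1}\binom{N-n}{G-r}$ favorable placements.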

All possible outcomes

The number of possible outcomes is the number of ways to place the $G$ red balls among all $N$ positions:

$$\binom{N}{G}$$

Final probability

Combining everything we get:

$$P(T = n) = \frac{\binom{n-1}{r-1}\binom{N-n}{G-r}}{\binom{N}{G}}$$
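As a sanity check, the formula can be compared against a brute-force count over all placements of the red balls. This is a sketch under the assumption $1 \le r \le G \le N$; the helper names `neg_hypergeom_pmf` and `brute_force_pmf` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def neg_hypergeom_pmf(n, N, G, r):
    """P(T = n) as an exact fraction; zero outside r <= n <= N - G + r."""
    if not (r <= n <= N - G + r):
        return Fraction(0)
    return Fraction(comb(n - 1, r - 1) * comb(N - n, G - r), comb(N, G))

def brute_force_pmf(n, N, G, r):
    """Count placements of the G red balls whose r-th red ball sits at position n."""
    # combinations() yields sorted tuples, so reds[r - 1] is the r-th red position
    hits = sum(1 for reds in combinations(range(1, N + 1), G) if reds[r - 1] == n)
    return Fraction(hits, comb(N, G))

N, G, r = 8, 3, 2  # small illustrative values
pmf = {n: neg_hypergeom_pmf(n, N, G, r) for n in range(1, N + 1)}

assert sum(pmf.values()) == 1
assert all(pmf[n] == brute_force_pmf(n, N, G, r) for n in pmf)
```

Exact fractions are used instead of floats so that the comparison with the brute-force count is an equality rather than an approximation.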

Expectation

$$\mathbb{E}[T] = \frac{r(N+1)}{G+1}$$
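The closed form can be checked by summing $n \cdot P(T = n)$ directly over the support. The sketch below assumes $1 \le r \le G \le N$; the function name `expected_draws` and the test triples are illustrative choices.

```python
from fractions import Fraction
from math import comb

def expected_draws(N, G, r):
    """E[T] computed by summing n * P(T = n) over n = r, ..., N - G + r."""
    weighted = sum(
        n * comb(n - 1, r - 1) * comb(N - n, G - r)
        for n in range(r, N - G + r + 1)
    )
    return Fraction(weighted, comb(N, G))

# Compare the direct sum with r (N + 1) / (G + 1) on a few parameter choices
for N, G, r in [(8, 3, 2), (10, 4, 1), (52, 4, 4)]:
    assert expected_draws(N, G, r) == Fraction(r * (N + 1), G + 1)
```

The last triple corresponds to the card analogy: in a shuffled 52-card deck, the fourth ace is expected at position $4 \cdot 53 / 5 = 42.4$.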

Intuition

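Imagine you line up all $N$ draw positions in a row:

$$1, 2, 3, \dots, N$$

Now you drop the $G$ successes randomly into this line. The successes split the line into $G+1$ chunks: the stretch up to the first success, the stretches between consecutive successes, and the stretch after the last one.

By symmetry, each of the $G+1$ gaps holds $(N-G)/(G+1)$ failures on average. Counting the success that closes a chunk, the average chunk length is $(N-G)/(G+1) + 1 = (N+1)/(G+1)$. So the $r$-th success will typically sit at the end of the $r$-th chunk.

That’s all the formula says: “the $r$-th success is expected at $r$ times the average spacing between successes.”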
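The spacing argument can be checked exactly for small cases by enumerating every placement of the successes and averaging the distance from each success to the previous one. This is a sketch; `average_spacings` and the values $N = 9$, $G = 3$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

def average_spacings(N, G):
    """Average distance from each success to the previous one (or to position 0)."""
    totals = [0] * G
    count = 0
    for reds in combinations(range(1, N + 1), G):  # every equally likely placement
        previous = 0
        for k, position in enumerate(reds):
            totals[k] += position - previous
            previous = position
        count += 1
    return [Fraction(t, count) for t in totals]

N, G = 9, 3
spacings = average_spacings(N, G)

# Every chunk has the same average length (N + 1) / (G + 1)
assert all(s == Fraction(N + 1, G + 1) for s in spacings)
```

Summing the first $r$ spacings gives the expected position of the $r$-th success, which is where the factor $r$ in the expectation comes from.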

Variance

$$\mathrm{Var}(T) = \frac{r(G-r+1)(N+1)(N-G)}{(G+1)^2(G+2)}$$

This is just the variance formula; there is no intuition offered here.
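Even without an intuitive story, the formula can be verified numerically from the PMF. The sketch below computes $\mathbb{E}[T^2] - \mathbb{E}[T]^2$ exactly and compares it with the closed form, assuming $1 \le r \le G \le N$; the function name `variance_of_draws` is hypothetical.

```python
from fractions import Fraction
from math import comb

def variance_of_draws(N, G, r):
    """Var(T) = E[T^2] - E[T]^2, summed over the support n = r, ..., N - G + r."""
    total = comb(N, G)
    counts = {
        n: comb(n - 1, r - 1) * comb(N - n, G - r)
        for n in range(r, N - G + r + 1)
    }
    mean = Fraction(sum(n * c for n, c in counts.items()), total)
    second_moment = Fraction(sum(n * n * c for n, c in counts.items()), total)
    return second_moment - mean ** 2

def variance_closed_form(N, G, r):
    """The closed-form variance stated in the text."""
    return Fraction(r * (G - r + 1) * (N + 1) * (N - G), (G + 1) ** 2 * (G + 2))

for N, G, r in [(8, 3, 2), (10, 4, 1), (52, 4, 4)]:
    assert variance_of_draws(N, G, r) == variance_closed_form(N, G, r)
```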

Footnotes
  1. We have to count all the ways of placing all the balls, not just the first $n$. As an analogy, imagine shuffling a deck of cards and taking the first $n$ cards. Even though we don’t care about the rest of the deck, we still have to count the arrangements of the remaining cards to get the correct probability.