A more efficient method to determine the length of random blogrolls is to wait until a reload features no entries that haven’t been seen before. After such a point, if the number of unique entries is *n*‘, then to check whether *n*‘ = *n*, reload again. If *n*‘ = *n* – 1, then the probability that a given reload will include the remaining entry is *k*/*n*. So each reload has a probability of (*n* – *k*)/*n* of having no new entry. After sufficiently many reloads, the probability will be 0.05 and the result will be significant at the 95% level.

Now, let’s see what “sufficiently many” means. After *r* reloads, the probability is ((*n* – *k*)/*n*)^*r*, which we want to be less than 0.05. But that implies that *r* is greater than the log of 0.05 to base (*n* – *k*)/*n*, i.e. *r* > log 0.05 / log (*n* – *k*)/*n* = 1.3/(log *n* – log(*n* – *k*)), where the log is taken to base 10.

This method is much, much more efficient than the other one. With *n* = 80 and *k* = 20, we need *r* > 10.4, i.e. *r* = 11. The absolute worst case scenario is when the second reload is identical to the first, and then each time we need 11 reloads to get a new entry. Then *m* = 1 + 11(*n* – *k*) = 661. And that’s an extremely unlikely upper bound.

When *n*‘ < *n* – 1, the probability that at least one entry will be new is way higher than *k*/*n*. For instance, when *n*‘ = *n* – 2, it’s the complement of (*n* – *k*)/*n* * (*n* – *k* – 1)/(*n* – 1) = 0.56, and *r* > log 0.05 / log 0.56 = 5.16, so 6 reloads are enough. The actual formula for the probability for any value of *n* – *n*‘ is then the obvious generalization of the formula for *n* – *n*‘ = 2.

But note that since we don’t actually know *n*, only *n*‘, we can only use *r* = 11. But the expected values for *n* – *n*‘ > 1 can be used to calculate the expected number of reloads it will take to cover all entries. An upper bound is the sum of expected values for each *n* – *n*‘, since it in effect assumes there’s only one new entry each time.

If we have an event with probability *p*, the expected number of trials it takes to get it is 1/*p*. The expected value for a given *s* = *n* – *n*‘ is then 1/(1 – (*n* – *k*)(*n* – *k* – 1)…(*n* – *k* – *s* + 1)/*n*(*n* – 1)…(*n* – *s* + 1)) = *n*(*n* – 1)…(*n* – *s* + 1)/(*n*(*n* – 1)…(*n* – *s* + 1) – (*n* – *k*)(*n* – *k* – 1)…(*n* – *k* – *s* + 1)). We need to sum that over all *s* from 1 to *n* – *k* and add *r*.

With *n* = 80 and *k* = 20, we get 4 + 2.27 + 1.71 + 1.45 + … + 1 + 1 = 66.39, i.e. 78 reloads are enough.

Note that we don’t actually use *n* when reloading to calculate probabilities. Instead, we can use an estimate derived from the method of matches of successive reloads. For each one point increase in *n*, the number of reloads goes up by a little more than one (for example, *n* = 81 yields 67.53).