Weighted Pseudo-Random Number Generation

In this article, I will introduce a simple algorithm for generation of non-uniformly distributed pseudo-random numbers built on top of a uniform pseudo-random number generator (PRNG)

Introduction

An electronic computer is incapable of generating a truly random number such that it’s outcome is unpredictable. This is in contrast with, for example - tossing a fair coin whose value can’t be called until after we’ve tossed it and seen it’s value. This is why the numbers generated are said to be pseudo random.

A PRNG is just an algorithm that outputs a number. A good PRNG outputs numbers that are uniformly distributed, i.e. for a 5,000 generated integers between 1 and 5, we’d want to generate each integer approximately a 1,000 times.

Non-Unifrom Distribution

We wish to generate random numbers that are biased towards certain values. For example, in the generation of integers between 1 and 5, we may want 5 to have 5 times as much chance as 1 to be generated, 4 to have 4 times as much chance as 1 to be generated, and so on.

Formally, we can say that the probabilities of generation of various numbers are as follows:

Number Probability
1 1/15 = 6.67%
2 2/15 = 13.33%
3 3/15 = 20%
4 4/15 = 26.67%
5 5/15 = 33.33%

One strategy to achieve this would be to first generate a number. Then we would choose to accept the generated number only as per its probability density. For instance, if the number generated is 3, then we would accept it only 20% of the time, and discard it and generate another number otherwise. This strategy is formalized and explained in detail in the following algorithm.

Algorithm

Parameters: fn - a function that returns the probability of accepting the number that it is passed as an argument

1. Let r = generate a new number using an uniform PRNG between desired bounds
2. Let test = generate another new number using an unifrom PRNG in range [0, 1)
3. If test < fn(r), then goto 4 else goto 1
4. Return r  

The crucial step in this algorithm is step 3. Greater the fn(r) (probability of generation of a number), more are the chances that the if condition is satisfied. Over a large number of condition tests, the if condition will be satisfied approximately with fn(r) probability.

A simple way of understanding why this works would be to imagine taking a big list of uniformly generated numbers. So, for our example we’ll take a 5,000 member list with 1,000 members each of the elements 1, 2, 3, 4, and 5. Now we’ll iterate over the list and for each element we’ll evaluate if we should delete it or keep it. We’re biased against 1 as we keep it only 6.67% of the time, in contrast to 5 which we keep 33.33% of the time. In the end, we’ll have about 66 members of digit 1 remaining, 133 of digit 2, 200 of digit 3, 266 of digit 4, and 333 of digit 5. So in essence we have generated 1,2,3,4, and 5 their respective percentage of the time by biasedly discarding the numbers we didn’t want as much.

Another interesting thing to notice is that the probabilites assigned to our generated numbers can be scaled without affecting the outcome. For example, we could instead let the numbers pass the test with the following probabilities

Number Probability
1 3 * 1/15 = 20%
2 3 * 2/15 = 40%
3 3 * 3/15 = 60%
4 3 * 4/15 = 80%
5 3 * 5/15 = 100%

And it will work, because if we go through the explanation again with the new probabilities, we’ll still have 5 times as many digit 5 elements in the list as the digit 1 elements.

So it is important to understand that the probability of rejection/passing in the algorithm is not the same as the probability density of the function and needn’t add up to a 100%!

Common Lisp and a JavaScript implementations of this algorithm can be as follows:

(defun weighted-pseudorandom (fn)
  (do ((r 0 (random 1.0))
       (test 1 (random 1.0)))
      ((< test (apply fn (list r))) r)))
/**
 * Weighted PRNG that generates a value in [0, 1)
 * @param {function} fn The function that returns the probability of letting the generated value pass
 */
const weightedPRNG = function (fn) {
  let test = 1
  let r = 0
  while(!(test < r)) {
    r = Math.random()
    test = Math.random()
  }
 return r
}

One limitation of this algorithm is that its worst case performance would technically take forever, should the two random numbers being generated stay the same pair which do not pass the test. This is not very probable, is close to impossible, but isn’t impossible.

This algorithm wouldn’t be too performant on certain kinds of distributions either, where a large number of elements have very low probabilities and a few have very high. This is because the elements are first uniformly generated, and a large number of low probability outcomes would require that we loop more.

Practically though, it works quite well for many distributions, especially continous ones.

Continuous Distributions

Neat thing about this algorithm is that it works for continuous functions as good as it does for discreete elements. As an example, consider a linear probability distribution where probability density of x is directly propotional to x.

We could draw a linear graph that would look something like the following ASCII art

 f(x)
  |     /
  |    / 
  |   /
  |  / 
  | /
  |/___________ x

A probability test function f(x) = x would suffice to give the desired linear probability density if we were to generate a number between 0 and 1. For example, 0.5 will have twice as much probability of being generated as 0.25.

For a distribution that is propotional to x^2 and resembles the graph of the same, we could use f(x) = x^2 as the probability test function for generating a number between 0 and 1. In this, 0.5 will be 4x more likey generated than 0.25 since (0.5 * 0.5)/(0.25 * 0.25) = 4

Closing Remarks

You can run a quick and dirty verification of the algorithm here.

This method is formally known as Rejection Sampling.

I may update this article in future with more methods for weighted pseudo random number generation. Thanks for reading!