Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of size k from a set S that contains n items, where n is either very large or unknown. The answer for question 2-43 of the second edition of The Algorithm Design Manual by Steve S. Skiena is a reservoir sampling algorithm. The problem statement is as follows:
You are given a set S of n numbers. You must pick a subset S’ of k numbers from S such that the probability of each element of S occurring in S’ is equal (i.e., each is selected with probability k/n). You may make only one pass over the numbers. What if n is unknown?
This is my implementation (using Python 3) of an algorithm that would solve this problem in Θ(n) time.
def random_sample(iterator, k): sample =  elements_so_far = 0 for item in iterator: elements_so_far += 1 if len(sample) < k: sample.append(item) else: index = int(random.random() * elements_so_far) if index < k: sample[index] = item return sample