Sorting

On this page you will find several notes and code snippets about popular sorting algorithms.

Selection sort

Algorithm Description

Sort n numbers stored in array A by finding the smallest element of A and exchanging it with the element in A[1]. Then find the second smallest element of A and exchange it with A[2]. Continue in this manner for the first n - 1 elements of A.

Implementation

def selection_sort(A):
    # Sort the list A in place. Python lists are zero-indexed, so the
    # description above shifts down by one: A[0] receives the smallest
    # element, A[1] the second smallest, and so on.
    n = len(A)
    for i in range(n - 1):
        smallest = i  # index of the smallest element of A[i:]
        for j in range(i + 1, n):
            if A[j] < A[smallest]:
                smallest = j
        if smallest != i:  # swap (if necessary)
            A[i], A[smallest] = A[smallest], A[i]

Analysis

The best case for this algorithm is an already sorted array. The worst case is produced by a sorted array with the first element moved to the last position and all the other elements shifted one place to the left. Note that the only thing that varies is the number of swaps made; the algorithm has Θ(n²) time complexity even for sorted input.
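One way to see this is to instrument the implementation above to count both kinds of operations (a quick sketch, returning the counts just for illustration):

def selection_sort_counting(A):
    # The number of comparisons is always n(n - 1)/2;
    # only the number of swaps depends on the input.
    comparisons = swaps = 0
    n = len(A)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1
            if A[j] < A[smallest]:
                smallest = j
        if smallest != i:
            A[i], A[smallest] = A[smallest], A[i]
            swaps += 1
    return comparisons, swaps

selection_sort_counting([1, 2, 3, 4])  # (6, 0): sorted input, no swaps
selection_sort_counting([2, 3, 4, 1])  # (6, 3): the worst case above, n - 1 swaps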

Heapsort

Heapsort is great: it is simple to program and runs in O(n log n) time in the worst case (the best that can be expected from a comparison-based sorting algorithm). It is also an example of how a data structure can enable dramatic algorithmic improvements. Heapsort can be thought of as a variation of selection sort that uses a priority queue to speed up the process.

Analysis

Removing an item from an unsorted array – given its index – takes constant time (because you can write the last element over the element you wish to remove and reduce the length by one in constant time), whilst finding the minimum of an unsorted array takes O(n) time (through linear search). This makes selection sort an O(n²) algorithm, as it finds and removes the minimum n times.
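Both primitives are easy to sketch on a plain Python list (the function names here are just illustrative):

def remove_at(A, i):
    # O(1): overwrite A[i] with the last element and shrink the array.
    removed = A[i]
    A[i] = A[-1]
    A.pop()
    return removed

def index_of_min(A):
    # O(n): linear search through the whole array.
    smallest = 0
    for j in range(1, len(A)):
        if A[j] < A[smallest]:
            smallest = j
    return smallest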

Some data structures provide these operations with better time complexities. Heaps and balanced binary trees (concrete data structures, unlike the aforementioned priority queue, which is an abstract data type) provide both operations in O(lg n) time, making the sorting algorithm (which operates on n elements) take O(n lg n) time.

Heaps

Heaps work by maintaining a partial order of the elements, which is weaker than the sorted order (so it is easier to maintain) yet stronger than random order (so the minimum element can be quickly identified). A min heap establishes that each parent node has a value no larger than its children's, while the parent nodes of a max heap dominate by being at least as large.

Heaps are slick data structures that enable us to represent binary trees without using pointers. Specifically, a heap is usually represented as an array where the positions of the keys implicitly take the roles of pointers. If we are using a zero-indexed array, the n-th level of the tree starts at element 2^(n-1) - 1 and ends at element 2^n - 2. The root is stored in the first position of the array and its left and right children in the second and third.
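For a zero-indexed array, the index arithmetic can be sketched as follows (the function names are illustrative):

def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2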

Size

Any binary tree can be stored in an array without the use of pointers. The downside is that sparse trees end up occupying huge amounts of space as “missing” nodes still take up space. Space efficiency demands that each level is as dense as it can be. Therefore, only the last level may be incomplete. It is possible to store a tree of n keys using an array of exactly n elements (as long as all the elements of the last level are packed to the left).

Representing binary trees using this structure saves memory, but does not allow some tricks such as moving a subtree around by manipulating a single pointer. This illustrates why arrays are not a good representation for binary search trees, but a great representation for heaps.

Construction

Incremental insertion is the simplest way to construct a heap. Inserting the new element in the leftmost empty position of the array ensures that the shape of the tree stays correct, but may violate the ordering of keys. To solve this problem, child nodes may need to be swapped with their parents in a process known as bubbling up.
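A sketch of insertion into a min heap stored in a Python list, reusing the parent helper from above:

def bubble_up(heap, i):
    # Swap the key at i with its parent until the parent dominates it.
    while i > 0 and heap[parent(i)] > heap[i]:
        heap[i], heap[parent(i)] = heap[parent(i)], heap[i]
        i = parent(i)

def insert(heap, key):
    heap.append(key)  # leftmost empty position: the shape stays correct
    bubble_up(heap, len(heap) - 1)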

Getting the minimum or maximum

Identifying the dominant element is as simple as reading the first element of the underlying array. To fill the hole left by the removal of the first element, the rightmost leaf can be used. The new root, however, may be dominated by its children. To fix this, it is repeatedly swapped with its dominant child until it dominates all its children, perhaps by becoming a leaf (in a process known as bubbling down). This operation is sometimes called heapify, as it merges two heaps – the subtrees below the old root – with a new key.
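A sketch of removal from the same min heap, reusing the left and right helpers from above:

def bubble_down(heap, i):
    # Push the key at i down until it dominates both of its children.
    while True:
        smallest = i
        for child in (left(i), right(i)):
            if child < len(heap) and heap[child] < heap[smallest]:
                smallest = child
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def extract_min(heap):
    heap[0], heap[-1] = heap[-1], heap[0]  # the rightmost leaf fills the hole
    minimum = heap.pop()
    if heap:
        bubble_down(heap, 0)
    return minimum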

Faster construction

There is a faster way to construct a heap than incremental insertion. The core idea is that, given the n keys to build the heap from, we simply place them all in the array in arbitrary order. This gives the array the right shape (it is not sparse), but the dominance order may be completely wrong. To fix it, all we need to do is call the bubble down procedure on every element that is not a leaf (trivially, every leaf already dominates its nonexistent children), working from the bottom of the tree towards the root. This is faster because most of these calls happen near the bottom of the tree, where the subtrees are short: the total work is bounded by the sum of the heights of all nodes, which can be shown to be O(n), so the whole construction takes O(n) time.
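A sketch of this construction, reusing bubble_down from above (in a zero-indexed array the last non-leaf node sits at index n // 2 - 1):

def build_heap(keys):
    heap = list(keys)
    # Bubble down every non-leaf, from the bottom of the tree to the root.
    for i in range(len(heap) // 2 - 1, -1, -1):
        bubble_down(heap, i)
    return heap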

Implementations

This is my implementation of a priority queue and heapsort using Python 3.
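A minimal version can be assembled from the snippets above (the PriorityQueue wrapper and its method names are illustrative):

class PriorityQueue:
    # A thin min-priority-queue built on the heap functions above.
    def __init__(self, keys=()):
        self._heap = build_heap(keys)

    def __len__(self):
        return len(self._heap)

    def push(self, key):
        insert(self._heap, key)

    def pop(self):
        return extract_min(self._heap)

def heapsort(keys):
    # Build a heap in O(n), then extract the minimum n times
    # (O(lg n) each), for O(n lg n) overall, in ascending order.
    queue = PriorityQueue(keys)
    return [queue.pop() for _ in range(len(queue))]

print(heapsort([9, 1, 8, 2, 7, 3]))  # [1, 2, 3, 7, 8, 9]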