Data Structures

A quick reference sheet for different data structures and possible python/go/c implementation of said data structures.

Heaps

A heap is a complete binary tree where all children of any node have a greater value than the node itself.

A complete binary tree is a type of binary tree where all levels are fully filled except possibly the last level, which is filled from left to right.

An example of a heap:

    0
   / \
  1   2
 / \  /
3  4  6

Heaps in Python

from heapq import heapify, heappush, heappop, heapreplace, heappushpop

# convert array to heap O(n)
x = [1,2,3,4]
heapify(x)
# pushing n elements would take O(n lg n)

The heapify function turns an existing array into a heap structure. It does this in-place, and does not return the heap. We do not have to call heapify on an empty array as it would no do anything.

# push elements onto the heap O(lg n)
heappush(x, 2)
heappush(x, 15)
heappush(x, 20)

# pop elements from the heap O(lg n)
min_x = heappop(x)
min_x = heappop(x)
min_x = heappop(x)

# push then pop an element (more efficient than separate)
min_x = heappushpop(x, 22)

# pop then push element
min_x = heapreplace(x, 22)

Heap Operations Implementations

Heapify

We use the Floyd’s heap construction algorithm or the bottom-up heap construction to convert an array into a heap. This build a heap starting from the first non-left node of the tree. For each such node, it ensure that the min/max (depending on heap) is on top.

In a binary tree with an array representation, the left child is at \(2i + 1\) and the right child is at \(2i+ 2\). This means that if \(2i + 1 >= n\) then node \(i\) has no children – it’s a leaf.

# for a min heap
def heapify(arr: List):
    n = len(arr)
    last_non_leaf_node = (n // 2) - 1
    for i in range(last_non_left_node, -1, -1):
        sift_down(arr, i, n)

def sift_down(arr: List, i: int, n: int):
    smallest = i
    left = 2 * i + 1
    right = 2 * i + 2

    if left < n and arr[left] < arr[smallest]:
        smallest = left

    if right < n and arr[right] < arr[smallest]:
        smallest = right

    if smallest != i:
        # swap smallest and "root"
        arr[i], arr[smallest] = arr[smallest], arr[i]
        sift_down(arr, smallest, n)

This operation runs in \(O(n)\) time. Each node is «shifted down» to maintain the heap property starting at the bottom. Nodes near the bottom have less distance to fall so the cost to heapify is low. The nodes near the top have more distance to fall but there are fewer of them.

\[ \sum_{i=0}^{\log n} \left(\frac{n}{2^{i+1}} \cdot i\right) = O(n) \]