Data Structures
A quick reference sheet for different data structures and possible python/go/c implementation of said data structures.
Heaps
A heap is a complete binary tree where all children of any node have a greater value than the node itself.
A complete binary tree is a type of binary tree where all levels are fully filled except possibly the last level, which is filled from left to right.
An example of a heap:
0
/ \
1 2
/ \ /
3 4 6
Heaps in Python
from heapq import heapify, heappush, heappop, heapreplace, heappushpop
# convert array to heap O(n)
x = [1,2,3,4]
heapify(x)
# pushing n elements would take O(n lg n)
The heapify
function turns an existing array into a heap structure. It does this in-place, and does not return the heap.
We do not have to call heapify
on an empty array as it would no do anything.
# push elements onto the heap O(lg n)
heappush(x, 2)
heappush(x, 15)
heappush(x, 20)
# pop elements from the heap O(lg n)
min_x = heappop(x)
min_x = heappop(x)
min_x = heappop(x)
# push then pop an element (more efficient than separate)
min_x = heappushpop(x, 22)
# pop then push element
min_x = heapreplace(x, 22)
Heap Operations Implementations
Heapify
We use the Floyd’s heap construction algorithm or the bottom-up heap construction to convert an array into a heap. This build a heap starting from the first non-left node of the tree. For each such node, it ensure that the min/max (depending on heap) is on top.
In a binary tree with an array representation, the left child is at \(2i + 1\) and the right child is at \(2i+ 2\). This means that if \(2i + 1 >= n\) then node \(i\) has no children – it’s a leaf.
# for a min heap
def heapify(arr: List):
n = len(arr)
last_non_leaf_node = (n // 2) - 1
for i in range(last_non_left_node, -1, -1):
sift_down(arr, i, n)
def sift_down(arr: List, i: int, n: int):
smallest = i
left = 2 * i + 1
right = 2 * i + 2
if left < n and arr[left] < arr[smallest]:
smallest = left
if right < n and arr[right] < arr[smallest]:
smallest = right
if smallest != i:
# swap smallest and "root"
arr[i], arr[smallest] = arr[smallest], arr[i]
sift_down(arr, smallest, n)
This operation runs in \(O(n)\) time. Each node is «shifted down» to maintain the heap property starting at the bottom. Nodes near the bottom have less distance to fall so the cost to heapify is low. The nodes near the top have more distance to fall but there are fewer of them.
\[ \sum_{i=0}^{\log n} \left(\frac{n}{2^{i+1}} \cdot i\right) = O(n) \]