Part 6 - Trees

Posted on Nov 30, 2022
(Last updated: May 26, 2024)

In this part we’ll cover so-called ‘trees’ in computer science.

Trees

The formal definition of a ’tree’ is: “A hierarchical data structure built up from nodes”.

So unlike a real tree, which has its root at the bottom, computer scientists flipped it - the root node is at the top of the tree.

In any tree there’s always one root node. Each node can have so-called child nodes, and these children can have their own child nodes and so on.

Therefore, all nodes - except the root node - must have exactly one parent node.

Tree rules

I’ve already stated some of the ‘rules’ of a tree, but generally there are three rules that make a collection of nodes a tree.

  • Each node must have exactly one parent node - except the root node.
  • There must be exactly one root node.
  • If the first and second rules are followed - then no ‘cycles’ will appear.

Recursive Tree

Trees can also be defined recursively: “A tree consists of a root node together with its subtrees, which do not share any common nodes”.

This means: each child node (of the root node) is the root of its own subtree. It also means that a tree can be empty (no nodes) - some applications need this.

Programming with trees

Now that we have some basic knowledge about trees - let’s try to represent them with code. A very natural representation would be a directory tree.

class Tree:
    name: str
    size: int

    children: list[Tree]

If we now want to compute the total size of a directory (folder), we can sum the sizes recursively:

def total(tree: Tree) -> int:
    n: int = tree.size
    for c in tree.children:
        n += total(c)
    return n

And we can see that recursion and trees go hand in hand. Now let’s see how we can traverse the tree.

There are two main traversal methods: pre-order traversal, where you process each node before visiting its children, and post-order traversal, where you process each node after visiting its children.
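The two orders can be sketched on a directory-style tree like the one above - here with a minimal, illustrative Tree class (the names are my own, not from a particular library):

```python
class Tree:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def preorder(tree):
    # Process the node before visiting its children
    result = [tree.name]
    for c in tree.children:
        result += preorder(c)
    return result

def postorder(tree):
    # Process the node after visiting its children
    result = []
    for c in tree.children:
        result += postorder(c)
    result.append(tree.name)
    return result

root = Tree("root", [Tree("a", [Tree("b")]), Tree("c")])
print(preorder(root))   # ['root', 'a', 'b', 'c']
print(postorder(root))  # ['b', 'a', 'c', 'root']
```

Note that the only difference is when the node itself is processed, relative to the recursive calls.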

Binary Trees

Often we use a special kind of tree called a ‘binary tree’ - a binary tree is a tree where each node has at most two children.

A ‘missing’ child can be represented as a null value. We call these child nodes the ‘left’ and ‘right’ nodes, or subtrees.

Some terminology commonly used for trees: size, height, and level. Size is the number of nodes in the (sub)tree, height is the number of levels in the tree, and level is the distance from a node to the root.
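Size and height are easy to compute recursively - here's a minimal sketch for binary trees (the Node class is illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def size(node):
    # Number of nodes in the (sub)tree
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)

def height(node):
    # Number of levels: an empty tree has height 0
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

tree = Node(2, Node(1), Node(4, Node(3), None))
print(size(tree))    # 4
print(height(tree))  # 3
```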

Binary Search Trees

Binary search trees, or BSTs, are a kind of binary tree with some extra conditions:

  • Each node has a key
  • The node’s key is smaller than all the keys in its right subtree
  • The node’s key is greater than all the keys in its left subtree

Or phrased differently: everything in the left subtree is less than the root, and everything in the right subtree is greater than the root.

With these rules, searching becomes really, really efficient and natural. We just compare our key to each node’s key to decide whether to traverse right or left.

If we ever hit a ‘null’ node, we know that the key we’re searching for doesn’t exist in this tree.

So let us implement a map using BSTs!

class BSTMap[Key, Value]:
    class Node:
        def __init__(self, key, value, left=None, right=None):
            self.key = key
            self.value = value
            self.left = left
            self.right = right

    def __init__(self):
        self.root = None

    def get(self, key: Key) -> Value:
        return self._get(key, self.root)

    def put(self, key: Key, val: Value):
        self.root = self._put(key, val, self.root)

    def _get(self, key, node) -> Value:
        if node is None:
            return None

        if key < node.key:
            return self._get(key, node.left)

        if key > node.key:
            return self._get(key, node.right)

        return node.value

    def _put(self, key, val, node):
        if node is None:
            return self.Node(key, val)

        if key < node.key:
            node.left = self._put(key, val, node.left)

        elif key > node.key:
            node.right = self._put(key, val, node.right)

        else:
            node.value = val

        return node

Find max/min value in a BST

To find the max/min value in BST we just repeatedly go right/left until we hit a null node. Real simple!

Or in code:


def get_max(tree: BSTMap):
    # Assumes a non-empty tree
    c_node = tree.root

    while c_node.right is not None:
        c_node = c_node.right

    return c_node.value

def get_min(tree: BSTMap):
    # Assumes a non-empty tree
    c_node = tree.root

    while c_node.left is not None:
        c_node = c_node.left

    return c_node.value

Sorting a BST

Sometimes we might want to print/store a BST in ascending order - the way to do this, of course, is via recursion!

This is also known as in-order traversal - we recursively print the left subtree, then the root, then the right subtree.

So for example:

def print_sorted(node: Node):
    if node is None:
        return

    print_sorted(node.left)
    print(node.key)
    print_sorted(node.right)

We can also handle a more general case - making an iterator over the keys in the tree - by doing an in-order traversal and storing the keys in a dynamic array.

def keys(self) -> list[Key]:
    result = []
    self.inorder(self.root, result)
    return result

def inorder(self, node: Node, result: list[Key]):
    if node is None:
        return

    self.inorder(node.left, result)
    result.append(node.key)
    self.inorder(node.right, result)

...

for key in bst.keys():
    print(key)

Deletion

Deletion can be quite tricky in trees, especially in BSTs - we can’t just remove any node we like, there are some rules.

Deleting a leaf

A so-called leaf is a node with no children (visually, the leaves are the bottom nodes). These are fine to remove without any corrections.

Since the removed node has no children, nothing else in the tree is affected.

Deleting a node with one child

Now, in the case we want to delete a node with exactly one child, it’s still quite trivial. Since that node only has one child, we can just replace the deleted node with its child. It doesn’t affect the tree as a whole.

Deleting a node with two children

This is where it becomes tricky, but if we take a step back and think about it: we need to find a replacement node that upholds the requirements - everything to the left should be smaller, everything to the right must be greater. If we think about which candidates fulfill this requirement, it becomes clear that two nodes are good candidates.

The greatest (rightmost) key in the node’s left subtree, or the smallest (leftmost) key in the node’s right subtree.

The hardest case

The case above is quite simple if the candidate doesn’t have any children - the hardest case in deletion is when the candidate has one child. What we do then is: first delete the candidate node by replacing it with its child - after that, we replace the node we wanted to delete with the (removed) candidate.

So, not that hard logically - but can be a pain to implement correctly.

So let’s now do exactly that :)!

def delete(node: Node, key: Key):
    if node is None:
        return node

    if key < node.key:
        node.left = delete(node.left, key)

    elif key > node.key:
        node.right = delete(node.right, key)

    else:
        if node.left is None:
            return node.right

        elif node.right is None:
            return node.left

        # Two children: find the greatest node in the left subtree
        temp = node.left
        while temp.right is not None:
            temp = temp.right

        # Copy it into this node, then delete it from the left subtree
        node.key = temp.key
        node.value = temp.value

        node.left = delete(node.left, temp.key)

    return node

Complexity

The complexity of BST operations in the best and average case is $\mathcal{O}(\log n)$, which is great. In the worst case, however, these operations take $\mathcal{O}(n)$ - for example when the items are inserted in ascending order, so the tree degenerates into a list.
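To see the worst case concretely, here's a sketch (with a minimal, assumed insert helper) that inserts keys in ascending order and measures the height:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    # Plain BST insertion with no rebalancing
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 11):   # insert 1..10 in ascending order
    root = insert(root, k)

print(height(root))  # 10 - every key went right, so the tree is a list
```

With height equal to $n$, search and insert both cost $\mathcal{O}(n)$.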

BST Summary

We can use BSTs to implement a set or map that also supports ordering-based operations - such as min, max, and range queries - with good complexity.

Insert/Delete/Search take time proportional to the height of the tree, which in the best/average case is $\mathcal{O}(\log n)$ - but if the tree is unbalanced it becomes linear.

So the main takeaway is that, if a BST is balanced, it’s really powerful and efficient. So let’s take a look at how we can keep it balanced.

Invariants

The property that the keys in a node’s left subtree must be smaller than its key, and the keys in its right subtree greater, is an example of a data structure invariant.

An invariant is a property that the designer wants to always hold - usually a rule about which instances of the data structure are ‘valid’.

But invariants can be expensive to maintain - if we say ‘this array is sorted’, we can’t just append anymore, we need to insert each element at its correct position. So we need to find a balance invariant for BSTs which is easy enough to maintain.

AVL Trees

AVL trees have a balance invariant that is less restrictive than perfect balance - in an AVL tree we only keep the tree almost balanced.

The invariant for an AVL tree is: “For each node, the heights of its left and right subtrees differ by at most 1”.

Tree Rotations

Now before we dive deeper into AVL trees, let’s take a look at rotations. Rotating a tree means restructuring it so that a different node becomes the root.

Right rotation: left child of root becomes root, old root becomes right child. Left rotation: right child of root becomes root, old root becomes left child.

Tree rotations preserve ordering and contents in BSTs. Note that we can also rotate a subtree.
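A sketch of the two rotations, using a minimal Node class of my own - note how the in-order traversal, and thus the BST ordering, is unchanged:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_right(root):
    # Left child becomes the new root; old root becomes its right child
    new_root = root.left
    root.left = new_root.right
    new_root.right = root
    return new_root

def rotate_left(root):
    # Right child becomes the new root; old root becomes its left child
    new_root = root.right
    root.right = new_root.left
    new_root.left = root
    return new_root

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

tree = Node(4, Node(2, Node(1), Node(3)), Node(5))
rotated = rotate_right(tree)
print(rotated.key)       # 2 - the old left child is the new root
print(inorder(rotated))  # [1, 2, 3, 4, 5] - ordering is preserved
```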

AVL Insertion

To insert a new node into an AVL tree - we first start by doing a BST insertion - this might break the AVL tree (the invariant).

But to fix this we go upwards from the new node, finding and fixing nodes that break the invariant. Note that during BST insertion, each node’s height either stays the same or goes up by 1.

We have different cases which we’ll look at.

Case 1: Left-left Tree

In the Left-left case, we did an insertion in the left child of the left child of this node, which broke the invariant.

To fix this, we just make a right rotation.

Case 2: Right-right Tree

In the Right-right case, we did an insertion in the right child of the right child of this node, which broke the tree.

To fix it we make a left rotation!

Case 3: Left-right Tree

In the Left-right case, the extra height is in the root’s left child and then in the right child of that. We can’t fix it with one rotation - but luckily we still have a solution.

First we rotate the left subtree to the left. Once we’ve performed that left rotation, we’ve created a Left-left tree, which is fixed by one right rotation as we saw earlier.

Case 4: Right-left Tree

The same logic as case 3, the extra height is in the root’s right child and then in the left child of that one. To fix this we first make a right rotation in the right subtree - then make a final left rotation to fix the tree.

AVL Insertion Summary

So we have these four cases, and we know how to solve them - we just need to remember how to identify them: we look at which grandchild the extra unit of height is in.
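Putting the four cases together, a compact (if inefficient) sketch of AVL insertion might look like this - the helper names are mine, and a real implementation would store each node's height instead of recomputing it:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def height(node):
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance(node):
    # Positive: left side is taller; negative: right side is taller
    return height(node.left) - height(node.right)

def rotate_right(root):
    new_root = root.left
    root.left = new_root.right
    new_root.right = root
    return new_root

def rotate_left(root):
    new_root = root.right
    root.right = new_root.left
    new_root.left = root
    return new_root

def insert(node, key):
    # Ordinary BST insertion, then rebalance on the way back up
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return rebalance(node)

def rebalance(node):
    if balance(node) > 1:              # extra height on the left
        if balance(node.left) < 0:     # Left-right: first rotate left child left
            node.left = rotate_left(node.left)
        return rotate_right(node)      # Left-left: one right rotation
    if balance(node) < -1:             # extra height on the right
        if balance(node.right) > 0:    # Right-left: first rotate right child right
            node.right = rotate_right(node.right)
        return rotate_left(node)       # Right-right: one left rotation
    return node

root = None
for k in range(1, 8):    # sorted insertions would wreck a plain BST
    root = insert(root, k)

print(height(root))  # 3 - the AVL tree stays balanced
```

Inserting 1..7 in order leaves a perfectly balanced tree with 4 at the root, instead of the height-7 list a plain BST would produce.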

AVL Deletion

When deleting a node, the same situation occurs! But this time we reduce the height of some subtree by 1, breaking the invariant.

So, therefore we can use the exact same balancing algorithm. So when implementing an AVL tree we only need to implement the balancing algorithm once!

In conclusion, AVL trees use rotations to keep the tree balanced. In the worst case the height of the tree is $1.44\ \log_2(n)$, but normally it’s close to $\log_2(n)$.

2-3 Trees

2-3 trees are another kind of height-balanced tree. A 2-3 tree is made from two kinds of nodes:

  • 2-node: Has a key x and two subtrees A and B. Just like in a BST, A < x < B.
  • 3-node: Has two keys x, y with three subtrees A, B, and C. As usual, A < x < B < y < C.

Searching in 2-3 Trees

Searching proceeds just as in a BST. In a 2-node, we compare the search key against the key in the node and descend into one of the two children.

In a 3-node, we compare the search key against both keys in the node and descend into one of the three children.

How 2-3 Trees balance

In the AVL case we saw that we have to rebalance via rotations after each insertion/deletion. A 2-3 tree, on the other hand, never has to be rebalanced - it’s always balanced.

This is because if one child is null, all the children must be null (you can test this by drawing examples). This means the empty trees are all at one level, which implies a balanced tree.

The resulting tree will have a height between $\log_3(n)$ and $\log_2(n)$.

Insertion in 2-3 trees

When we want to insert a new key into a 2-3 tree we start with a normal insertion into a leaf - however, this might break the tree by creating a ‘4-node’.

To fix this we use the following algorithm:

  • If the node is a 4-node, split it into 2-nodes, moving the middle key up.
  • Absorb the moved-up key into the parent, then move up to the parent.

We keep alternating these two steps, stopping once we no longer need to split.

Summary 2-3 trees

So conceptually 2-3 trees are quite simple, and the complexity is great since the tree is always perfectly balanced - but the implementation is quite horrible.

To fix this we use something called ‘red-black BSTs’.

Red-black BSTs

Red-black BSTs implement a 2-3 tree using a BST. We can see a 2-3 tree as a BST since a 2-node becomes a regular BST node, while a 3-node becomes two BST nodes.

The name ‘red-black’ comes from the fact that we color the BST nodes to distinguish 2-nodes from 3-nodes.

2-nodes become black nodes, and 3-nodes become a black node with a red child. So a red node is ‘only part of a 3-node’.

So a red-black BST is really just a 2-3 tree stored in a BST - we can always translate it back to the corresponding 2-3 tree.

Red-black Tree Invariant

The invariant that we must uphold for a red-black tree is:

  • A red node may only occur as the left child of a black node
  • Black balance: on any path from the root to a null, we pass through the same number of black nodes
  • And the BST invariant must also hold.

Red-black BST Insertion

To make an insertion into a red-black tree, we first make a normal BST insertion; the newly added node is colored red.

This may break the invariant on red nodes, but this way we always keep the black balance and the BST invariant.

So the only way we can break the invariant is by placing a red node in a forbidden place. There are three cases in which this can happen.

Case 1: Skew

In the skew case, we have placed a red node to the right of a black node (a 2-node), breaking the first rule. We fix it with an operation called skewing.

Skewing is a kind of rotation - we’ll look at the implementation later.

Case 2: Split

In the split case we have created a 4-node (a red node with a red child), so we need to split it into 2-nodes. Splitting may break the black balance, which means we have to continue fixing upwards recursively.

Case 3: Red Root Node

After a split, we might end up with a red node as the root of the whole tree. In a case like this, the fix is simple, we just make the root a black node instead!

Red-black Tree Implementation

class RBNode:
    def __init__(self, val):
        self.red = False
        self.parent = None
        self.val = val
        self.left = None
        self.right = None

class RBTree:
    def __init__(self):
        self.nil = RBNode(0)
        self.nil.red = False
        self.nil.left = None
        self.nil.right = None
        self.root = self.nil

    def insert(self, val):
        new_node = RBNode(val)
        new_node.parent = None
        new_node.left = self.nil
        new_node.right = self.nil

        # New node must be red
        new_node.red = True

        parent = None
        current = self.root
        while current != self.nil:
            parent = current
            if new_node.val < current.val:
                current = current.left
            elif new_node.val > current.val:
                current = current.right
            else:
                return

        new_node.parent = parent
        if parent is None:
            self.root = new_node
        elif new_node.val < parent.val:
            parent.left = new_node
        else:
            parent.right = new_node

        self.fix_insert(new_node)

    def rotate_left(self, x):
        y = x.right
        x.right = y.left
        if y.left != self.nil:
            y.left.parent = x

        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x == x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        y.left = x
        x.parent = y

    def rotate_right(self, x):
        y = x.left
        x.left = y.right
        if y.right != self.nil:
            y.right.parent = x

        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x == x.parent.right:
            x.parent.right = y
        else:
            x.parent.left = y
        y.right = x
        x.parent = y

    def fix_insert(self, new_node):
        while new_node != self.root and new_node.parent.red:
            if new_node.parent == new_node.parent.parent.right:
                u = new_node.parent.parent.left  # uncle
                if u.red:
                    u.red = False
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    new_node = new_node.parent.parent
                else:
                    if new_node == new_node.parent.left:
                        new_node = new_node.parent
                        self.rotate_right(new_node)
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    self.rotate_left(new_node.parent.parent)
            else:
                u = new_node.parent.parent.right  # uncle
                if u.red:
                    u.red = False
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    new_node = new_node.parent.parent
                else:
                    if new_node == new_node.parent.right:
                        new_node = new_node.parent
                        self.rotate_left(new_node)
                    new_node.parent.red = False
                    new_node.parent.parent.red = True
                    self.rotate_right(new_node.parent.parent)
        self.root.red = False

    def exists(self, val):
        curr = self.root
        while curr != self.nil and val != curr.val:
            if val < curr.val:
                curr = curr.left
            else:
                curr = curr.right
        return curr != self.nil

Red-black Tree Summary

Red-black trees are 2-3 trees implemented using a BST - their invariant is that they must correspond to a perfectly balanced 2-3 tree.

The update strategy is: insert in a way that may break the rules on red nodes, then fix them.

They have logarithmic performance - in the worst case the height is $2\ \log_2(n)$.

That was the end of this part - it was a lot to process (it was a lot for me as well :)). In the next part we’ll cover priority queues and how they can be implemented as a kind of tree :).