Part 9 - Shortest Path & Spanning Trees

Posted on Dec 7, 2022

(Last updated: May 26, 2024)

In this part we’ll cover spanning trees along with how we can use these to solve pathfinding problems!

Weighted Graphs

As we have seen in the last part - graphs can we directed or undirected, the same thing goes for the weights on the edge. These represent some kind of ‘cost’, usually these are positive float numbers, but they could be anything - although many algorithms only work for positive weights.

Spanning Trees

A spanning tree of an undirected graph, $G$, is a sub graph $T$ such that:

Is connected
Is acyclic
Includes all the nodes

The important thing here is that the graph is connected, which means we don’t have any lose sub graphs, we must have one connected unit.

Minimum Spanning Tree Problem

Given a connected, undirected weighted graph, $G$ - what is the spanning tree with the minimum total weight

This specific tree is called the MST, or Minimum Spanning Tree. MSTs have a lot of applications in the real-world.

Let’s begin to look at some algorithms which find these MSTs.

Greedy algorithms

Before we jump into the actual algorithms, let’s look at some fundamental properties first.

A cut in a graph is a partition of its nodes into two (nonempty) sets.
A crossing edge connects a node in one set with a node in another set.

With these two properties, there’s a theorem which proves that:

Given any cut, the crossing edge of minimal weight is in the MST

Algorithmic idea:

Start with all edges colored gray
Find cut with no black crossing edges; Color it’s min-weight edge black
Repeat until $V - 1$ edges are black

This isn’t quite enough for an actual implementation since:

How do we choose the cut?
How do we actually choose the edge with minimum weight after finding all crossing edges?

We’ll now cover some actual algorithms which implements this.

Overview

We’ll cover Kruskal’s algorithm and Prim’s algorithm

The basic idea for Kruskal’s is:

Consider edges in ascending order of weight
Add next edge to the MST, unless doing so would create a cycle
Repeat until there are $V - 1$ edges in the MST

And for Prim’s algorithm:

Start with node 0 and greedily grow the MST
Add to the MST the minimum weight edge with exactly one endpoint in the MST
Repeat until there are $V - 1$ edges in the MST

We’ll begin with Kruskal’s algorithm

Kruskal’s algorithm

So, as in the overview, kruskal’s algorithm is quite simple in the idea - but the only implementation challenge is - how do we determine if we create a cycle?

But let’s break it down. The algorithm stores several disjoint subtrees of the final MST - when we add a edge, it merges two of the subtrees.

So testing for cyclicity, we need to check if two nodes appear in the same set!

There’s actually a good (and well-known) data structure for this exact problem, the disjoint-set data structure. It supports merging and testing in $\mathcal{O}(log*\ n)$ time. (Note log* is an actual function, it’s constant for all practical purposes)

A pseudocode implementation would be:

def kruskals(graph: Graph):
    result = []
    i, e = 0, 0
    graph = sort(graph)
    parent = []
    rank = []
    for vertex in range(graph.V):
        parent.append(vertex)
        rank.append(0)

    while e < graph.V - 1:
        u, v, w = graph[i]
        i = i + 1
        x = graph.find(parent, u)
        y = graph.find(parent, v)
        if x != y:
            e = e + 1
            result.append([u, v, w])
            graph.apply_union(parent, rank, x, y)

Prim’s algorithm

The challenge here is instead to find and remove the min weight edge with exactly one endpoint in T.

If try all edges we will of course have a complexity of $\mathcal{O}(E^2)$. So let’s not do that :)

However, if we use a priority queue, we can get a $\mathcal{O}(E\ log(E))$. This is still the ’lazy’ version but let’s look at the idea:

Maintain a PQ of edges having (at least) one endpoint in T:

Remove the minimum edge to determine next edge $e = v - w$ to add to T.
Disregard $e$ if both endpoints $v$ and $w$ are marked (both in T).
Otherwise, let $w$ be the unmarked node (not in T).
- Add to the PQ, any outgoing edge from $w$ (assuming other endpoint is not in T).
- Add $e$ to T and mark $w$.

The eager solution would be the same but: Maintain a PQ of nodes (PQ has at most one entry per node) connected by an edge to T, where priority of node $v$ = weight of shortest edge connecting $v$ to T.

Remove the minimum node $v$ and add its associated edge $e = v - w$ to T.
Update PQ by considering all edges $e = v - u$ incident to $v$
- Ignore if $u$ is already in T.
- Add $u$ to PQ if not already on it.
- Decrease priority of $u$ if $v - u$ becomes shortest edge connecting $u$ to T.

Note this requires an index PQ, which we haven’t implemented or went over.

Complexity

The complexity of these algorithms are:

Prim’s
- $\mathcal{O}(E\ log(E))$
Kruskal’s
- $\mathcal{O}(E\ log(E))$

Shortest Path Algorithms

Finding the shortest path in a given tree is a very complex problem which we will encounter in the real-world in many applications!

For example in: Map Navigation, Robot navigation, Urban traffic planning, Network routing (OSPF, BGP, RIP) etc.

Variants

There are many different problems when it comes to the shortest path problem - Do we want the shortest path between:

Single source: One node, S, to every other node
Single sink: From every node to one node, T.
Source-sink: From one node, S, to another node, T.
All pairs: Between all possible pairs of nodes.

But we also need to weigh in, is there any restrictions on the edge weights? Also, if we have any restraints on cycles?

But with all this in mind - let’s simplify - the shortest path from S to each node V exists.

If you remember, here’s our ADT/API for directed graphs:

class Graph<Vertices>:
    // Adds an edge to the graph
    add_edge(e: Edge<Vertices>)

    // Removes an edge from the graph
    remove_edge(e: Edge<Vertices>)

    // Returns true if the edge is present in the graph, otherwise false
    contains_edge(e: Edge<Vertices>) -> boolean

    // Returns all the edges which are connected with the vertices
    outgoing_edges(from: Vertices) -> Collection<Edge<Vertices>>

    // Returns the number of vertices present in the graph
    n_vertices() -> Int

    // Returns the number of edges present in the graph
    n_edges() -> Int

class Edge<Vertices>:
    from   : Vertices
    to     : Vertices
    weight : float = 1.0

Single-source shortest paths

The single-source shortest path problem or, SSSP for short. We want to find the shortest path from S to every other node.

The solution is spanning trees! However, it’s not the MST but the so called shortest-paths tree, SPT.

In an SPT, the starting node $s$ is the root of the tree:

How do we retrace the shortest path from and to any nodes?
- This consists of all ancestors of $t$, in reverse order
- Therefore, all information we need is the parent of every node in the SPT
How do we determine the length of the shortest path from and to any nodes?
- This is the sum of the edge in that path
- So either we need to store the edge weight, or the edge itself in the SPT

So a generalized generic GraphSearch would be:

GraphSearch(start: V):
    put start in collection
    repeat until the colletion is empty:
        remove a vertex v
        if v is not visited:
            mark v as visited
            add all unvisited adjacent nodes

Now let’s suppose this collection is a priority queue - the weights are the total cost from s to v

This is the so-called Dijkstra’s/uniform cost algorithm/search.

UCS(start: V):
    Q = PQ // Order by the cost from start
    Q.put(start)
    while is not Q.isempty():
        Q.remove_min()
        if v not in visited:
            visited.add(v) // mark as visited
            for unvisited vertex w adjacent to v:
                the cost from s to w = cost from s to v + the cost from v to w
                Q.put(w)

This is the so-called single source algorithm, if we want a source-sink we just add:

UCS(start: V):
    Q = PQ // Order by the cost from start
    Q.put(start)
    while is not Q.isempty():
        Q.remove_min()
        if v is the goal, return the shortest path from s to v
        if v not in visited:
            visited.add(v) // mark as visited
            for unvisited vertex w adjacent to v:
                the cost from s to w = cost from s to v + the cost from v to w
                Q.put(w)

Shortest path in a DAG

If we know the graph doesn’t contain any cycles, this problem become a lot easier.

We use the same algorithm as a UCS - but we don’t need a PQ and, we can consider the nodes in a topological order.

This reduces the complexity to $\mathcal{O}(E)$.

Negative weights

Dijkstras/UCS won’t work with negative edge weights - we really can’t fix this, so instead we use the Bellman-ford algorithm.

Complexity

Here’s a table of all the complexities:

Topological Sort
- Restriction
  - No directed cycles
- Worst-case time complexity
  - $\mathcal{O}(E + V)$
- Space-usage
  - $\mathcal{O}(V)$
UCS/Dijkstras
- Restriction
  - No negative weights
- Worst-case time complexity
  - $\mathcal{O}(E\ log(E))$
- Space-usage
  - $\mathcal{O}(E)$
Dijkstras eager
- Restriction
  - No negative weights
- Worst-case time complexity
  - $\mathcal{O}(E\ log(V))$
- Space-usage
  - $\mathcal{O}(V)$
Bellman-ford
- Restriction
  - No negative cycles
- Worst-case time complexity
  - $\mathcal{O}(E\ V)$
- Space-usage
  - $\mathcal{O}(V)$

Summary

So the summary really here is knowing when to use what depending on what you have.

So in the case we have non-negative weights (which is the most often scenario), we can almost always use UCS/Dijkstras, since this is almost linear.

If we encounter a DAG, we should perform a Topological sort algorithm, since it’s linear

Negative weights and negative cycles, If we have no negative cycles - then Bellman-ford is the approach, otherwise we can find some path via Bellman-ford.