Part 10 - Parallel linked lists

Posted on Feb 19, 2023
(Last updated: May 26, 2024)

In this part we’ll cover the synchronization challenges that arise when designing correct and efficient parallel data structures.

But first, let’s look at the burdens of locks.

The trouble with locks

Standard techniques for concurrent programming are ultimately based on locks.

Programming with locks has several drawbacks:

  • Performance overhead.

  • Lock granularity is hard to choose:

    • Not enough locking: race conditions.

    • Too much locking: not enough parallelism.

  • Risk of deadlock and starvation.

  • Lock-based implementations do not compose (easily).

  • Lock-based programs are hard to maintain and modify.

Message-passing programming is higher-level, but it also inevitably incurs synchronization costs – of a magnitude comparable to those associated with locks.

A good rule of thumb is:

  • Lock-based programming is pessimistic: be prepared for the worst possible conditions:

    • If things can go wrong, they will.

  • Lock-free programming is optimistic: do what you have to do without worrying about race conditions:

    • If things go wrong, just try again!

Ultimately, what lock-free programming relies on is:

  • Using stronger primitives for atomic access.

  • Building optimistic algorithms using those primitives.

For example, recall the compare-and-set and test-and-set operations we have encountered.

Even these are not ‘free’: they also come with a performance cost.
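To make the primitives concrete, here is a minimal sketch of compare-and-set and test-and-set using Java’s standard atomic classes (the class and helper-method names in this sketch are my own):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicPrimitives {
    // compare-and-set: atomically bump the counter only if no one
    // changed it between our read and our write
    static boolean casIncrement(AtomicInteger counter) {
        int expected = counter.get();
        return counter.compareAndSet(expected, expected + 1);
    }

    // test-and-set: atomically set the flag and return its previous value
    static boolean testAndSet(AtomicBoolean flag) {
        return flag.getAndSet(true);
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        while (!casIncrement(counter)) { /* contention: retry */ }
        System.out.println(counter.get()); // 1

        AtomicBoolean lockFlag = new AtomicBoolean(false);
        System.out.println(testAndSet(lockFlag)); // false: we "acquired" the flag
        System.out.println(testAndSet(lockFlag)); // true: already taken
    }
}
```

The retry loop around `casIncrement` is exactly the optimistic pattern the rest of this part builds on: attempt, and if someone interfered, try again.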

There are two different classes of lock-free algorithms:

  • Lock-free: guarantee system-wide progress: infinitely often, some process makes progress.

  • Wait-free: guarantee per-process progress: every process eventually makes progress.

Wait-free is stronger than lock-free:

  • Lock-free algorithms are free from deadlock.
  • Wait-free algorithms are free from deadlock and starvation.
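A classic illustration of a lock-free (but not wait-free) structure is a Treiber-style stack: every operation optimistically reads the top of the stack and retries its CAS until it wins, so some thread always makes progress, but an unlucky thread may retry indefinitely. A minimal sketch (names are mine):

```java
import java.util.concurrent.atomic.AtomicReference;

public class TreiberStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    // Lock-free push: optimistically read the top; retry if the CAS fails
    public void push(T value) {
        Node<T> node = new Node<>(value);
        do {
            node.next = top.get();
        } while (!top.compareAndSet(node.next, node)); // lost the race: try again
    }

    // Lock-free pop: returns null on an empty stack
    public T pop() {
        Node<T> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.value;
    }

    public static void main(String[] args) {
        TreiberStack<Integer> s = new TreiberStack<>();
        s.push(1); s.push(2);
        System.out.println(s.pop()); // 2
        System.out.println(s.pop()); // 1
        System.out.println(s.pop()); // null
    }
}
```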

Let’s see how we can parallelize linked lists!

But before that, let’s quickly define our baseline sequential linked list:

class SequentialNode<T> implements Node<T> {
    // Value stored in node
    private T item;

    // Hash code of item
    private int key;

    // Next node in chain
    private Node<T> next;

    T item() { return item; }
    int key() { return key; }
    Node<T> next() { return next; }

    void setItem(T item) { this.item = item; }
    void setKey(int key) { this.key = key; }
    void setNext(Node<T> next) { this.next = next; }
}

The set operations live in a separate class, which keeps sentinel head and tail nodes with the smallest and largest possible keys. (The find pseudocode below returns a pair of nodes, which is not valid Java but keeps the presentation compact.)

class SequentialSet<T> {
    // Sentinel nodes with the smallest and largest keys
    protected Node<T> head, tail;

    public SequentialSet() {
        head = new SequentialNode<>(Integer.MIN_VALUE);
        tail = new SequentialNode<>(Integer.MAX_VALUE);
        head.setNext(tail);
    }

    // Return the pair (pred, curr) such that
    // pred.key() < key <= curr.key()
    protected Node<T>, Node<T> find(Node<T> start, int key) {
        Node<T> pred, curr;

        curr = start;

        do {
            pred = curr; curr = curr.next();
        } while (curr.key() < key);

        return (pred, curr);
    }

    public boolean has(T item) {
        int key = item.key();

        // Find position of key from head:
        Node<T> pred, curr = find(head, key);

        return curr.key() == key;
    }

    public boolean add(T item) {
        // New node to be added
        Node<T> node = new SequentialNode<>(item);

        // After find: curr.key() >= item.key()
        Node<T> pred, curr = find(head, item.key());

        if (curr.key() == item.key()) {
            // Already in the set
            return false;
        } else {
            // Link the new node between pred and curr
            node.setNext(curr);
            pred.setNext(node);
            return true;
        }
    }

    public boolean remove(T item) {
        // After find: curr.key() >= item.key()
        Node<T> pred, curr = find(head, item.key());

        if (curr.key() > item.key()) {
            // Not in the set
            return false;
        } else {
            // Unlink curr
            pred.setNext(curr.next());
            return true;
        }
    }
}
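Since the tuple return in find is pseudocode, it may help to see a compilable version. Here is a minimal sketch that bundles the pair into a small helper class and, for simplicity, stores int keys directly (all class and field names here are my own):

```java
public class SequentialIntSet {
    private static class Node {
        int key;
        Node next;
        Node(int key) { this.key = key; }
    }

    // In real Java the (pred, curr) tuple becomes a small helper class
    private static class Position {
        final Node pred, curr;
        Position(Node pred, Node curr) { this.pred = pred; this.curr = curr; }
    }

    private final Node head;

    public SequentialIntSet() {
        head = new Node(Integer.MIN_VALUE);       // sentinel: smallest key
        head.next = new Node(Integer.MAX_VALUE);  // sentinel: largest key
    }

    // Walk the chain until curr.key >= key; return the surrounding pair
    private Position find(int key) {
        Node pred = head, curr = head.next;
        while (curr.key < key) {
            pred = curr;
            curr = curr.next;
        }
        return new Position(pred, curr);
    }

    public boolean has(int key) {
        return find(key).curr.key == key;
    }

    public boolean add(int key) {
        Position pos = find(key);
        if (pos.curr.key == key) return false; // already present
        Node node = new Node(key);
        node.next = pos.curr;
        pos.pred.next = node;
        return true;
    }

    public boolean remove(int key) {
        Position pos = find(key);
        if (pos.curr.key != key) return false; // not present
        pos.pred.next = pos.curr.next;
        return true;
    }

    public static void main(String[] args) {
        SequentialIntSet set = new SequentialIntSet();
        set.add(5); set.add(3);
        System.out.println(set.has(3)); // true
        set.remove(3);
        System.out.println(set.has(3)); // false
    }
}
```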

Coarse-grained locking

The simple idea is to lock every method.

class CoarseSet<T> extends SequentialSet<T> {
    // Lock controlling access to the whole set
    private Lock lock = new ReentrantLock();

    public boolean add(T item) {
        lock.lock();

        try {
            // Execute ‘add’ while locking
            return super.add(item);
        } finally {
            // Done: release lock
            lock.unlock();
        }
    }

    public boolean remove(T item) {
        lock.lock();

        try {
            // Execute ‘remove’ while locking
            return super.remove(item);
        } finally {
            // Done: release lock
            lock.unlock();
        }
    }

    public boolean has(T item) {
        lock.lock();
        try {
            // Execute ‘has’ while locking
            return super.has(item);
        } finally {
            // Done: release lock
            lock.unlock();
        }
    }
}
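A runnable sketch of the same coarse-grained idea, using a TreeSet as a stand-in for SequentialSet (all names here are my own), shows that a single lock guarding every operation is enough to avoid lost updates:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CoarseSetDemo {
    // Coarse-grained set: one lock guards every operation
    static class CoarseIntSet {
        private final Set<Integer> set = new TreeSet<>();
        private final Lock lock = new ReentrantLock();

        boolean add(int x) {
            lock.lock();
            try { return set.add(x); } finally { lock.unlock(); }
        }

        boolean has(int x) {
            lock.lock();
            try { return set.contains(x); } finally { lock.unlock(); }
        }

        int size() {
            lock.lock();
            try { return set.size(); } finally { lock.unlock(); }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CoarseIntSet set = new CoarseIntSet();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) set.add(id * 1000 + i);
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(set.size()); // 4000: no updates lost
    }
}
```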

Now this is a very naive approach - but let’s list the pros and cons:

  • Pros:

    • Obviously correct – avoids race conditions and deadlocks.

    • If the lock is fair, so is access to the list.

    • If contention is low (not many threads accessing the set concurrently), CoarseSet is quite efficient.

  • Cons:

    • Access to the list is sequential – missing opportunities for parallelization.

    • If contention is high (many threads accessing the set concurrently), CoarseSet is quite slow.

Now let’s look at an improvement over CoarseSet.

Fine-grained locking

The idea here is: instead of locking the whole list for every operation, we lock only the specific nodes an operation touches.

public class FineSet<T> extends SequentialSet<T> {
    public FineSet() {
        // Smallest key
        head = new LockableNode<>(Integer.MIN_VALUE);

        // Largest key
        tail = new LockableNode<>(Integer.MAX_VALUE);

        head.setNext(tail);
    }
}

Since each node is lockable:

class LockableNode<T> extends SequentialNode<T> {
    private Lock lock = new ReentrantLock();

    void lock() { lock.lock(); }
    void unlock() { lock.unlock(); }
}

Now let’s see the implementations of each operation:

// Find while locking pred and curr, return locked position
protected Node<T>, Node<T> find(Node<T> start, int key) {
    Node<T> pred, curr;
    pred = start; curr = start.next();
    pred.lock(); curr.lock();

    while (curr.key() < key) {
        // Unlock pred node
        pred.unlock();

        // Move to next node
        pred = curr; curr = curr.next();

        // Lock next node
        curr.lock();
    }

    return (pred, curr);
}

public boolean add(T item) {
    Node<T> node = new LockableNode<>(item);

    Node<T> pred, curr = find(head, item.key());
    try {
        // Add node as in SequentialSet, while pred and curr are locked
    } finally {
        pred.unlock(); curr.unlock();
    }
}

public boolean remove(T item) {
    Node<T> pred, curr = find(head, item.key());
    try {
        // Remove node as in SequentialSet, while pred and curr are locked
    } finally {
        pred.unlock(); curr.unlock();
    }
}

public boolean has(T item) {
    Node<T> pred, curr = find(head, item.key());
    try {
        // Check node as in SequentialSet, while pred and curr are locked
    } finally {
        pred.unlock(); curr.unlock();
    }
}

As we see, it’s all about the find operation.
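To make the hand-over-hand protocol concrete, here is a compilable sketch of a membership check that locks pairwise while traversing (names are mine; for simplicity the constructor assumes keys arrive in sorted order):

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class HandOverHand {
    static class Node {
        final int key;
        Node next;
        final Lock lock = new ReentrantLock();
        Node(int key) { this.key = key; }
    }

    private final Node head;

    public HandOverHand(int... sortedKeys) {
        head = new Node(Integer.MIN_VALUE);          // sentinel: smallest key
        head.next = new Node(Integer.MAX_VALUE);     // sentinel: largest key
        Node prev = head;
        for (int k : sortedKeys) {                   // build the sorted chain
            Node n = new Node(k);
            n.next = prev.next;
            prev.next = n;
            prev = n;
        }
    }

    // Hand-over-hand: we always hold at least one lock while moving,
    // so no other thread can overtake us on the chain
    public boolean has(int key) {
        Node pred = head;
        pred.lock.lock();
        Node curr = pred.next;
        curr.lock.lock();
        try {
            while (curr.key < key) {
                pred.lock.unlock();   // release the node behind us...
                pred = curr;
                curr = curr.next;
                curr.lock.lock();     // ...and grab the one ahead
            }
            return curr.key == key;
        } finally {
            pred.lock.unlock();
            curr.lock.unlock();
        }
    }
}
```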

Pros and cons:

  • Pros:

    • If locks are fair, so is access to the list, because threads proceed along the list one after the other without changing order.

    • Threads operating on disjoint portions of the list may be able to operate in parallel.

  • Cons:

    • It is still possible that one thread prevents another thread from operating in parallel on a disjoint portion of the list – for example, if one thread wants to access the end of the list but another thread blocks it while locking the beginning of the list.

    • The hand-over-hand locking protocol may be quite slow, as it involves a significant number of lock operations.

Optimistic locking

Let’s try to implement find without using locks. The idea is to validate a position after finding it. There are some implementation details about the nodes; for example, the next attribute of a node must be declared volatile, so that updates are visible across threads.

But that is beside the point; let’s take an overview of how the operations should work:

  1. Find the item’s position inside the list without locking.

  2. Lock the position’s nodes pred and curr.

  3. Validate the position while the nodes are locked:

    • 3.1 If the position is valid, perform the operation while the nodes are locked, then release locks.

    • 3.2 If the position is invalid, release locks and repeat the operation from scratch.

This approach is optimistic because it works well when validation is often successful (so we don’t have to repeat operations).

Now let’s implement these operations:

// Find as in SequentialSet find

public boolean add(T item) {
    Node<T> node = new ReadWriteNode<>(item);

    do {
        Node<T> pred, curr = find(head, item.key());
        pred.lock(); curr.lock();
        try {
            if (valid(pred, curr)) {
                // Physically add node
            }
        } finally {
            pred.unlock(); curr.unlock();
        }
    } while (true);
}

public boolean remove(T item) {
    do {
        Node<T> pred, curr = find(head, item.key());
        pred.lock(); curr.lock();
        try {
            if (valid(pred, curr)) {
                // Physically remove node
            }
        } finally {
            pred.unlock(); curr.unlock();
        }
    } while (true);
    // If not valid: try again!
}

public boolean has(T item) {
    do {
        Node<T> pred, curr = find(head, item.key());
        pred.lock(); curr.lock();
        try {
            if (valid(pred, curr)) {
                return curr.key() == item.key();
            }
        } finally {
            pred.unlock(); curr.unlock();
        }
    } while (true);
    // If not valid: try again!
}

protected boolean valid(Node<T> pred, Node<T> curr) {
    Node<T> node = head;
    while (node.key() <= pred.key()) {
        if (node == pred) {
            return pred.next() == curr;
        }

        node = node.next();
    }

    return false;
}
  • Pros:

    • Threads operating on disjoint portions of the list can operate in parallel.

    • When validation often succeeds, there is much less locking involved than in FineSet.

  • Cons:

    • OptimisticSet is not starvation-free: a thread t may fail validation forever, if other threads keep removing and adding pred / curr between when t performs find and when it locks pred and curr.

    • If traversing the list twice without locking is not significantly faster than traversing it once with locking, OptimisticSet does not have a clear advantage over FineSet.

Lazy node removal

The idea here is that we need a way to atomically share the information that a node is being removed, but without locking.

To this end, each node includes a flag valid with setters and getters.

Which means:

  • Validation only needs to check the mark valid.

  • Operation remove marks a node invalid before removing it.

  • Operation has is lock-free.

  • Operation add works as in OptimisticSet.
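A validity-flagged node might be sketched as follows (this is my own sketch; the actual ValidatedNode may differ in detail). The important point is that valid and next are volatile, so marking a node as logically removed is a single atomic write that all threads see:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// A node carrying a validity mark: invalid means logically removed
public class ValidatedNode<T> {
    private final T item;
    private final int key;

    // volatile so other threads immediately see the link and the mark
    private volatile ValidatedNode<T> next;
    private volatile boolean valid = true;

    private final Lock lock = new ReentrantLock();

    public ValidatedNode(T item, int key) {
        this.item = item;
        this.key = key;
    }

    T item() { return item; }
    int key() { return key; }
    ValidatedNode<T> next() { return next; }
    void setNext(ValidatedNode<T> next) { this.next = next; }

    boolean valid() { return valid; }
    void setInvalid() { valid = false; } // logical removal: one atomic write

    void lock() { lock.lock(); }
    void unlock() { lock.unlock(); }
}
```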

Let’s start with the implementation:

public class LazySet<T> extends OptimisticSet<T> {
    public LazySet() {
        head = new ValidatedNode<>(Integer.MIN_VALUE);
        tail = new ValidatedNode<>(Integer.MAX_VALUE);
        head.setNext(tail);
    }
}

protected boolean valid(Node<T> pred, Node<T> curr) {
    return pred.valid() && curr.valid() && pred.next() == curr;
}

public boolean has(T item) {
    Node<T> pred, curr = find(head, item.key());
    return curr.valid() && curr.key() == item.key();
}

public boolean remove(T item) {
    do {
        Node<T> pred, curr = find(head, item.key());
        pred.lock(); curr.lock();
        try {
            if (valid(pred, curr)) {
                if (curr.key() != item.key()) {
                    return false;
                }

                else {
                    curr.setInvalid();
                    pred.setNext(curr.next());
                    return true;
                }
            }
        } finally {
            pred.unlock(); curr.unlock();
        }
    } while (true);
}
  • Pros:

    • Validation is constant time.

    • Membership checking does not require any locking – it’s even wait-free (it traverses the list once without locking).

    • Physical removal of logically removed nodes could be batched and performed when convenient – thus reducing the number of times the physical chain of nodes is changed, in turn reducing the expensive propagation of information between threads.

  • Cons:

    • Operations add and remove still require locking (as in OptimisticSet), which may reduce the amount of parallelism.

Now lastly, let’s solve this using no locks!

Lock-free access

If we’re not using locks, we need stronger synchronization primitives to replace them.

Therefore we’ll use the compare-and-set operation.

Let’s try to implement remove using this:

public boolean remove(T item) {
    boolean done;

    do {
        Node<T> pred, curr = find(head, item.key());
        if (curr.key() > item.key()) {
            // Not in the set
            return false;
        } else {
            // Atomically unlink curr, provided pred still points to it
            done = pred.next().compareAndSet(curr, curr.next());
        }
    } while (!done);

    return true;
}

This naive approach will not work, unfortunately. If two threads call remove at the same time on adjacent nodes, both CAS operations may succeed, and yet one of the nodes remains in the list.

We will need to borrow the idea of marking and updating nodes from LazySet.

Concretely, we’ll rely on a markable reference, which packs a reference and a boolean mark into a single atomically updatable value:

class AtomicMarkableReference<V> {
    // Current reference and mark
    V, boolean get();

    // If reference == expectRef, set mark to newMark and return true;
    // otherwise, do not change anything and return false.
    boolean attemptMark(V expectRef, boolean newMark);

    // If reference == expectRef and mark == expectMark,
    // set reference to newRef and mark to newMark and return true;
    // otherwise, do not change anything and return false.
    boolean compareAndSet(V expectRef, V newRef, boolean expectMark, boolean newMark);
}
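Java’s standard library provides exactly this class as java.util.concurrent.atomic.AtomicMarkableReference; the tuple-returning get above corresponds to get(boolean[] markHolder) in the real API. A small demonstration of logical-then-physical removal with it:

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

public class MarkableDemo {
    public static void main(String[] args) {
        String curr = "curr", succ = "succ";

        // Reference to 'curr', initially unmarked (i.e. valid)
        AtomicMarkableReference<String> next =
            new AtomicMarkableReference<>(curr, false);

        // The tuple-style get() becomes get(boolean[] markHolder)
        boolean[] mark = new boolean[1];
        String ref = next.get(mark);
        System.out.println(ref + " " + mark[0]); // curr false

        // Logical removal: set the mark, keeping the reference unchanged
        System.out.println(next.attemptMark(curr, true)); // true

        // Physical removal: swing the reference only if both
        // the reference and the mark still match our expectation
        System.out.println(next.compareAndSet(curr, succ, true, false)); // true
        System.out.println(next.getReference()); // succ
    }
}
```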

There are some more implementation details about our nodes, but let’s skip that :).

public boolean remove(T item) {
    do {
        Node<T> pred, curr = find(head, item.key());
        if (curr.key() != item.key() || !curr.valid()) {
            // Not in the set
            return false;
        }

        if (!curr.setInvalid()) {
            // Changed during logical removal: try again!
            continue;
        }

        // Logically removed; if physical unlinking fails here,
        // find will clean up the invalid node later
        pred.setNextIfValid(curr, curr.next());
        return true;
    } while (true);
}

public boolean add(T item) {
    do {
        Node<T> pred, curr = find(head, item.key());
        if (curr.key() == item.key() && curr.valid()) {
            // Already in the set and valid
            return false;
        }

        Node<T> node = new LockFreeNode<>(item).setNext(curr);

        if (pred.setNextIfValid(curr, node)) {
            return true;
        }
        // pred changed during add: try again!
    } while (true);
}

public boolean has(T item) {
    Node<T> pred, curr = super.find(head, item.key());
    return curr.valid() && curr.key() == item.key();
}

protected Node<T>, Node<T> find(Node<T> start, int key) {
    boolean valid;
    Node<T> pred, curr, succ;

    retry: do {
        pred = start;
        curr = start.next();

        do {
            succ, valid = curr.nextValid();

            // Physically remove invalid nodes encountered while traversing
            while (!valid) {
                // pred changed: restart from scratch
                if (!pred.setNextIfValid(curr, succ)) continue retry;
                curr = succ; succ, valid = curr.nextValid();
            }

            if (curr.key() >= key) return (pred, curr);
            pred = curr; curr = succ;
        } while (true);
    } while (true);
}
  • Pros:

    • No operations require locking: maximum potential for parallelism.

    • Membership checking does not require any locking – it’s even wait-free (it traverses the list once without locking).

  • Cons:

    • The implementation needs test-and-set-like synchronization primitives, which have to be supported and come with their own performance costs.

    • Operations add and remove are lock-free but not wait-free: they may have to repeat operations, and they may be delayed while they physically remove invalid nodes, with the risk of introducing contention on nodes that have been already previously logically deleted.

When to lock and when not to

Each of the different implementations of concurrent set is the best choice for certain applications and not for others:

  • CoarseSet works well with low contention.

  • FineSet works well when threads tend to access the list orderly.

  • OptimisticSet works well to let threads operate on disjoint portions of the list.

  • LazySet works well when batching invalid node removal is convenient.

  • LockFreeSet works well when locking is quite expensive.