Part 3 - Dynamic Arrays

Posted on Nov 27, 2022
(Last updated: May 26, 2024)

Dynamic arrays

Finally we have arrived at the DS of DSA. In (almost) all programming languages, the most common built-in data structure is the array. We have seen them and worked with them - nothing new. We've probably even worked with dynamic arrays; they are an essential data structure, since sometimes we don't know in advance how much data we'll need to store.

But how are dynamic arrays actually implemented in a language? The idea is actually quite simple - we use a fixed-length array, and when we run out of space, we allocate a larger fixed-length array and copy the elements over.

Different approaches

Naive Approach

A very naive, brute-force implementation: whenever we append a value to an array that has no space left, we create a new array that is one index larger, copy the old elements over, and then fit the new value in. The problem with this approach is that it's really slow. If we want to append 10 items to an array of size 1, we need to copy the whole array on every single append.

The complexity of this becomes $\mathcal{O}(n^2)$, which again, is bad.
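We can make this concrete by counting copies. A small sketch (the function name is made up for this illustration):

```python
def copies_grow_by_one(n):
    """Total element copies made when appending n items to an
    array that grows by exactly one slot per append."""
    copies = 0
    for size in range(n):
        # the array is always full, so every append copies all `size` elements
        copies += size
    return copies

# n*(n-1)/2 copies in total - quadratic growth
```

For 10 appends that is 45 copies; for 100 appends it is already 4950.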

A Better, but Naive Approach

A better approach would be that each time we need to resize the array, we make some extra room for future appends. Let's say each time we need to resize, we add 10 more indexes for future use.

This is roughly 10 times faster than our first approach! But it's still $\mathcal{O}(n^2)$ so…

Multiplicative Approach

So, instead of resizing by a constant - why not just double the array size each time? With this approach:

To reach an array of size $2^n$ we would need to do $1 + 2 + 4 + \dots + 2^{n-1}$ total element copies. This sums to $2^n - 1$, which makes this approach $\mathcal{O}(n)$ for $n$ appends!
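A quick way to convince ourselves is to count the copies directly (a sketch; the function name is made up for illustration):

```python
def copies_doubling(n):
    """Total element copies when the capacity doubles each time it runs out."""
    copies, capacity, size = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # copy every existing element into the new array
            capacity *= 2
        size += 1
    return copies

# for n = 2^k appends this gives exactly 2^k - 1 copies
```

For 16 appends we get 15 copies, for 1024 appends 1023 - linear, not quadratic.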

Worst-case Complexities

One might work through an example of the worst-case complexity of these two approaches. If we do, we find that both have a worst-case complexity of $\mathcal{O}(n)$ for a single append: whenever we need to resize, we have to copy all $n$ elements from the old array to the new one.

Now suppose we call the add function $n$ times instead. What's the worst-case complexity now? For the constant approach it's quite clear that it becomes $\mathcal{O}(n^2)$: in the worst case each call copies on the order of $n$ items, and there are $n$ calls in total.

For our multiplicative approach, the $n$ calls together instead take $\mathcal{O}(n)$. But a single add can still be 'too slow' - its worst case is $\mathcal{O}(n)$, while we would like an $\mathcal{O}(1)$ operation.

But don’t worry - here we have something called amortized complexity which plays a huge role.

Amortized Complexity

If add had an $\mathcal{O}(1)$ worst-case complexity, $n$ calls would take $\mathcal{O}(n)$ time, right? And as we stated, our approach takes $\mathcal{O}(n)$ for $n$ calls - so we can say that add has an $\mathcal{O}(1)$ amortized complexity!

Basically, amortized complexity is the average cost of an operation taken over a sequence of $n$ calls.
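For the doubling strategy we can check that average directly - total copies over $n$ appends, divided by $n$, stays bounded by a small constant (sketch, illustrative names):

```python
def average_copy_cost(n):
    """Average number of element copies per append with doubling."""
    copies, capacity, size = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            copies += size      # full resize: copy all existing elements
            capacity *= 2
        size += 1
    return copies / n

# the average never exceeds 2 copies per append, no matter how large n gets
```

That bounded average is exactly what "amortized $\mathcal{O}(1)$" means here.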

Now that we’ve defined how the most important feature of dynamic arrays should work - let’s implement one!

Implementation

A python implementation of a dynamic array:

class dynamic_array:
    def __init__(self):
        self.array = [None]
        self.size  = 0

    def get(self, i):
        if not 0 <= i < self.size:
            raise IndexError("index out of bounds")

        return self.array[i]

    def set(self, i, val):
        if not 0 <= i < self.size:
            raise IndexError("index out of bounds")

        self.array[i] = val

    def append(self, val):
        if self.size == len(self.array):
            self.resize()

        self.array[self.size] = val
        self.size += 1

    def resize(self):
        # double the capacity each time we run out of space
        new_array = [None] * (2 * len(self.array))

        for i in range(len(self.array)):
            new_array[i] = self.array[i]

        self.array = new_array

Now this isn’t a perfect implementation - but it works okay.

Now that we understand really how dynamic arrays work - we can actually start implementing some more interesting data structures

Stacks and Queues

Stacks and Queues are quite popular and powerful data structures. But just to refresh let’s go over how both data structures work.

Stacks

A Stack is a so-called LIFO data structure. LIFO stands for 'Last In First Out'. One can visualize it as a literal stack of items (hence the name). If we have a stack of things, we take items from the top, one by one, until we reach the bottom. This is exactly how the stack data structure works!

Important Functions for Stacks

These operations have names - removing the top item is called pop(), and placing an item on top of the stack is called push(). There are some helper functions that are usually needed in an implementation as well - peek() gives us the item at the top of the stack without removing it, and size() returns how large the stack currently is.
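In Python, the built-in list already gives us all four operations, which is handy for trying the idea out:

```python
stack = []            # an empty stack

stack.append(3)       # push(3)
stack.append(5)       # push(5)

top = stack[-1]       # peek() -> 5, the stack is unchanged
item = stack.pop()    # pop()  -> 5, removes the top item
size = len(stack)     # size() -> 1, only the 3 is left
```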

Queues

A queue is a so-called FIFO data structure, 'First In First Out'. Just like the stack data structure, we can visualize this as a literal queue. If someone joins the queue, they are put in last - and the first person to join the queue will be the first to leave it as well.
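Python's collections.deque behaves exactly like this - append joins at the back, popleft leaves from the front:

```python
from collections import deque

queue = deque()
queue.append("first in line")     # joins at the back
queue.append("second in line")

served = queue.popleft()          # the first to join leaves first
```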

How to implement Stacks and Queues

Now that we understand the basics of these data structures, let's see how one can implement them. But before we dive right in, let's look at the different approaches. Both stacks and queues can be implemented with dynamic arrays, as we covered - but just as well with linked lists. We'll start with the linked list implementation - but first, let's refresh our memory of what a linked list is.

Linked Lists

A (singly) linked list consists of 'Node' objects in a sequence - each node can contain one or more values, and each node 'points' to the next node in the sequence. The list itself keeps a pointer to the first node (the head), and the last node in the list points to nothing (nothing can, for example, be Null). Since it would be inefficient to traverse the whole list every time we call size(), we instead keep a size variable that we update on every insertion and removal.

Here’s how the boilerplate classes for a Linked List might look:

class Node:
    def __init__(self, val, next = None):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self):
        self.head = None
        self.size = 0

With our new knowledge about linked lists - we can return to stacks!

Stacks as linked lists

A linked list is a perfect fit for a stack - we only care about the top element. When we push(), we first make the new node's next point to the current head, then redirect head to our new node and increase size. When we pop(), we first store the value of the current head node temporarily, so we can return the popped value, then redirect head to the next node of the one we are popping.

In many languages we do not need to manually remove the node from the list - since it will be handled by garbage collection. But in a language like C, we would need to free() that node from memory.

So let’s implement a stack using a linked list.

Implementation of a stack using a linked list

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None


class Stack:
    def __init__(self):
        self.head = None
        self.size = 0

    def is_empty(self):
        return self.head is None

    def push(self, val):
        # works even when the stack is empty: new.next simply becomes None
        new = Node(val)
        new.next = self.head
        self.head = new
        self.size += 1

    def pop(self):
        if self.head is None:
            return None

        pop = self.head.val
        self.head = self.head.next
        self.size -= 1
        return pop

Now let’s implement a stack using a dynamic array.

Implementation of a stack using a dynamic array

Before we start the implementation, let's consider how the stack should be laid out in the array. One thing that might not be obvious: using the first element as the top will work - but it will be super slow.

If we pop the first element - we would need to push forward the rest of the stack by 1 step. Which would take $\mathcal{O}(n)$. We want $\mathcal{O}(1)$!

So instead let’s use the last element as the top!

class Stack:
    def __init__(self):
        self.stack = [None]
        self.size = 0

    def is_empty(self):
        return self.size == 0

    def push(self, val):
        if self.size == len(self.stack):
            self.resize(2 * len(self.stack))

        self.stack[self.size] = val
        self.size += 1

    def pop(self):
        if self.is_empty():
            return None

        self.size -= 1
        pop = self.stack[self.size]
        self.stack[self.size] = None
        # shrink when only a quarter of the capacity is in use
        if self.size > 0 and self.size == len(self.stack) // 4:
            self.resize(len(self.stack) // 2)

        return pop

    def resize(self, capacity):
        new_array = [None] * capacity

        for i in range(self.size):
            new_array[i] = self.stack[i]

        self.stack = new_array

Implementation of a queue using linked lists

Now it’s now to implement queues! Using a linked list seems as a very natural approach since we have a sequence of pointers forward. However, in a queue we always need to the first and the last element. Therefore, we need another pointer to the end of the list as well.

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

class Queue:
    def __init__(self):
        self.last = None
        self.first = None
        self.size = 0

    def is_empty(self):
        return self.first is None

    def enqueue(self, val):
        new_node = Node(val)
        if self.is_empty():
            self.first = new_node
            self.last = new_node
        else:
            self.last.next = new_node
            self.last = new_node

        self.size += 1

    def dequeue(self):
        if self.is_empty():
            return None

        exit = self.first
        self.first = self.first.next
        self.size -= 1
        return exit.val

Quite simple and elegant!

Implementation of a queue using a (dynamic) array

One thing we have to go through first: our implementation will be a so-called circular array. The head and tail indexes wrap around the end of the array, so the tail can (and will) end up in front of the head.
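The 'circular' part is just modular arithmetic - when an index runs off the end of the array, it wraps back to 0:

```python
capacity = 4
positions = [step % capacity for step in range(6)]
# the index wraps around: 0, 1, 2, 3, 0, 1
```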

class Queue:
    def __init__(self):
        self.queue = [None]
        self.head = 0    # index of the first element
        self.tail = 0    # index of the next free slot
        self.size = 0

    def is_empty(self):
        return self.size == 0

    def enqueue(self, val):
        if self.size == len(self.queue):
            self.resize(2 * len(self.queue))

        self.queue[self.tail] = val
        self.tail = (self.tail + 1) % len(self.queue)
        self.size += 1

    def dequeue(self):
        if self.is_empty():
            raise Exception("Queue is empty")

        result = self.queue[self.head]
        self.queue[self.head] = None
        self.head = (self.head + 1) % len(self.queue)
        self.size -= 1
        return result

    def resize(self, new_size):
        old_queue = self.queue
        self.queue = [None] * new_size

        # copy from head onwards, unwrapping the circle as we go
        for i in range(self.size):
            self.queue[i] = old_queue[(self.head + i) % len(old_queue)]

        self.head = 0
        self.tail = self.size

That’s it for this part - in the next part we’ll cover something called ‘Abstract Data Types’, or ADTs for short. We’ll define what an ADT is and what the difference is between them and ordinary data structures.