My BSc Thesis, ClaudesLens

Posted on Jun 18, 2024
(Last updated: Jun 19, 2024)

Introduction

I have finally finished undergrad and would like to write a blog post about what I have been working on these past ~6 months. The title of our thesis is:

ClaudesLens: Uncertainty Quantification in Computer Vision Models

However, before I dive into the project and what we actually did, let me tell you what we wanted to do.

BayesLens

Originally, we wanted to create “Uncertainty-Aware Attention Mechanisms”. What we specifically had in mind was to build a transformer model that used Bayesian Neural Networks (BNNs) and, even more ambitiously, apply this to self-driving cars.

Needless to say, this was a bit too ambitious for a BSc thesis, so we had to scale down our project. We didn’t have the prerequisite knowledge or the compute to pull off such a task within the time frame, especially alongside our other courses.

So about a third of the way into the project, our supervisor asked us to explore the entropy of predictions, got really excited about our results, and ClaudesLens was born.

ClaudesLens

Based on the results of using entropy as a measure of uncertainty, we decided to focus on this instead. I’ll go into more detail and motivate how this approach works, but I believe this is a very natural way to quantify uncertainty.
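To give a rough feel for the idea before I build it up properly: a classifier’s softmax output is a probability distribution over classes, and the entropy of that distribution is small when the model commits to one class and large when it spreads its probability mass out. Here is a tiny sketch of that (my own illustration with made-up numbers, not code from the thesis):

```python
import numpy as np

def entropy(p):
    # Shannon entropy of a discrete probability distribution
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + 1e-12))  # small epsilon avoids log(0)

# Two hypothetical softmax outputs over three classes
confident = [0.97, 0.02, 0.01]   # the model is almost certain of class 0
uncertain = [0.34, 0.33, 0.33]   # the model is basically guessing

print(entropy(confident))  # low entropy  -> low uncertainty
print(entropy(uncertain))  # high entropy -> high uncertainty
```

Low entropy means a confident prediction, high entropy means an uncertain one, which is what makes it such a natural measure.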

I plan to explain this project from the ground up, from first principles so to speak, so let’s start with what lies at the heart of this project: Neural Networks.

Neural Networks

There are many ways to explain neural networks; in this post I will take a mathematical approach that lets us view the entire network as a single function.

The Neuron

At the core of a neural network lies the neuron, which is inspired by the biological neuron.

Each neuron takes in one or more scalars, $x_j$, as input and outputs a single scalar, $y$. Each input, $x_j$, is scaled by an associated weight denoted as $w_j$. The neuron also has a special input called the bias, $b$.

The neuron goes through two stages: summation and activation.

Figure 1: A single neuron with n inputs and one output, showcasing the summation and activation components.

The summation stage is where the neuron calculates the weighted sum of the inputs and the bias: $$ z = \sum_{j=1}^{n} w_j x_j + b $$

The activation function, denoted as $f$, calculates the neuron’s output $y = f(z)$ based on the weighted summation. Activation functions introduce non-linearity, enabling neural networks to approximate complex, non-linear functions.
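To make the two stages concrete, here is a minimal sketch of a single neuron in plain Python (the sigmoid activation and all numbers are just illustrative choices, not anything specific to the thesis):

```python
import math

def sigmoid(z):
    # A common choice of activation function, squashing z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # Summation stage: z = sum_j w_j * x_j + b
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    # Activation stage: y = f(z)
    return f(z)

# A neuron with three inputs
y = neuron(x=[0.5, -1.0, 2.0], w=[0.1, 0.4, -0.2], b=0.3)
print(y)  # a single scalar output
```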

The Network

Let’s build upon what we have learned and see how we can extend it.

We can represent the inputs of a neuron as a vector, $$ \mathbf{x} = \left[x_1, x_2, \ldots, x_n\right], $$

where each element corresponds to an input to the neuron.

Similarly, we can represent the associated weights as a vector, $$ \mathbf{w} = \left[w_1, w_2, \ldots, w_n\right]. $$

With this, the summation can be simplified to a dot product: $$ z = \mathbf{w} \cdot \mathbf{x} + b. $$
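In code, the dot-product form becomes a one-liner; a small NumPy sketch (again with made-up numbers):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # inputs  x_1, x_2, x_3
w = np.array([0.1, 0.4, -0.2])   # weights w_1, w_2, w_3
b = 0.3                          # bias

z = np.dot(w, x) + b             # same as sum_j w_j * x_j + b
```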

But a single neuron will only get us so far. If we instead use multiple neurons and try to mimic the structure of the brain, we get something more powerful.

A layer is a collection of neurons stacked on top of each other. Very often, when we refer to a layer we mean a fully connected layer, where each neuron in the layer is connected to every neuron in the previous layer.

In the case of a network, we can now talk about the input layer and the output layer.

With a layer of, say, $m$ neurons, each neuron has its own weight vector and bias. Stacking the weight vectors as the rows of a matrix $\mathbf{W}$ and the biases into a vector $\mathbf{b}$, the whole layer computes $$ \mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b}), $$ where the activation $f$ is applied element-wise. A network is then just layers applied one after another: the output of one layer becomes the input of the next, so the entire network can be viewed as a single composed function, which is exactly the view we set out to build.
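As a sketch of that view (my own toy example, with random weights rather than anything trained):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    # Row j of W holds the weights of neuron j, so the whole layer is f(W x + b)
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)

# A toy network: 3 inputs -> 4 hidden neurons -> 2 outputs (sizes are arbitrary)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = np.array([0.5, -1.0, 2.0])
y = layer(layer(x, W1, b1), W2, b2)  # the whole network as one composed function
```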

Figure 2: A neural network built from multiple layers of neurons.