Part 7 - Floating-point arithmetic

Posted on Apr 25, 2023
(Last updated: May 26, 2024)

In this part we’ll cover some basics of integer and floating-point arithmetic in MIPS.

Integer addition

Just some basics:

  • When performing addition:
    • Overflow cases:
      • Adding a positive and a negative operand - no overflow.
      • Adding two positive operands - overflow if the result’s sign bit is 1.
      • Adding two negative operands - overflow if the result’s sign bit is 0.
  • When performing subtraction:
    • Overflow cases:
      • Subtracting two operands of the same sign (both positive or both negative) - no overflow.
      • Subtracting a positive from a negative operand - overflow if the result’s sign bit is 0.
      • Subtracting a negative from a positive operand - overflow if the result’s sign bit is 1.
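
The overflow cases above can be checked by comparing sign bits. Here is a minimal Python sketch, assuming 32-bit two's complement operands passed in as bit patterns; the helper names `sign_bit`, `add_overflows` and `sub_overflows` are made up for this illustration:

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, like a MIPS register


def sign_bit(x):
    """Return bit 31 of a 32-bit pattern (1 means negative)."""
    return (x >> 31) & 1


def add_overflows(a, b):
    """True if a + b overflows in 32-bit two's complement."""
    result = (a + b) & MASK32
    # Operands of different sign never overflow. With the same sign,
    # overflow shows up as a result whose sign bit differs from theirs.
    return sign_bit(a) == sign_bit(b) and sign_bit(result) != sign_bit(a)


def sub_overflows(a, b):
    """True if a - b overflows in 32-bit two's complement."""
    result = (a - b) & MASK32
    # Operands of the same sign never overflow. With different signs,
    # the result should keep the sign of a; if not, it overflowed.
    return sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a)
```

For example, `add_overflows(0x7FFFFFFF, 1)` is true (the largest positive value plus one flips the sign bit), while `sub_overflows(5, 3)` is false.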

Multiplication

We will not cover how to perform binary multiplication - but it’s the same principle as decimal multiplication.

However, let’s cover the multiplication instructions that we can use in MIPS.

In MIPS, we have two 32-bit registers for the product that is generated by multiplication.

$HI: 32 most significant bits.

$LO: 32 least significant bits.

With the instructions:

# 64-bit product in HI/LO (mult: signed, multu: unsigned)
mult rs, rt
multu rs, rt

# "Move from HI/LO" to rd
mfhi rd
mflo rd

# Least significant 32 bits of product -> rd
mul rd, rs, rt
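
To make the HI/LO split concrete, here is a small Python sketch that models what multu and mul leave behind. It is only a model of the behaviour described above, not MIPS code, and the function names are chosen for this illustration:

```python
MASK32 = 0xFFFFFFFF


def multu(rs, rt):
    """Model of multu: unsigned 32x32-bit multiply, returns (hi, lo)."""
    product = (rs & MASK32) * (rt & MASK32)  # up to 64 bits wide
    hi = (product >> 32) & MASK32            # 32 most significant bits
    lo = product & MASK32                    # 32 least significant bits
    return hi, lo


def mul(rs, rt):
    """Model of mul: only the least significant 32 bits of the product."""
    return multu(rs, rt)[1]
```

With `multu(0xFFFFFFFF, 2)` the product no longer fits in 32 bits, so HI becomes 1 and LO becomes 0xFFFFFFFE; `mul` with the same operands only gives the LO part.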

Division

We will not cover how to perform division either. There are several methods: the “usual” long division we are taught in school, but also restoring division, which we cover here.

$HI and $LO also store the result of a division:

$HI: 32-bit remainder.

$LO: 32-bit quotient.

div rs, rt
divu rs, rt

mfhi rd
mflo rd

One thing to remember is that div and divu do not check for division by 0. If the check is needed, it must be implemented separately: in software before the division, at the hardware level, or as an exception routine.
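
As a rough illustration, this Python sketch models divu together with an explicit divide-by-0 check placed in front of it. Raising `ZeroDivisionError` is just the policy picked for this sketch; what a real program does in that case is up to the programmer:

```python
def divu(rs, rt):
    """Model of divu: returns (hi, lo) = (remainder, quotient)."""
    if rt == 0:
        # The instruction itself performs no such check, so it is added
        # here by hand (hypothetical policy: raise an exception).
        raise ZeroDivisionError("divide by 0")
    hi = rs % rt   # remainder, as read with mfhi
    lo = rs // rt  # quotient, as read with mflo
    return hi, lo
```

For example, `divu(17, 5)` gives a remainder of 2 and a quotient of 3.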

Floating-point representation

The standard for floating-point representation is the IEEE Std 754-1985. This is also important for portability of code.

There are two representations, one for single precision (32-bit) and one for double precision (64-bit). Both encode a value as: $$ x = (-1)^S \cdot (1 + Fraction) \cdot 2^{Exponent - Bias} $$

The sign, S, is always 1 bit. The fraction is 23 bits for single precision and 52 bits for double precision, and the exponent is 8 and 11 bits respectively. The bias is 127 for single precision and 1023 for double precision.

The fraction is normalized, meaning the value always has the format 1.frac. The leading 1 is implicit, so only frac is stored.
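
To see the fields in practice, here is a short Python sketch that pulls a single-precision number apart and rebuilds it with the formula above. It uses the standard `struct` module; the function names are made up for this example, the single-precision bias of 127 is filled in as a default, and `rebuild_single` only handles normalized values:

```python
import struct


def decode_single(value):
    """Split a float into (sign, exponent, fraction) as 32-bit IEEE 754."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, still biased
    fraction = bits & 0x7FFFFF       # 23 bits, the "frac" in 1.frac
    return sign, exponent, fraction


def rebuild_single(sign, exponent, fraction, bias=127):
    """Apply (-1)^S * (1 + Fraction) * 2^(Exponent - Bias)."""
    # The stored fraction is an integer; dividing by 2^23 turns it
    # back into the value after the binary point.
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - bias)
```

For -0.75, which is -1.1 in binary times 2^-1, `decode_single` returns a sign of 1, an exponent of 126 and a fraction of 0x400000, and `rebuild_single` turns those fields back into -0.75.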

We’ve covered this in good detail in the same post I linked earlier.

Floating-point in MIPS

Floating-point operations are handled by coprocessor 1.

We also have separate floating-point registers, each 32 bits wide. For double precision, we use them pairwise, meaning $f0/$f1 together hold one 64-bit floating-point number.
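
The pairing can be pictured as one 64-bit pattern cut into two 32-bit halves. The Python sketch below only shows that split; which half ends up in which register of the pair is not modelled here, and the function names are made up for this example:

```python
import struct


def split_double(value):
    """Split a double into its (high_word, low_word) 32-bit halves."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    high_word = bits >> 32        # sign, exponent and top of the fraction
    low_word = bits & 0xFFFFFFFF  # remaining fraction bits
    return high_word, low_word


def join_double(high_word, low_word):
    """Rebuild the double from its two 32-bit halves."""
    bits = (high_word << 32) | low_word
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For 1.0 the halves are 0x3FF00000 and 0x00000000, and `join_double` gives back the original value.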

For floating-point load and store we use:

# Equivalent to l.s, l.d, s.s, s.d
lwc1, ldc1, swc1, sdc1

For arithmetic:

add.s, sub.s, mul.s, div.s

The same goes for double precision:

add.d, sub.d, mul.d, div.d

Compares and branches:

# condition can be eq, lt etc.
c.condition.s, c.condition.d

bc1t, bc1f

The compare sets a special floating-point condition bit. bc1t and bc1f then branch on that bit (if true or if false), so the branches take only a target label and no register operands.
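
The two-step pattern (compare first, branch second) can be modelled roughly like this in Python. The class `FpuModel` and its methods are hypothetical and only mirror the idea of a single shared condition bit:

```python
class FpuModel:
    """Toy model of the floating-point condition bit."""

    def __init__(self):
        self.condition = False  # the special condition bit

    def c_lt(self, a, b):
        """Like c.lt.s / c.lt.d: only sets the bit, returns nothing."""
        self.condition = a < b

    def c_eq(self, a, b):
        """Like c.eq.s / c.eq.d."""
        self.condition = a == b

    def bc1t(self):
        """Branch taken if the condition bit is true."""
        return self.condition

    def bc1f(self):
        """Branch taken if the condition bit is false."""
        return not self.condition
```

After `c_lt(1.5, 2.0)` the bit is set, so `bc1t()` reports the branch as taken and `bc1f()` does not. Neither branch needs to see the compared values again.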