Part 9 - Testing

Posted on Mar 5, 2023
(Last updated: May 26, 2024)

When creating digital circuits, there are a lot of things that can go wrong.

While designing the circuit, the designers may introduce a bug into it.

We call these design bugs. On the other hand, once the chip is physically made, it may suffer from a permanent (implementation) fault or from other (transient) faults.

Design bugs call for verification; faults call for testing and fault tolerance.

Verification vs Testing

Verification is the task of ensuring that a design meets its specification.

  • Looking for design mistakes

Testing is performed to ensure that a particular instantiation of a design functions properly.

  • Looking for implementation faults

Verification

The verification should cover:

  • The set of test patterns (test suite) written to verify a design should be complete.
  • Specification coverage
    • Every feature in the specification has to be verified
  • Implementation coverage
    • Every line of the VHDL code should be checked.
    • Examples:
      • Every case of a case statement should be activated.
      • For every state machine in our design, every edge between states should be traversed.
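
The state machine example can be made concrete with a small coverage check. The sketch below is in Python rather than VHDL, and the three-state machine, its state names, and both test suites are hypothetical, chosen only to illustrate the idea. It records which (state, input) edges a test suite traverses and reports the ones that were never exercised.

```python
# Hypothetical 3-state controller: (state, input bit) -> next state.
TRANSITIONS = {
    ("IDLE", 0): "IDLE",
    ("IDLE", 1): "RUN",
    ("RUN", 0): "DONE",
    ("RUN", 1): "RUN",
    ("DONE", 0): "IDLE",
    ("DONE", 1): "IDLE",
}


def covered_edges(input_sequence, start="IDLE"):
    """Return the set of (state, input) edges traversed by one input sequence."""
    state = start
    seen = set()
    for bit in input_sequence:
        seen.add((state, bit))
        state = TRANSITIONS[(state, bit)]
    return seen


def uncovered_edges(test_suite):
    """Return every edge of the state machine that no sequence in the suite traverses."""
    seen = set()
    for sequence in test_suite:
        seen |= covered_edges(sequence)
    return set(TRANSITIONS) - seen


# Each test starts from IDLE (i.e. after a reset).
weak_suite = [[1, 0, 0]]                   # only walks IDLE -> RUN -> DONE -> IDLE
full_suite = [[0, 1, 1, 0, 0], [1, 0, 1]]  # also takes the self-loops and (DONE, 1)

print("weak suite misses:", sorted(uncovered_edges(weak_suite)))
print("full suite misses:", sorted(uncovered_edges(full_suite)))
```

The weak suite visits every state, yet it still leaves three edges untested, which is why edge coverage is a stricter requirement than state coverage.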

Faults

There are different types of faults:

  • Transient faults
    • Faults that happen only once (and are VERY unlikely to happen again)
    • Causes:
      • Electromagnetic Interference:
        • A neighbor's mobile phone
        • Static electricity
      • Various particles hitting the silicon surface:
        • Heavy ions such as iron, $\alpha$-particles, neutrons.
      • Internal effects:
        • Crosstalk, metastability, power supply disturbances
  • Permanent faults
    • Faults that are always there
    • Causes:
      • Design defects
      • Manufacturing defects
      • Transistor aging
  • Intermittent faults
    • Faults that come and go (probably periodically)
    • Causes: Variations
      • Static: transistors on a chip may not be exactly identical, although they were designed to be
      • Dynamic: temperature changes

Defects and Yield

$$ \text{Yield} = \frac{\text{Number of working chips produced}}{\text{Total number of chips produced}} $$

$$ \text{Cost per chip} = \frac{\text{Cost of fabricating and testing a wafer}}{\text{Yield} \cdot \text{Number of chip sites on the wafer}} $$

$$ Y = \left(1 + \frac{Ad}{\alpha}\right)^{-\alpha} $$

where $Y$ is the yield, $A$ is the chip area, $d$ is the defect density, and $\alpha$ is the fault clustering parameter.
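
To get a feel for these formulas, here is a small Python sketch that plugs in numbers. All values (chip area, defect density, clustering parameter, wafer cost, number of chip sites) are made up for illustration and do not describe any real process; the function names are hypothetical as well.

```python
def yield_model(area_cm2, defect_density_per_cm2, alpha):
    """Yield Y = (1 + A*d/alpha) ** (-alpha) from the formula above."""
    return (1.0 + area_cm2 * defect_density_per_cm2 / alpha) ** (-alpha)


def cost_per_chip(wafer_cost, chip_sites, chip_yield):
    """Cost of one working chip: wafer cost divided by the number of good chips."""
    return wafer_cost / (chip_yield * chip_sites)


# Illustrative, made-up numbers.
y = yield_model(area_cm2=1.0, defect_density_per_cm2=0.5, alpha=2.0)
cost = cost_per_chip(wafer_cost=5000.0, chip_sites=500, chip_yield=y)

print(f"yield = {y:.2f}")            # (1 + 0.25) ** -2, roughly 0.64
print(f"cost per chip = {cost:.3f}")

# Doubling the chip area lowers the yield, so large chips cost
# disproportionately more.
print(f"yield at double area = {yield_model(2.0, 0.5, 2.0):.2f}")
```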

Testing

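Exhaustive testing quickly becomes infeasible. Even if we could test one input pattern each ns, applying all $2^{64}$ patterns of a circuit with 64 input bits would take more than 500 years, and a 64-bit adder with two 64-bit operands has $2^{128}$ input combinations.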
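
The arithmetic behind this estimate can be checked with a few lines of Python. The sketch assumes one input pattern per nanosecond and a year of 365.25 days; the function name is hypothetical. Note that the number of patterns depends on the total number of input bits, so an adder with two 64-bit operands has 128 input bits.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_years(input_bits, ns_per_pattern=1.0):
    """Years needed to apply all 2**input_bits patterns, one after another."""
    patterns = 2 ** input_bits
    seconds = patterns * ns_per_pattern * 1e-9
    return seconds / SECONDS_PER_YEAR


print(f"64 input bits:  {exhaustive_test_years(64):.0f} years")
print(f"128 input bits: {exhaustive_test_years(128):.2e} years")
```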

Therefore, only a subset of all possible inputs is used when testing.

This subset can be chosen in several ways:

  • Random test vectors
  • Based on the logical function (functional testing)
    • E.g. check boundary conditions which make your circuit operate differently
  • Based on the design structure (structural testing)
    • E.g. partition your hardware into smaller blocks (exploit the hierarchy of your hardware design)
  • Based on the fault model
    • Group the huge number of things that can go wrong into a much smaller, finite set of fault models
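
As an illustration of the first two strategies, the Python sketch below combines boundary-condition vectors with seeded random vectors and compares a device under test against a reference model. Everything here is an assumption made for the example: the 8-bit width, the reference model, and the deliberately broken `buggy_adder` that never asserts its carry-out.

```python
import random

WIDTH = 8
MASK = (1 << WIDTH) - 1


def reference_adder(a, b):
    """Golden model: returns (sum, carry_out) of two WIDTH-bit operands."""
    total = a + b
    return total & MASK, total >> WIDTH


def buggy_adder(a, b):
    """Hypothetical faulty design: the carry-out is never asserted."""
    return (a + b) & MASK, 0


def boundary_vectors():
    """Functional testing: operand pairs around the smallest and largest values."""
    corners = [0, 1, MASK - 1, MASK]
    return [(a, b) for a in corners for b in corners]


def random_vectors(count, seed=0):
    """Random testing: a fixed seed keeps the test run reproducible."""
    rng = random.Random(seed)
    return [(rng.randint(0, MASK), rng.randint(0, MASK)) for _ in range(count)]


def failing_vectors(dut, vectors):
    """Return the vectors for which the device under test disagrees with the model."""
    return [(a, b) for a, b in vectors if dut(a, b) != reference_adder(a, b)]


vectors = boundary_vectors() + random_vectors(100)
print("failures in buggy adder:", len(failing_vectors(buggy_adder, vectors)))
```

The boundary vectors are what guarantee that the overflow case, such as the largest operand plus 1, is exercised; the random vectors may or may not hit it.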

Fault models

A fault model is an abstract model of physical faults that could cause a chip not to work.

Stuck-at fault model: This model abstracts all potential faults as resulting in a logical node of the circuit being either stuck at logic 0 or stuck at logic 1.
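
A minimal Python sketch of the stuck-at model, assuming a tiny example circuit `y = (a AND b) OR c` with internal node `n = a AND b` (the circuit and all names are chosen here for illustration). With five nodes there are only ten single stuck-at faults, and a fault counts as detected by a test vector when the faulty output differs from the fault-free output.

```python
NODES = ["a", "b", "c", "n", "y"]

# Every node can be stuck at 0 or stuck at 1: 2 * 5 = 10 single faults.
ALL_FAULTS = [(node, value) for node in NODES for value in (0, 1)]


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c; fault is None or a (node, stuck_value) pair."""
    def fix(name, value):
        # Override the node's value if this is the faulty node.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = fix("a", a), fix("b", b), fix("c", c)
    n = fix("n", a & b)
    return fix("y", n | c)


def detected_faults(vector):
    """Faults for which this vector makes the output differ from the good circuit."""
    good = simulate(*vector)
    return {f for f in ALL_FAULTS if simulate(*vector, fault=f) != good}


def fault_coverage(vectors):
    """Fraction of all single stuck-at faults detected by a set of vectors."""
    detected = set()
    for vector in vectors:
        detected |= detected_faults(vector)
    return len(detected) / len(ALL_FAULTS)


tests = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
print("faults detected by (1, 1, 0):", sorted(detected_faults((1, 1, 0))))
print("coverage of 4 vectors:", fault_coverage(tests))
```

In this small circuit, four of the eight possible input vectors are enough to detect all ten faults.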

Algorithm for “path sensitization”

  1. Activate: Specify inputs that generate the value opposite to the stuck-at value (0 for SA-1, 1 for SA-0) at the site of the fault.

  2. Propagate: Select a path from the site of the fault to an output and specify additional signal values to propagate the fault signal along this path to the output (error propagation).

  3. Justify: Specify input values to produce the signal values specified in (2) (line justification).

  • The process is finished when all assigned values have been justified and there are no conflicts.
  • A conflict occurs when two different justifications require different values on the same node.
    • For example, an input value required to activate the fault differs from the input value required to propagate it.
  • If a conflict arises, we look for another way to justify the values.
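
The three steps can be illustrated on a small, hypothetical circuit `y = (a AND b) OR (c AND d)` with the internal node `n1 = a AND b` stuck at 0. Activation requires `n1 = 1`, so `a = b = 1`. Propagation through the OR gate requires its other input `n2 = c AND d` to be 0. Justification then picks `c` and `d` so that `n2 = 0`. The Python sketch below is a brute-force check of these conditions, not a real test pattern generator: it enumerates all inputs and confirms that the vectors satisfying the conditions are exactly the ones that expose the fault.

```python
from itertools import product


def good_circuit(a, b, c, d):
    """Fault-free circuit: y = (a AND b) OR (c AND d)."""
    n1 = a & b
    n2 = c & d
    return n1 | n2


def faulty_circuit(a, b, c, d):
    """Same circuit with node n1 stuck at 0 (a and b no longer matter)."""
    n1 = 0
    n2 = c & d
    return n1 | n2


def sensitizing_vectors():
    """Input vectors that satisfy the activate and propagate conditions."""
    vectors = []
    for a, b, c, d in product((0, 1), repeat=4):
        activated = (a & b) == 1      # step 1: the fault-free n1 must be 1
        propagated = (c & d) == 0     # step 2: side input n2 of the OR must be 0
        if activated and propagated:  # step 3: both are justified by the same inputs
            vectors.append((a, b, c, d))
    return vectors


def detecting_vectors():
    """Input vectors for which the faulty output differs from the good output."""
    return [v for v in product((0, 1), repeat=4)
            if good_circuit(*v) != faulty_circuit(*v)]


print("path sensitization:", sensitizing_vectors())
print("brute force:       ", detecting_vectors())
```

Here the activation and propagation conditions involve disjoint inputs, so no conflict can arise. If `n2` also depended on `a` or `b`, the values required in steps 1 and 2 could contradict each other, which is the conflict case described above.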