# **Verification Testbench**

#### Nagesh Loke, ARM CPU Verification Lead/Manager



The Architecture for the Digital World®

#### What to expect?

- This lecture aims to:
  - provide an idea of what a testbench is
  - help develop an understanding of the various components of a testbench
  - build an appreciation of the complexity in a testbench
  - highlight why it is as much a software problem as it is a hardware problem

### What is a testbench?

- A testbench helps build an environment to test and verify a design
- The key components of a testbench are:
- Stimulus
  - is used to drive the inputs of the design to generate a high level of confidence
  - should be able to exercise all normal input scenarios and a good portion of critical combinations with ease
- Checker
  - is a parallel & independent implementation of the specification
  - is used to verify the design output against the modeled output
- Coverage
  - Helps measure quality of stimulus
  - Provides a measure of confidence to help determine closure of verification effort



## What are we verifying?

- An ALU has
  - an input clock
  - two 8-bit inputs as operands
  - a 3-bit opcode as an operator
  - a 16-bit output
- Performs the following operations:
  - ADD, SUB, AND, OR, XOR, MUL, XNOR
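
For reference, a design matching this description might look like the sketch below. The port names (clk, A, B, opcode, OUT) are the ones the testbench uses later in the lecture, and the opcode encoding is taken from the scoreboard model shown later. The registered output and the default branch are assumptions, since the slides do not show the RTL.

```systemverilog
// Hypothetical ALU RTL: a sketch only, the lecture does not show the real design.
module ALU (
   input  logic        clk,
   input  logic [7:0]  A, B,
   input  logic [2:0]  opcode,
   output logic [15:0] OUT
);
   // Operands are extended to the 16-bit width of OUT before each operation,
   // so ADD can carry into bit 8 and MUL can use the full result width.
   always_ff @(posedge clk) begin
      case (opcode)
         3'b000:  OUT <= A + B;   // ADD
         3'b001:  OUT <= A - B;   // SUB
         3'b010:  OUT <= A * B;   // MUL
         3'b100:  OUT <= A & B;   // AND
         3'b101:  OUT <= A | B;   // OR
         3'b110:  OUT <= A ^ B;   // XOR
         3'b111:  OUT <= ~A ^ B;  // XNOR
         default: OUT <= '0;      // 3'b011 is unused (assumption)
      endcase
   end
endmodule
```

With a registered output like this, inputs applied on a falling clock edge are visible on OUT by the next falling edge, which is the timing the directed test in the next section relies on.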



#### How was it done in the past?

```
initial begin
  @(negedge clk);
  opcode = 3'b000; // ADD
  A = 8'h05;
  B = 8'h50;
  @(negedge clk);
  assert (OUT == 8'h55);
  @(negedge clk);
  A = 8'hFF;
  B = 8'h01;
  @(negedge clk);
  assert (OUT == 16'h100);
  @(negedge clk);
  @(negedge clk);
  $finish;
end
```

# What are some of the issues with this approach?



#### What should the approach be?

- Start with a Verification plan
- A Verification plan talks about:
  - various design features and scenarios that need to be tested
  - architecture of the testbench
  - reuse in higher level testbenches
- Testbench should have the ability to:
  - test as many input data and opcode combinations as possible
  - test different orders of opcodes
  - stress key features/combinations
  - use more machine time and less human time
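
One way to get many combinations while still stressing key values is constrained-random stimulus. The sketch below is illustrative only: the class name alu\_stim, the constraint names, and the weights are assumptions, and the exclusion of opcode 3'b011 reflects the fact that the scoreboard model later in the lecture does not use that encoding.

```systemverilog
// Hypothetical stimulus class; names and weights are illustrative.
class alu_stim;
   rand bit [2:0] opcode;
   rand bit [7:0] a, b;

   // Only generate encodings the ALU implements.
   constraint legal_op_c { opcode != 3'b011; }

   // Bias operands toward the corner values 0x00 and 0xFF,
   // while still covering the rest of the range.
   constraint corners_c {
      a dist { 8'h00 := 2, 8'hFF := 2, [8'h01:8'hFE] :/ 6 };
      b dist { 8'h00 := 2, 8'hFF := 2, [8'h01:8'hFE] :/ 6 };
   }
endclass

module stim_demo;
   initial begin
      alu_stim s;
      s = new ();
      repeat (5) begin
         if (!s.randomize ()) $fatal (1, "randomize failed");
         $display ("opcode=%b a=%h b=%h", s.opcode, s.a, s.b);
      end
   end
endmodule
```

Changing the seed or the repeat count costs machine time rather than human time, which is the trade the plan asks for.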



## **SystemVerilog**

- SystemVerilog as a hardware verification language provides a rich set of features
- Data Types & Aggregate data types
  - Class, Event, Enum, Cast, Parameterization, Arrays, Associative arrays, Queues and manipulating methods
- OOP functionality
  - Classes, Inheritance, Encapsulation, Polymorphism, memory management
- Processes
  - fork-join control, wait statements
- Clocking blocks
- Interprocess synchronization & communication
  - Semaphores, Mailboxes, named events
- Assertions
- Functional Coverage
- Virtual Interfaces
- Constraints

#### Components of a testbench

```
`include "alu_intf.svh"
`include "alu_trxn.svh"
`include "alu_mon.svh"
`include "alu_bfm.svh"
`include "alu_sb.svh"

module alu_tb ();
   logic        clk;
   logic [7:0]  A, B;
   logic [2:0]  opcode;
   logic [15:0] OUT;
   logic        done;

   // Instantiate the interface
   alu_intf alu_if (
      .clk    (clk),
      .A      (A),
      .B      (B),
      .opcode (opcode),
      .OUT    (OUT)
      );

   // Instantiate the design
   ALU ALU (.*);
```

- The ALU testbench module now looks different
- It includes headers for various components
  - ALU Interface
  - ALU Transaction
  - ALU Monitor
  - ALU BFM (driver)
  - ALU Scoreboard
- It creates the interfaces
- It instantiates the DUT
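
The interface itself is not shown on the slides. A minimal sketch that is consistent with how it is connected above, and with the alu\_if.cb references in the BFM and scoreboard, could look like this. The port directions, the clocking skews, and the use of inout clocking signals are assumptions.

```systemverilog
// Hypothetical alu_intf: the port list mirrors the connections made in alu_tb.
interface alu_intf (
   input  logic        clk,
   output logic [7:0]  A, B,
   output logic [2:0]  opcode,
   input  logic [15:0] OUT
);
   // Testbench-side view of the pins, synchronous to clk.
   // A, B and opcode are inout so that the BFM can drive them and
   // the scoreboard can sample them through the same clocking block.
   clocking cb @(posedge clk);
      default input #1step output #1;
      inout A, B, opcode;
      input OUT;
   endclocking
endinterface
```

Driving through the clocking block keeps the pin timing in one place: the testbench components only say what to drive, and the block decides when it reaches the pins relative to the clock edge.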



#### Main Test

```
program test;
   int     num_trxn = 10;
   alu_bfm bfm = new (alu_if);
   alu_sb  sb  = new (alu_if);

   // Generate a basic clock
   initial begin
      clk = 1'b0;
      forever begin
         #5 clk = !clk;
      end
   end

   // Call the testbench components
   initial begin
      fork
         bfm.drive();
         sb.check();
      join_none
   end

   // Give the test some time before $finish is called
   initial begin
      int i;
      for (i=0; i<num_trxn*10; i++) begin
         @(negedge clk);
      end
      $finish;
   end
```

- A program block is the main entry point
- A bfm object and a scoreboard object are created
- All the components are started
- A fork/join\_none block ensures that they all start in parallel
- We exit the fork statement at time 0, without waiting for the spawned tasks to finish
- Simulation is stopped when \$finish is called
- Multiple initial blocks execute in parallel
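
The behaviour of join\_none can be seen in isolation with a small standalone sketch. The module, task name, and delays below are made up for illustration and are not part of the ALU testbench.

```systemverilog
// Hypothetical demo of fork/join_none; not part of the ALU testbench.
module fork_demo;
   task automatic worker (string name, int delay);
      #delay $display ("%0t: %s done", $time, name);
   endtask

   initial begin
      fork
         worker ("drive", 10);
         worker ("check", 20);
      join_none
      // Reached at time 0: join_none does not wait for the children.
      $display ("%0t: parent continues past join_none", $time);
      wait fork;   // block here until both children have finished
      $display ("%0t: all children finished", $time);
      $finish;
   end
endmodule
```

The parent message is printed at time 0, before either worker completes. The main test does not use wait fork because drive() and check() never return, so it ends the simulation with \$finish after a fixed number of clocks instead.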



#### Transaction class



- The ALU transaction class:
  - Uses an enum type for optype
  - Uses "rand" to declare inputs that need to be driven with random values
  - Has a print utility that can be used with a transaction handle/object

#### **BFM/driver**

```
class alu_bfm;
   virtual interface alu_intf alu_if;
   alu_trxn trxn;

   extern task drive ();
   extern task drive_trxn (alu_trxn trxn);

   function new (virtual interface alu_intf alu_if_in);
      alu_if = alu_if_in;
   endfunction // new
endclass // alu_bfm

task alu_bfm::drive ();
   while (1) begin
      trxn = new ();
      trxn.randomize();
      trxn.print_trxn("bfm");
      drive_trxn (trxn);
   end
endtask // drive

task alu_bfm::drive_trxn (alu_trxn trxn);
   @ alu_if.cb;
   alu_if.cb.opcode <= trxn.op;
   alu_if.cb.A <= trxn.a;
   alu_if.cb.B <= trxn.b;
   trxn.out <= alu_if.cb.OUT;
endtask // drive_trxn
```

- The BFM/driver class:
  - Has a handle to a virtual interface
  - Declares an alu\_trxn handle
  - Has a constructor
  - drive() task:
    - Does not end
    - Creates a new transaction
    - Randomizes the transaction
    - Passes the handle to drive\_trxn() task
  - drive\_trxn () task
    - consumes time
    - drives the input signals based on the values in the trxn class
    - Uses clocking block and non-blocking assignments
    - Adheres to pin level timing of signals



#### Scoreboard

```
function alu_sb::compute_expected_value (alu_trxn trxn_in);
   logic [7:0]  A, B;
   logic [15:0] expected_out;
   logic [2:0]  opcode;

   A = trxn_in.a;
   B = trxn_in.b;
   opcode = trxn_in.op;

   case (opcode)
      3'b000: expected_out = A+B;  // ADD
      3'b001: expected_out = A-B;
      3'b010: expected_out = A*B;  // MUL
      3'b100: expected_out = A&B;
      3'b101: expected_out = A|B;
      3'b110: expected_out = A^B;
      3'b111: expected_out = ~A^B; // XNOR
   endcase // case (trxn_in.op)

   assert (expected_out == trxn_in.out) else
      $fatal (1, "op=%s, A=%h, B=%h, OUT=%h, expected_out=%h",
              trxn_in.op.name, A, B, trxn.out, expected_out);
endfunction // get_expected_value

task alu_sb::check ();
   int i;
   trxn = new ();
   @ alu_if.cb;
   while (1) begin
      @ (posedge alu_if.cb);
      $cast (trxn.op, alu_if.cb.opcode);
      trxn.a <= alu_if.cb.A;
      trxn.b <= alu_if.cb.B;
      trxn.out <= alu_if.OUT;
      @ (alu_if.cb);
      compute_expected_value (trxn);
   end
endtask // check
```

- The Scoreboard:
  - Continuously checks the output, independent of the input stimulus
  - check() task:
    - Collects information from the interface and populates the trxn class
    - Calls the compute\_expected\_value() function
  - compute\_expected\_value() function:
    - Implements the model of the design
    - Takes in the inputs and computes an expected output
    - Compares the actual output against the expected output
    - Issues a fatal error message if the comparison fails


#### What does the testbench look like?



#### How do we know we are done?

- With a random testbench it is difficult to know what scenarios have been exercised
- Two techniques are typically used to get a measure of what's done
- Code Coverage
  - No additional instrumentation is needed
  - Toggle, Statement, Expression, Branch coverage
- Functional Coverage
  - Requires planning
  - Requires instrumenting code
  - SystemVerilog provides constructs to support functional coverage
  - Provides detailed reports on how frequently coverage was hit with the test sample
- Coverage closure is an important aspect of verification quality
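
As an illustration of the functional coverage constructs, the sketch below defines a covergroup over the ALU opcode and operands. The class name alu\_cov, the bin names, and the choice of corner values are assumptions; the opcode encodings are the ones used in the scoreboard model.

```systemverilog
// Hypothetical coverage collector; names and bins are illustrative.
class alu_cov;
   bit [2:0] op;
   bit [7:0] a, b;

   covergroup alu_cg;
      cp_op : coverpoint op {
         bins add  = {3'b000};
         bins sub  = {3'b001};
         bins mul  = {3'b010};
         bins and_ = {3'b100};
         bins or_  = {3'b101};
         bins xor_ = {3'b110};
         bins xnor_ = {3'b111};
         ignore_bins unused = {3'b011};  // encoding not used by the model
      }
      cp_a : coverpoint a {
         bins zero = {8'h00};
         bins max  = {8'hFF};
         bins mid  = {[8'h01:8'hFE]};
      }
      cp_b : coverpoint b {
         bins zero = {8'h00};
         bins max  = {8'hFF};
         bins mid  = {[8'h01:8'hFE]};
      }
      // Every opcode against every operand class: 7 x 3 x 3 = 63 cross bins
      op_x_operands : cross cp_op, cp_a, cp_b;
   endgroup

   function new ();
      alu_cg = new ();
   endfunction

   // Call once per observed transaction, for example from the scoreboard.
   function void sample (bit [2:0] op_in, bit [7:0] a_in, bit [7:0] b_in);
      op = op_in;
      a  = a_in;
      b  = b_in;
      alu_cg.sample ();
   endfunction
endclass
```

The cross is where the planning shows up: hitting all seven opcodes is easy with random stimulus, but combinations such as MUL with both operands at 8'hFF only appear in the report if someone decided in advance that they matter.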



### What did we go over ...

- Built a directed and random testbench
- Discussed various components of a testbench
- Modularized the testbench and built complexity into it ... for a reason
- Demonstrated that verification and testbench development requires good planning and software skills



#### ARM Cortex A72 CPU





#### ARM CCN-512 SoC Framework

[Figure: block diagram of a CCN-512 based SoC. Cortex-A57 and Cortex-A53 clusters, DSP and accelerator masters, GIC-500, NIC-400 and CoreLink MMU-500 connect to the CoreLink™ CCN-512 Cache Coherent Network (snoop filter, 1-32MB L3 cache), which feeds up to four DMC-520 memory controllers (DDR4-3200, x72) and NIC-400 interconnects for peripherals such as PCIe, SATA, USB, Flash, SRAM and GPIO.]

#### ARM's CCN-512 Mixed Traffic Infrastructure SoC Framework



#### What are the challenges of verifying complex systems?

- Developing a typical processor from scratch can take hundreds of staff-years of effort
- Development runs in parallel across multiple teams and sites; it takes a crowd to verify a processor
- The challenges are numerous, especially when units are put together and verified as a larger unit or a whole processor
- Code reuse becomes key to avoiding duplication of work
- It is often essential to integrate external IP into your system
- The IP can come with its own verification implementation
- This requires rigorous planning, code structure, & lockstep development
- Standardization becomes a key consideration
- So how do we solve this?



### Useful pointers

- <u>https://verificationacademy.com/</u>
- SV Unit YouTube Video
- EDA Playground
- <u>http://testbench.in/</u>
- Search for SystemVerilog on YouTube

#### Let's solve this ...

| class base;                                             | program a;                                      |
|---------------------------------------------------------|-------------------------------------------------|
| int a;                                                  | initial begin                                   |
| static bit b;                                           | base b1, b2;                                    |
| function new(int val);                                  | sub s1, s2;                                     |
| a = val;                                                | othersub os1;                                   |
| endfunction                                             |                                                 |
| virtual function void say_hi();                         | b1 = new(1);                                    |
| \$display(\$psprintf("hello from base (a == %0d)", a)); | s1 = new(2);                                    |
| endfunction                                             | os1 = new(3);                                   |
| class sub extends base;                                 |                                                 |
| int c;                                                  | s1.say_hi();                                    |
| function new(int val);                                  | os1.say_hi();                                   |
| super.new(val);                                         |                                                 |
| c = val;                                                | b2 = os1;                                       |
| endfunction                                             | b2.say_hi();                                    |
| virtual function void say_hi();                         |                                                 |
| super.say_hi();                                         | \$display(\$psprintf("b == %0d", base::b));     |
| \$display(\$psprintf("hello from sub (c == %0d)", c));  | \$display(\$psprintf("b == %0d", othersub::b)); |
| endfunction                                             | end                                             |
| endclass                                                | endprogram                                      |