Question 5:

- June 02, 2024

a. Consider a machine that supports the following two instruction schedules for R class and I class instructions. Assume an instruction mix of 70% R class and 30% I class instructions. Assume that IF steps take 30 nanoseconds, MEM steps require 50 nanoseconds, and the other steps require 40 nanoseconds.

Cycle: 0, 1, 2, 3, 4
R Class: IF, ID, EX, WB
I Class: IF, ID, EX, MEM, WB

For a multi-cycle implementation:

i. What is the minimum clock cycle time?
ii. How long does it take to execute 200 instructions, in nanoseconds?

b. Given a deeply pipelined processor and a branch-target buffer (BTB) for conditional branches only, assume a misprediction penalty of 5 cycles, a buffer miss penalty of 4 cycles, a 95% hit rate, 90% accuracy, and a 20% branch frequency. How much faster is the processor with the BTB than a processor that has a fixed 4-cycle branch penalty?

ANSWER:

Part (a):

i. Minimum Clock Cycle Time:

The minimum clock cycle time is determined by the longest step in the instruction execution path, as the clock cycle must be long enough to accommodate the longest single step.

IF (Instruction Fetch): 30 nanoseconds
ID (Instruction Decode): 40 nanoseconds
EX (Execute): 40 nanoseconds
MEM (Memory Access): 50 nanoseconds
WB (Write Back): 40 nanoseconds

Since the MEM step requires the most time (50 nanoseconds), the minimum clock cycle time is 50 nanoseconds.

Minimum Clock Cycle Time = 50 nanoseconds
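The same reasoning can be checked with a short Python sketch. The step latencies come from the question; the variable names are only illustrative.

```python
# Step latencies in nanoseconds, as given in the question.
step_ns = {"IF": 30, "ID": 40, "EX": 40, "MEM": 50, "WB": 40}

# In a multi-cycle design every step occupies one clock cycle,
# so the cycle must be at least as long as the slowest step.
slowest_step = max(step_ns, key=step_ns.get)
min_cycle_ns = step_ns[slowest_step]

print(slowest_step, min_cycle_ns)
```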

ii. Time to Execute 200 Instructions:

First, let's find the average number of cycles required per instruction given the instruction mix.

R Class Instruction: 4 cycles
I Class Instruction: 5 cycles

Given the instruction mix:

70% R class instructions: $0.7 \times 4 = 2.8$ cycles
30% I class instructions: $0.3 \times 5 = 1.5$ cycles

The average number of cycles per instruction is: $\text{Average cycles per instruction} = 2.8 + 1.5 = 4.3$

To execute 200 instructions: $\text{Total cycles} = 200 \times 4.3 = 860$

Since each cycle takes 50 nanoseconds: $\text{Total execution time} = 860 \times 50 = 43{,}000$ nanoseconds

Total Execution Time for 200 instructions = 43,000 nanoseconds
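As a cross-check, the calculation can be sketched in Python by counting instructions of each class directly. The schedules and mix come from the question; the names `schedules`, `mix_percent`, and `CYCLE_NS` are illustrative, and the 50 ns cycle is the result from part (i).

```python
CYCLE_NS = 50          # minimum clock cycle from part (i)
NUM_INSTRUCTIONS = 200

# Steps each instruction class goes through (one clock cycle per step).
schedules = {
    "R": ["IF", "ID", "EX", "WB"],
    "I": ["IF", "ID", "EX", "MEM", "WB"],
}
# Instruction mix in whole percent, to keep the arithmetic exact.
mix_percent = {"R": 70, "I": 30}

total_cycles = 0
for cls, steps in schedules.items():
    count = NUM_INSTRUCTIONS * mix_percent[cls] // 100  # 140 R, 60 I
    total_cycles += count * len(steps)

average_cpi = total_cycles / NUM_INSTRUCTIONS
total_time_ns = total_cycles * CYCLE_NS

print(average_cpi, total_cycles, total_time_ns)
```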

Part (b):

To compare the processor with and without the branch-target buffer (BTB), we calculate the average branch penalty per instruction in both scenarios and then determine the speedup.

Without BTB:

Fixed 4-cycle branch penalty
Branch frequency: 20%

The average penalty per instruction is: $0.2 \times 4 = 0.8$ cycles

With BTB:

95% hit rate
90% accuracy
5% miss rate (1 - 0.95)
Branch frequency: 20%
Misprediction penalty: 5 cycles
Buffer miss penalty: 4 cycles

Hit but mispredicted (10% of hits): $0.2 \times 0.95 \times 0.10 \times 5 = 0.095$ cycles

Hit and correctly predicted (90% of hits): $0.2 \times 0.95 \times 0.90 \times 0 = 0$ cycles

Miss penalty (5% of all branches): $0.2 \times 0.05 \times 4 = 0.04$ cycles

Total penalty with BTB: $0.095 + 0.04 = 0.135$ cycles
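The penalty terms above can be reproduced with a minimal Python sketch. It follows the same model as the text: correctly predicted hits cost nothing, mispredicted hits pay the misprediction penalty, and every buffer miss pays the miss penalty. The function name and parameter names are hypothetical.

```python
def btb_penalty_per_instruction(branch_freq, hit_rate, accuracy,
                                mispredict_penalty, miss_penalty):
    """Average branch stall cycles per instruction with a BTB."""
    # Branch found in the buffer but predicted wrongly.
    hit_mispredicted = hit_rate * (1 - accuracy) * mispredict_penalty
    # Branch not found in the buffer at all.
    buffer_miss = (1 - hit_rate) * miss_penalty
    # Only a fraction of instructions are branches.
    return branch_freq * (hit_mispredicted + buffer_miss)


penalty_with_btb = btb_penalty_per_instruction(
    branch_freq=0.20, hit_rate=0.95, accuracy=0.90,
    mispredict_penalty=5, miss_penalty=4,
)
penalty_without_btb = 0.20 * 4  # fixed 4-cycle penalty on every branch

print(round(penalty_with_btb, 3), round(penalty_without_btb, 3))
```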

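Speedup Calculation:

Speedup is the ratio of execution times, so it must be computed from the total CPI (base CPI plus branch penalty), not from the penalties alone. The question does not state a base CPI; assuming a base CPI of 1 with no branch stalls:

CPI without BTB: $1 + 0.8 = 1.8$
CPI with BTB: $1 + 0.135 = 1.135$

$\text{Speedup} = \frac{1.8}{1.135} \approx 1.59$

So, under that assumption, the processor with the BTB is approximately 1.59 times faster than the processor with a fixed 4-cycle branch penalty. The ratio of the penalties alone, $\frac{0.8}{0.135} \approx 5.93$, measures only the reduction in branch stall cycles.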
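The question does not give a base CPI, which the overall speedup depends on. The sketch below assumes a base CPI of 1 (the hypothetical `base_cpi` parameter) and prints both the ratio of branch penalties and the CPI-based speedup, using the penalties of 0.8 and 0.135 cycles per instruction computed above.

```python
def speedup(penalty_without, penalty_with, base_cpi=1.0):
    """Ratio of total CPI without the BTB to total CPI with it.

    base_cpi is an assumption: the CPI with no branch stalls at all.
    """
    return (base_cpi + penalty_without) / (base_cpi + penalty_with)


penalty_without_btb = 0.8   # cycles per instruction, fixed 4-cycle penalty
penalty_with_btb = 0.135    # cycles per instruction, with the BTB

penalty_ratio = penalty_without_btb / penalty_with_btb
cpi_speedup = speedup(penalty_without_btb, penalty_with_btb)

print(round(penalty_ratio, 2))  # reduction in branch stall cycles: 5.93
print(round(cpi_speedup, 2))    # overall speedup with base CPI 1: 1.59
```

A larger assumed base CPI pushes the overall speedup closer to 1, since branch stalls then make up a smaller share of execution time.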

Milan Bakotra