There are situations, called hazards, that prevent the next instruction in the instruction stream from executing during its designated clock cycle. Hazards reduce the performance from the ideal speedup gained by pipelining.
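There are three classes of hazards: structural hazards, which arise from resource conflicts when the hardware cannot support all combinations of instructions in overlapped execution; data hazards, which arise when an instruction depends on the result of a previous instruction that is still in the pipeline; and control hazards, which arise from branches and other instructions that change the PC.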
A stall can come from a cache miss or from a hazard in the pipeline.

A cache miss. A cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss.

A hazard in the pipeline. Eliminating a hazard often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard will never clear.

A hazard causes pipeline bubbles to be inserted. The following tables show how the stalls are actually implemented. As a result, no new instruction is fetched during clock cycle 4, and no instruction finishes during clock cycle 8.
Instr / Clock cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Instr i | IF | ID | EX | MEM | WB | | | | | |
Instr i+1 | | IF | ID | EX | MEM | WB | | | | |
Instr i+2 | | | IF | ID | EX | MEM | WB | | | |
Stall | | | | bubble | bubble | bubble | bubble | bubble | | |
Instr i+3 | | | | | IF | ID | EX | MEM | WB | |
Instr i+4 | | | | | | IF | ID | EX | MEM | WB |
Equivalently, with the stall shown in the row of the delayed instruction:

Instr / Clock cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Instr i | IF | ID | EX | MEM | WB | | | | | |
Instr i+1 | | IF | ID | EX | MEM | WB | | | | |
Instr i+2 | | | IF | ID | EX | MEM | WB | | | |
Instr i+3 | | | | stall | IF | ID | EX | MEM | WB | |
Instr i+4 | | | | | | IF | ID | EX | MEM | WB |
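
The timing in the two tables above can be reproduced with a short sketch. It assumes the five-stage pipeline shown (IF, ID, EX, MEM, WB) and models the stall simply as a clock cycle in which no instruction is fetched. The function names `fetch_stall_schedule` and `completion_cycles` are hypothetical, chosen only for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # five-stage pipeline from the tables


def fetch_stall_schedule(n_instr, stall_cycles=()):
    """Return one {cycle: stage} dict per instruction.

    No instruction is fetched in any cycle listed in stall_cycles;
    once fetched, an instruction advances one stage per cycle.
    """
    blocked = set(stall_cycles)
    schedule = []
    cycle = 1
    for _ in range(n_instr):
        while cycle in blocked:  # the bubble: skip this fetch slot
            cycle += 1
        schedule.append({cycle + k: stage for k, stage in enumerate(STAGES)})
        cycle += 1
    return schedule


def completion_cycles(schedule):
    """Cycle in which each instruction reaches WB (its last stage)."""
    return [max(row) for row in schedule]


if __name__ == "__main__":
    sched = fetch_stall_schedule(5, stall_cycles=[4])
    print(completion_cycles(sched))
```

With a stall in cycle 4, the instructions complete in cycles 5, 6, 7, 9 and 10, so nothing completes in cycle 8, which matches the tables.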
In case of data hazards:
Instr / Clock cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Instr i | IF | ID | EX | MEM | WB | | | | | |
Instr i+1 | | IF | ID | bubble | EX | MEM | WB | | | |
Instr i+2 | | | IF | bubble | ID | EX | MEM | WB | | |
Instr i+3 | | | | bubble | IF | ID | EX | MEM | WB | |
Instr i+4 | | | | | | IF | ID | EX | MEM | WB |
The same timing, with the bubble labeled as a stall:

Instr / Clock cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Instr i | IF | ID | EX | MEM | WB | | | | | |
Instr i+1 | | IF | ID | stall | EX | MEM | WB | | | |
Instr i+2 | | | IF | stall | ID | EX | MEM | WB | | |
Instr i+3 | | | | stall | IF | ID | EX | MEM | WB | |
Instr i+4 | | | | | | IF | ID | EX | MEM | WB |
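
The data-hazard timing above follows from two rules: an instruction moves forward at most one stage per cycle, and it cannot enter a stage until the instruction ahead of it has left that stage. The sketch below applies those rules to an in-order five-stage pipeline. It is an illustrative model rather than a description of any particular hardware, and the name `stall_schedule` and its parameters are assumptions made for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # five-stage pipeline from the tables


def stall_schedule(n_instr, stalled_instr, stalled_stage, stall_cycles):
    """Return entry[i][s]: the cycle in which instruction i enters stage s.

    Instruction `stalled_instr` (0-based) waits `stall_cycles` extra cycles
    before entering `stalled_stage`. Later instructions are held up behind it;
    earlier instructions are unaffected.
    """
    entry = []
    for i in range(n_instr):
        row = []
        for s, stage in enumerate(STAGES):
            # Advance at most one stage per cycle.
            earliest = 1 if s == 0 else row[s - 1] + 1
            if i == stalled_instr and stage == stalled_stage:
                earliest += stall_cycles  # the hazard holds this instruction
            if i > 0:
                if s + 1 < len(STAGES):
                    # Stage s is free once the previous instruction enters s+1.
                    earliest = max(earliest, entry[i - 1][s + 1])
                else:
                    # The last stage is free one cycle after the previous entry.
                    earliest = max(earliest, entry[i - 1][s] + 1)
            row.append(earliest)
        entry.append(row)
    return entry


if __name__ == "__main__":
    # Instruction i+1 (index 1) waits one cycle before EX, as in the tables.
    for row in stall_schedule(5, stalled_instr=1, stalled_stage="EX", stall_cycles=1):
        print(row)
```

For a one-cycle stall of instruction i+1 before EX, the model gives instruction i+2 a fetch in cycle 3 and decode in cycle 5, and delays the fetch of instruction i+3 to cycle 5, in agreement with the tables.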
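The ideal CPI on a pipelined machine is almost always 1. Hence, the pipelined CPI is

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall cycles per instruction} = 1 + \text{Pipeline stall cycles per instruction}$$

If we ignore the cycle time overhead of pipelining and assume the stages are all perfectly balanced, then the cycle times of the pipelined and unpipelined machines are equal and

$$\text{Speedup} = \frac{\text{CPI}_{\text{unpipelined}}}{1 + \text{Pipeline stall cycles per instruction}}$$

If all instructions take the same number of cycles, which must also equal the number of pipeline stages (the depth of the pipeline), then the unpipelined CPI is equal to the depth of the pipeline, leading to

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$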
If there are no pipeline stalls, this leads to the intuitive result that pipelining can improve performance by a factor equal to the depth of the pipeline.
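
As a small numeric illustration, under the assumptions stated above (ideal CPI of 1, perfectly balanced stages, no cycle time overhead) the speedup is the pipeline depth divided by one plus the stall cycles per instruction. The function name `pipeline_speedup` and the sample stall rate are hypothetical and chosen only for this sketch.

```python
def pipeline_speedup(depth, stall_cycles_per_instr):
    """Speedup over the unpipelined machine: depth / (1 + stalls per instruction).

    Assumes an ideal pipelined CPI of 1, balanced stages, and no
    cycle time overhead from pipelining.
    """
    if depth < 1:
        raise ValueError("pipeline depth must be at least 1")
    if stall_cycles_per_instr < 0:
        raise ValueError("stall cycles per instruction cannot be negative")
    return depth / (1.0 + stall_cycles_per_instr)


if __name__ == "__main__":
    print(pipeline_speedup(5, 0.0))   # no stalls: speedup equals the depth
    print(pipeline_speedup(5, 0.25))  # an assumed 0.25 stall cycles per instruction
```

With no stalls a five-stage pipeline reaches a speedup of 5; an average of 0.25 stall cycles per instruction lowers it to 4.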