Presentation: Computer Structure Pipeline

Category: Technology

Contents

Slides and text of this presentation


Slide 1

Computer Structure
Pipeline
Lecturer: Aharon Kupershtok

Slide 2

A Basic Processor

Slide 3

Pipelined Car Assembly

Slide 4

(image-only slide)

Slide 5

Pipelining
Pipelining does not reduce the latency of a single task;
it increases the throughput of the entire workload
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
 Partition the pipe into many pipe stages
 Make the longest pipe stage as short as possible
 Balance the work across the pipe stages
Pipelining adds overhead (e.g., latches)
Time to “fill” the pipeline and time to “drain” it reduces speedup
Stalls for dependencies
 Too many pipe stages start to lose performance
IPC of an ideal pipelined machine is 1
Every clock cycle one instruction finishes
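The speedup arithmetic above can be sketched numerically (a toy model: the stage times, the `latch_overhead` figure, and the function name are illustrative, not from the slides):

```python
# Toy model of pipeline speedup: unpipelined cycle = sum of stage times,
# pipelined cycle = slowest stage + latch overhead (the slide's point that
# the rate is limited by the slowest stage and latches add overhead).

def pipelined_speedup(stage_times, latch_overhead=0.1):
    unpipelined = sum(stage_times)
    pipelined = max(stage_times) + latch_overhead
    return unpipelined / pipelined

balanced   = [1.0] * 5                   # perfectly balanced 5-stage pipe
unbalanced = [0.5, 0.5, 2.0, 0.5, 0.5]   # one slow stage limits the rate

print(pipelined_speedup(balanced))       # close to 5 (= number of stages)
print(pipelined_speedup(unbalanced))     # far below 5: slowest stage dominates
```

With balanced stages the speedup approaches the stage count, which is the "potential speedup = number of pipe stages" claim; the unbalanced case shows why the stages must be balanced.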

Slide 6

Pipelined CPU

Slide 7

Structural Hazard
Different instructions using the same resource at the same time
Register File:
Accessed in 2 stages:
Read during stage 2 (ID)
Write during stage 5 (WB)
Solution: 2 read ports, 1 write port
Memory:
Accessed in 2 stages:
Instruction Fetch during stage 1 (IF)
Data read/write during stage 4 (MEM)
Solution: separate instruction cache and data cache
Each functional unit can only be used once per instruction
Each functional unit must be used at the same stage for all instructions
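The memory structural hazard above can be sketched as a toy resource-conflict check (the stage-to-resource mapping and all names are my own illustration; the slide's split I-cache/D-cache fix corresponds to `unified_memory=False`):

```python
# Map each of the 5 pipe stages to the resource it uses, then check whether
# two in-flight instructions ever need the same resource in the same cycle.
# With one unified memory, a later instruction's IF collides with an earlier
# instruction's MEM three cycles after it entered the pipe.

STAGE_RESOURCE = {0: "imem", 1: "regfile-read", 2: "alu",
                  3: "dmem", 4: "regfile-write"}

def conflicts(start_cycles, unified_memory=True):
    """Return (cycle, resource) pairs where two instructions collide."""
    use = {}
    hits = []
    for start in start_cycles:
        for stage, res in STAGE_RESOURCE.items():
            if unified_memory and res in ("imem", "dmem"):
                res = "mem"                    # one shared memory port
            key = (start + stage, res)
            if key in use:
                hits.append(key)
            use[key] = True
    return hits

# Four instructions entering the pipe on consecutive cycles:
print(conflicts([0, 1, 2, 3], unified_memory=True))   # [(3, 'mem')]
print(conflicts([0, 1, 2, 3], unified_memory=False))  # []
```

The single collision is exactly the IF-vs-MEM conflict the slide describes; separate caches make the two accesses distinct resources and the conflict disappears.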

Slide 8

Pipeline Example: cycle 1

Slide 9

Pipeline Example: cycle 2

Slide 10

Pipeline Example: cycle 3

Slide 11

Pipeline Example: cycle 4

Slide 12

Pipeline Example: cycle 5

Slide 13

RAW Dependency

Slide 14

Using Bypass to Solve RAW Dependency

Slide 15

RAW Dependency

Slide 16

Forwarding Hardware

Slide 17

Forwarding Control
Forwarding from EXE (L3):
if (L3.RegWrite and (L3.dst == L2.src1)) ALUSelA = 1
if (L3.RegWrite and (L3.dst == L2.src2)) ALUSelB = 1
Forwarding from MEM (L4):
if (L4.RegWrite and
      ((not L3.RegWrite) or (L3.dst != L2.src1)) and
      (L4.dst == L2.src1)) ALUSelA = 2
if (L4.RegWrite and
      ((not L3.RegWrite) or (L3.dst != L2.src2)) and
      (L4.dst == L2.src2)) ALUSelB = 2
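These forwarding equations can be made executable as a sketch (the `Latch` dataclass and the ALUSel encoding — 0 = register file, 1 = forward from EXE, 2 = forward from MEM — follow the slide's pseudocode; everything else is illustrative):

```python
from dataclasses import dataclass

@dataclass
class Latch:
    """Pipeline latch fields used by the forwarding logic."""
    RegWrite: bool = False
    dst: int = 0
    src1: int = 0
    src2: int = 0

def forward(L2, L3, L4):
    """L2: instruction entering EXE; L3: in EXE; L4: in MEM."""
    ALUSelA = ALUSelB = 0
    # Forwarding from EXE (L3) — the youngest result has priority.
    if L3.RegWrite and L3.dst == L2.src1:
        ALUSelA = 1
    if L3.RegWrite and L3.dst == L2.src2:
        ALUSelB = 1
    # Forwarding from MEM (L4) — only if EXE is not already supplying it.
    if L4.RegWrite and (not L3.RegWrite or L3.dst != L2.src1) and L4.dst == L2.src1:
        ALUSelA = 2
    if L4.RegWrite and (not L3.RegWrite or L3.dst != L2.src2) and L4.dst == L2.src2:
        ALUSelB = 2
    return ALUSelA, ALUSelB

# add r1,r2,r3 ; add r4,r1,r1 -> both ALU inputs forwarded from EXE
print(forward(L2=Latch(src1=1, src2=1),
              L3=Latch(RegWrite=True, dst=1),
              L4=Latch()))            # (1, 1)
```

The `(not L3.RegWrite) or (L3.dst != L2.src1)` term is what gives the EXE bypass priority over the MEM bypass when both stages write the same register.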

Slide 18

Register File Split
Register file is written during the first half of the cycle
Register file is read during the second half of the cycle
Register file is written before it is read → the read returns the correct data

Slide 19

Can't Always Forward

Slide 20

Stall If Cannot Forward

Slide 21

Software Scheduling to Avoid Load Hazards
Fast code:
	LW	Rb,b
	LW	Rc,c
	LW	Re,e
	ADD	Ra,Rb,Rc
	LW	Rf,f
	SW	a,Ra
	SUB	Rd,Re,Rf
	SW	d,Rd

Slide 22

Control Hazards

Slide 23

Control Hazard on Branches

Slide 24

Control Hazard on Branches

Slide 25

Control Hazard on Branches

Slide 26

Control Hazard on Branches

Slide 27

Control Hazard on Branches

Slide 28

Control Hazard on Branches

Slide 29

Control Hazard: Stall
Stall the pipe when a branch is encountered, until it is resolved

Stall impact: assumptions
CPI = 1
20% of instructions are branches
Stall 3 cycles on every branch
	CPI new = 1 + 0.2 × 3 = 1.6
	(CPI new = CPI ideal + avg. stall cycles / instr.)
	We lose 60% of the performance
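A minimal sketch of the CPI formula on this slide (the parameter defaults are the slide's assumptions; the function name is mine):

```python
# CPI_new = CPI_ideal + avg. stall cycles per instruction,
# where the average stall is branch_frac * stall_cycles.

def cpi_with_stalls(branch_frac=0.2, stall_cycles=3, cpi_ideal=1.0):
    return cpi_ideal + branch_frac * stall_cycles

cpi = cpi_with_stalls()
print(cpi)                           # 1.6
print(round((cpi - 1.0) * 100))      # 60 -> we lose 60% of the performance
```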

Slide 30

Control Hazard: Predict Not Taken
Execute instructions from the fall-through (not-taken) path
As if there is no branch
If the branch is not taken (~50%), no penalty is paid

If the branch is actually taken
Flush the fall-through path instructions before they change the machine state (memory / registers)
Fetch the instructions from the correct (taken) path

Assuming ~50% of branches are not taken on average
	CPI new = 1 + (0.2 × 0.5) × 3 = 1.3

Slide 31

Dynamic Branch Prediction

Slide 32

BTB
Allocation
Allocate instructions identified as branches (after decode)
Both conditional and unconditional branches are allocated
Not-taken branches need not be allocated
A BTB miss implicitly predicts not-taken
Prediction
BTB lookup is done in parallel with the I-cache lookup
The BTB provides
An indication that the instruction is a branch (BTB hit)
The branch's predicted target
The branch's predicted direction
The branch's predicted type (e.g., conditional, unconditional)
Update (when the branch outcome is known)
Branch target
Branch history (taken / not-taken)
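The allocate / predict / update flow can be sketched as a hypothetical minimal BTB (a dict keyed by branch PC and a last-outcome direction bit; a real BTB is a set-associative tagged structure, so every name here is illustrative):

```python
# Minimal branch target buffer: a miss implicitly predicts not-taken
# (fall-through), a hit predicts the stored target when the stored
# direction is taken, and update() allocates / refreshes the entry
# once the branch outcome is known.

class BTB:
    def __init__(self):
        self.entries = {}            # pc -> {"target": ..., "taken": ...}

    def predict(self, pc):
        """Done in parallel with the I-cache lookup."""
        e = self.entries.get(pc)
        if e is None or not e["taken"]:
            return pc + 4            # miss or last outcome not-taken
        return e["target"]

    def update(self, pc, target, taken):
        """Called when the branch outcome is known (allocates on first use)."""
        self.entries[pc] = {"target": target, "taken": taken}

btb = BTB()
print(hex(btb.predict(0x100)))       # 0x104: miss -> fall-through
btb.update(0x100, target=0x200, taken=True)
print(hex(btb.predict(0x100)))       # 0x200: hit -> predicted taken target
```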

Slide 33

BTB (cont.)
Wrong prediction cases
Predicted not-taken, actually taken
Predicted taken, actually not-taken, or actually taken but with the wrong target

In case of a wrong prediction – flush the pipeline
Reset the latches (same as turning all in-flight instructions into NOPs)
Select the PC source to be from the correct path
Need to keep the fall-through address with the branch
Start fetching instructions from the correct path

Assuming a correct prediction rate P
	CPI new = 1 + (0.2 × (1−P)) × 3
For example, if P = 0.7
	CPI new = 1 + (0.2 × 0.3) × 3 = 1.18
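The misprediction cost model, as a sketch (only the fraction 1−P of mispredicted branches pays the 3-cycle flush penalty; the defaults are the slide's assumptions):

```python
# CPI_new = 1 + branch_frac * (1 - P) * penalty

def cpi_with_prediction(p_correct, branch_frac=0.2, penalty=3):
    return 1 + branch_frac * (1 - p_correct) * penalty

print(round(cpi_with_prediction(0.7), 2))   # 1.18 (the slide's example)
print(cpi_with_prediction(1.0))             # 1.0  (perfect prediction)
```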

Slide 34

Adding a BTB to the Pipeline

Slide 35

Adding a BTB to the Pipeline

Slide 36

Adding a BTB to the Pipeline

Slide 37

Using The BTB

Slide 38

Using The BTB (cont.)

Slide 39

Backup

Slide 40

MIPS Instruction Formats

Slide 41

The Memory Space
Each memory location
is 8 bits = 1 byte wide
has an address
We assume a 32-bit address
An address space of 2^32 bytes
Memory stores both instructions and data
Each instruction is 32 bits wide → stored in 4 consecutive bytes in memory
Various data types have different widths

Slide 42

Register File
The Register File holds 32 registers
Each register is 32 bits wide
The RF supports, in parallel,
reading any two registers and
writing any one register
Inputs
Read reg 1/2: number of the register whose value will be output on Read data 1/2
RegWrite: write enable
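A sketch of this register-file interface (hardwiring register 0 to zero is a MIPS convention, an assumption beyond what the slide states; the class and method names are mine):

```python
# 32 registers of 32 bits, two parallel read ports, one write port
# gated by the RegWrite enable.

class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32

    def read(self, reg1, reg2):
        """Two parallel read ports: Read data 1 / Read data 2."""
        return self.regs[reg1], self.regs[reg2]

    def write(self, reg, data, reg_write):
        """Single write port; RegWrite is the write enable."""
        if reg_write and reg != 0:                # r0 stays hardwired to 0
            self.regs[reg] = data & 0xFFFFFFFF    # registers are 32 bits wide

rf = RegisterFile()
rf.write(5, 1234, reg_write=True)
rf.write(6, 99, reg_write=False)     # enable off: no effect
print(rf.read(5, 6))                 # (1234, 0)
```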

Slide 43

Memory Components
Inputs
Address: address of the memory location we wish to access
Read: read data from the location
Write: write data into the location
Write data (relevant when Write=1): data to be written into the specified location
Outputs
Read data (relevant when Read=1): data read from the specified location

Slide 44

The Program Counter (PC)
Holds the address (in memory) of the next instruction to be executed
After each instruction, it is advanced to point to the next instruction
If the current instruction is not a taken branch,
    the next instruction resides right after the current instruction:
    PC ← PC + 4
If the current instruction is a taken branch,
    the next instruction resides at the branch target:
    PC ← target                    (absolute jump)
    PC ← PC + 4 + offset×4    (relative jump)
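The PC-update rules can be written as a small function (a sketch; the `taken` / `target` / `offset` parameters stand in for the branch-resolution outputs, which the slide does not name):

```python
# Next-PC selection: fall-through, absolute jump, or relative jump
# with a word (x4) offset measured from PC + 4.

def next_pc(pc, taken=False, target=None, offset=None):
    if not taken:
        return pc + 4                    # PC <- PC + 4
    if target is not None:
        return target                    # PC <- target (absolute jump)
    return pc + 4 + offset * 4           # PC <- PC + 4 + offset*4 (relative)

print(hex(next_pc(0x100)))                          # 0x104
print(hex(next_pc(0x100, taken=True, target=0x200)))  # 0x200
print(hex(next_pc(0x100, taken=True, offset=3)))    # 0x110
```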

Slide 45

Instruction Execution Stages
Fetch
Fetch the instruction pointed to by the PC from the I-Cache
Decode
Decode the instruction (generate control signals)
Fetch operands from the register file
Execute
For a memory access: calculate the effective address
For an ALU operation: execute the operation in the ALU
For a branch: calculate the condition and target
Memory Access
For a load: read data from memory
For a store: write data into memory
Write Back
Write the result back to the register file
Update the program counter

Slide 46

The MIPS CPU

Slide 47

Executing an Add Instruction

Slide 48

Executing a Load Instruction

Slide 49

Executing a Store Instruction

Slide 50

Executing a BEQ Instruction

Slide 51

Control Signals

Slide 52

Pipelined CPU: Load (cycle 1 – Fetch)

Slide 53

Pipelined CPU: Load (cycle 2 – Dec)

Slide 54

Pipelined CPU: Load (cycle 3 – Exe)

Slide 55

Pipelined CPU: Load (cycle 4 – Mem)

Slide 56

Pipelined CPU: Load (cycle 5 – WB)

Slide 57

Datapath with Control

Slide 58

Multi-Cycle Control

Slide 59

Five Execution Steps
Instruction Fetch
Use the PC to get the instruction and put it in the Instruction Register.
Increment the PC by 4 and put the result back in the PC.
	IR = Memory[PC];
	PC = PC + 4;
Instruction Decode and Register Fetch
Read registers rs and rt
Compute the branch address
	A = Reg[IR[25-21]];
	B = Reg[IR[20-16]];
	ALUOut = PC + (sign-extend(IR[15-0]) << 2);
We aren't setting any control lines based on the instruction type
	(we are busy "decoding" it in our control logic)
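The fetch and decode RTL above can be made directly executable (memory and the register file are toy Python containers; the field extraction follows the slide's IR[25-21] / IR[20-16] / IR[15-0] notation):

```python
# Fetch: IR = Memory[PC]; PC = PC + 4
# Decode: A = Reg[rs]; B = Reg[rt]; ALUOut = branch address

def sign_extend16(x):
    """Sign-extend a 16-bit field to a Python int."""
    return x - 0x10000 if x & 0x8000 else x

def fetch_decode(memory, reg, pc):
    ir = memory[pc]                     # IR = Memory[PC]
    pc = pc + 4                         # PC = PC + 4
    a = reg[(ir >> 21) & 0x1F]          # A = Reg[IR[25-21]]  (rs)
    b = reg[(ir >> 16) & 0x1F]          # B = Reg[IR[20-16]]  (rt)
    alu_out = pc + (sign_extend16(ir & 0xFFFF) << 2)  # branch address
    return pc, a, b, alu_out

# A branch-shaped word with rs=1, rt=2, imm=-1 (branch back to address 0):
ir = (1 << 21) | (2 << 16) | 0xFFFF
reg = [0] * 32
reg[1] = reg[2] = 10
print(fetch_decode({0: ir}, reg, 0))    # (4, 10, 10, 0)
```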

Slide 60

Five Execution Steps (cont.)
Execution
   The ALU performs one of three functions, based on the instruction type:
Memory reference: effective address calculation
	ALUOut = A + sign-extend(IR[15-0]);
R-type:
	ALUOut = A op B;
Branch:
	if (A == B) PC = ALUOut;
Memory Access or R-type instruction completion

Write-back step

Slide 61

The Store Instruction

Slide 62

RAW Hazard: SW Solution

Slide 63

Delayed Branch
Define the branch to take place AFTER the n following instructions
HW executes the n instructions following the branch regardless of whether the branch is taken or not
SW puts into the n slots following the branch instructions that need to be executed regardless of the branch resolution:
Instructions from before the branch instruction, or
Instructions from the converged path after the branch
If it cannot find independent instructions, it puts NOPs

Slide 64

Delayed Branch Performance
Filling 1 delay slot is easy, 2 is hard, 3 is harder
Assuming we can effectively fill a fraction d of the delay slots:
	CPI new = 1 + 0.2 × (3 × (1−d))
For example, for d = 0.5, we get CPI new = 1.3
Delay slots mix the architecture with the micro-architecture
New generations require more delay slots
This causes compatibility issues between generations
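The delay-slot cost model as a sketch (only the unfilled fraction 1−d of the 3 slots after each branch costs cycles; the defaults are the slide's assumptions):

```python
# CPI_new = 1 + branch_frac * (slots * (1 - d))

def cpi_delayed_branch(d, branch_frac=0.2, slots=3):
    return 1 + branch_frac * (slots * (1 - d))

print(round(cpi_delayed_branch(0.5), 2))   # 1.3 (the slide's example)
print(cpi_delayed_branch(1.0))             # 1.0 (all slots usefully filled)
```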


