# **Computer Architecture**

**Lecture 01 - Introduction**

Pengju Ren
Institute of Artificial Intelligence and Robotics
Xi'an Jiaotong University
http://gr.xjtu.edu.cn/web/pengjuren

## **Course Administration**

Instructor: Pengju Ren & Tian Xia

TA: Siyang Wang (Ph.D. Candidate)

**Lectures:** Two 100-minute lectures a week

Textbook: Computer Architecture: A Quantitative Approach, 6<sup>th</sup> Edition (2019); Chinese edition (September 2022)

Prerequisite: Digital System Structure and Design

## **Preface**

"The most beautiful thing we can experience is the mysterious. It is the source of all true art and science."
---Albert Einstein, What I Believe, 1930

Layers of abstraction in a computing system, from top to bottom:

- Application
- Algorithm
- Programming Language
- Operating System/Virtual Machines
- Instruction Set Architecture (ISA)
- Microarchitecture
- Register-Transfer Level (RTL)
- Gates
- Circuits
- Devices
- Physics

This course will start you thinking about designing and analyzing the underlying hardware of a computer system.

#### **Application Requirement:**

- Suggest how to improve architecture
- Provide revenue to fund development

Architecture provides feedback to guide application and technology research directions.

#### **Technology Constraints:**

- Restrict what can be done efficiently
- New technologies make new architectures possible

# **Computing Devices Then...**

EDSAC, University of Cambridge, UK, 1949

# **Computing Devices Now**

# **Architecture continually changing**

# Single-Thread (Sequential) Processor Performance

# **Moore's Law Scaling with Cores**

## **Global Semiconductor Market**

Figure 1.1.1: (a) The growth rate of semiconductor revenue parallels that of the gross world product (GWP) for the past 20 years. After the initial fast growth period around the 1990s, worldwide semiconductor sales grow at a similar rate to the gross world product. The global semiconductor market is estimated at \$450 billion USD in revenue for 2020. Products using these semiconductors represent global revenues of \$2 trillion USD, or around 3.5% of global gross domestic product (GDP).

## **Advanced Tech Nodes Continue to Provide Value**

[Figure; axis: Core Area (µm²)]

Steady progress in two-dimensional transistor scaling and a variety of device enhancement techniques have sustained energy-efficiency improvement and device density gains from one technology generation to the next.

# **TSMC (2022.1.13)**

## 2021 Revenue by Platform

- Full-year revenue rose 24.9% to USD 57 billion (gross margin 53-55%, net margin 42-44%; ranked No. 1 among the Top 500)
- HPC, IoT, and Automotive grew strongly, by 34%, 21%, and 51% respectively
- For comparison, Huawei's 2021 sales revenue was about USD 90 billion (net margin around 10%)

# **Upheaval in Computer Design**

- Most of last 50 years, Moore's Law ruled
  - Technology scaling allowed continual performance/energy improvements without changing software model
- Last decade, technology scaling slowed/stopped
  - Dennard (voltage) scaling over (supply voltage ~fixed)
  - Moore's Law (cost/transistor) over?
  - No competitive replacement for CMOS anytime soon
  - Energy efficiency constrains everything
- No "free lunch" for software developers, must consider:
  - Parallel systems
  - Heterogeneous systems

# **Today's Dominant Target Systems**

## Mobile (smartphone/tablet)

- >1 billion sold/year
- Market dominated by ARM-ISA-compatible general-purpose processor in system-on-a-chip (SoC)
- Plus sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)
## Warehouse-Scale Computers (WSCs)

- 100,000s of cores per warehouse
- Market dominated by x86-compatible server chips
- Dedicated apps, plus cloud hosting of virtual machines
- Now seeing increasing use of GPUs, FPGAs, custom hardware to accelerate workloads

## Embedded computing

- Wired/wireless network infrastructure, printers
- Consumer TV/Music/Games/Automotive/Camera/MP3
- Internet of Things!

# **Evaluation of Expressions (ASIC vs. Processor)**

**App: Polynomial operation**

**Application Specific Design** (High Efficiency, Dedicated)
Covered in the fourth-year course: Introduction to AI Chip Design

**General Design** (Programmable, Flexible)
Covered in this course: Computer Architecture

# **Course Content Computer Architecture**

## Architecture vs. Microarchitecture

## "Architecture"/Instruction Set Architecture:

- Programmer-visible state (Memory & Registers)
- Operations (Instructions and how they work)
- Execution Semantics (interrupts)
- Input/Output
- Data Types/Sizes

## Microarchitecture/Organization:

- Tradeoffs on how to implement ISA for some metric (Speed, Energy, Cost)
- Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths, etc.
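To make the split between architecture and microarchitecture concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the three-opcode ISA, the `Microarch` record, and the latency and clock numbers are invented and do not describe any real processor. The point is only that `execute` (the architectural semantics) fixes the register values, while the two hypothetical implementations differ in how long the same program takes.

```python
from dataclasses import dataclass

# Hypothetical three-opcode ISA, invented for this sketch:
#   ("li",  rd, imm)       rd <- imm
#   ("add", rd, rs1, rs2)  rd <- rs1 + rs2
#   ("mul", rd, rs1, rs2)  rd <- rs1 * rs2
PROGRAM = [
    ("li", 1, 6),
    ("li", 2, 7),
    ("mul", 3, 1, 2),
    ("add", 4, 3, 1),
]


def execute(program, num_regs=8):
    """Architectural view: defines the final register values, not the timing."""
    regs = [0] * num_regs
    for op, rd, *src in program:
        if op == "li":
            regs[rd] = src[0]
        elif op == "add":
            regs[rd] = regs[src[0]] + regs[src[1]]
        elif op == "mul":
            regs[rd] = regs[src[0]] * regs[src[1]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


@dataclass
class Microarch:
    """Implementation view: made-up clock rate and per-opcode latencies."""
    name: str
    clock_ghz: float
    cycles_per_op: dict


def run_time_ns(program, uarch):
    """Sum per-instruction latencies (no overlap modeled) and convert to ns."""
    cycles = sum(uarch.cycles_per_op[instr[0]] for instr in program)
    return cycles / uarch.clock_ghz


# Two hypothetical implementations of the same ISA.
SMALL = Microarch("small core", 1.0, {"li": 1, "add": 1, "mul": 4})
BIG = Microarch("big core", 2.0, {"li": 1, "add": 1, "mul": 1})

if __name__ == "__main__":
    print(execute(PROGRAM)[1:5])
    for core in (SMALL, BIG):
        print(core.name, run_time_ns(PROGRAM, core), "ns")
```

Both cores leave the same values in the registers, since that is what the ISA defines. Only the time differs, which is the kind of tradeoff the microarchitecture list above refers to.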
## Same Architecture, Different Microarchitecture

#### **AMD Phenom X4**

- X86 Instruction Set
- Quad Core
- 125W
- Decode 3 Instructions/Cycle/Core
- 64KB L1 I Cache, 64KB L1 D Cache
- 512KB L2 Cache
- Out-of-order
- 2.6GHz

#### **Intel Atom**

- X86 Instruction Set
- Single Core
- 2W
- Decode 2 Instructions/Cycle/Core
- 32KB L1 I Cache, 24KB L1 D Cache
- 512KB L2 Cache
- In-order
- 1.6GHz

Image Credit: Intel

## **Different Architecture, Different Microarchitecture**

#### **AMD Phenom X4**

- X86 Instruction Set
- Quad Core
- 125W
- Decode 3 Instructions/Cycle/Core
- 64KB L1 I Cache, 64KB L1 D Cache
- 512KB L2 Cache
- Out-of-order
- 2.6GHz

Image Credit: AMD

#### **IBM POWER7**

- Power Instruction Set
- Eight Core
- 200W
- Decode 6 Instructions/Cycle/Core
- 32KB L1 I Cache, 32KB L1 D Cache
- 256KB L2 Cache
- Out-of-order
- 4.25GHz

Image Credit: IBM. Courtesy of International Business Machines Corporation, © International Business Machines Corporation.

# Where do Operands come from and Where do Results Go?

# **Stack-Based Instruction Set Architecture (ISA)**

- Burroughs B5000 (1960)
- Burroughs B6700
- HP 3000
- ICL 2900
- Symbolics 3600
- Inmos Transputer

## Modern

- Forth machines
- Java Virtual Machine
- Intel x87 Floating Point Unit

## Hardware Organization of the Stack

Stack is part of the processor state

- ⇒ stack must be bounded and small
- ≈ number of registers, not the size of main memory

Conceptually stack is unbounded

- ⇒ a part of the stack is included in the processor state; the rest is kept in the main memory

# **Stack Operations/Implicit Memory References**

Suppose the top 2 elements of the stack are kept in registers and the rest is kept in memory.
```
Each push operation ⇒ 1 memory reference
     pop operation  ⇒ 1 memory reference
```

Better performance comes from keeping the top N elements in registers, so that memory references are made only when the register stack overflows or underflows.

### **Stack Size and Memory References**

Here `ss(x)` is a stack store (spill `x` from the register stack to memory) and `sf(x)` is a stack fetch (bring `x` back into a register).

```
abc*+adc*+e-/

program    stack (size = 2)    memory refs
push a     R0                  a
push b     R0 R1               b
push c     R0 R1 R2            c, ss(a)
*          R0 R1               sf(a)
+          R0
push a     R0 R1               a
push d     R0 R1 R2            d, ss(a+b*c)
push c     R0 R1 R2 R3         c, ss(a)
*          R0 R1 R2            sf(a)
+          R0 R1               sf(a+b*c)
push e     R0 R1 R2            e, ss(a+b*c)
-          R0 R1               sf(a+b*c)
/          R0
```

Four stores and four fetches.

#### **Classes of Instructions**

- Data Transfer
  - LD, ST, MFC1, MTC1, MFC0, MTC0
- ALU
  - ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
- Control Flow
  - BEQZ, JR, JAL, TRAP, ERET
- Floating Point
  - ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
- Multimedia (SIMD)
  - ADD.PS, SUB.PS, MUL.PS, C.LT.PS
- String
  - REP MOVSB (x86)

### **ISA Encoding**

#### Fixed Width: every instruction has the same width

- Easy to decode (RISC architectures: MIPS, PowerPC, SPARC, ARM...)
- Ex: MIPS, every instruction is 4 bytes

#### Variable Length: instructions can vary in width

- Takes less space in memory and caches (CISC architectures: IBM 360, x86, Motorola 68k, VAX...)
- Ex: x86, instructions from 1 byte up to 17 bytes

#### **Mostly Fixed or Compressed:**

- Ex: MIPS16, THUMB (only two formats, 2 and 4 bytes)
- PowerPC and some VLIWs (store instructions compressed, decompress into the instruction cache)

#### (Very) Long Instruction Word:

- Multiple instructions in a fixed-width bundle
- Ex: Multiflow, HP/ST Lx, TI C6000

## Case study: X86 (IA-32) Instruction Encoding

| Instruction Prefixes | Opcode | ModR/M | Scale, Index, Base | Displacement | Immediate |
|----------------------|--------|--------|--------------------|--------------|-----------|
| Up to four prefixes (1 byte each) | 1, 2, or 3 bytes | 1 byte (if needed) | 1 byte (if needed) | 0, 1, 2, or 4 bytes | 0, 1, 2, or 4 bytes |

x86 and x86-64 Instruction Formats

Possible instructions 1 to 18 bytes long (in practice, x86 processors reject any instruction longer than 15 bytes).

### **RISC-V Instruction Encoding (1)**

| Format | Fields, from bit 31 down to bit 0 |
|--------|-----------------------------------|
| R-type | funct7 [31:25], rs2 [24:20], rs1 [19:15], funct3 [14:12], rd [11:7], opcode [6:0] |
| I-type | imm[11:0] [31:20], rs1 [19:15], funct3 [14:12], rd [11:7], opcode [6:0] |
| S-type | imm[11:5] [31:25], rs2 [24:20], rs1 [19:15], funct3 [14:12], imm[4:0] [11:7], opcode [6:0] |
| B-type | imm[12\|10:5] [31:25], rs2 [24:20], rs1 [19:15], funct3 [14:12], imm[4:1\|11] [11:7], opcode [6:0] |
| U-type | imm[31:12] [31:12], rd [11:7], opcode [6:0] |
| J-type | imm[20\|10:1\|11\|19:12] [31:12], rd [11:7], opcode [6:0] |

- R-Format: instructions with three register operands (two sources, one destination), i.e. arithmetic/logical ops: add, xor, mul
- I-Format: instructions with immediates, loads: addi, lw, jalr, slli
- S-Format: store instructions: sw, sb
- B-Format (also called SB): branch instructions: beq, bge
- U-Format: instructions with upper immediates
- J-Format (also called UJ): jump instructions: jal

## **RISC-V Instruction Encoding (2)**

New open-source, license-free ISA spec

- Supported by growing shared software ecosystem
- Appropriate for all levels of computing system, from microcontrollers to supercomputers
- 32-bit, 64-bit, and 128-bit variants (we're using 32-bit in class, textbook uses 64-bit)

#### **Real World Instruction Sets**

| Arch | Type | # Oper | # Mem | Data Size | # Regs | Addr Size | Use |
|-----------|---------|--------|-------|----------------|--------|-----------|-----------------------|
| Alpha | Reg-Reg | 3 | 0 | 64-bit | 32 | 64-bit | Workstation |
| ARM | Reg-Reg | 3 | 0 | 32/64-bit | 16 | 32/64-bit | Cell Phones, Embedded |
| MIPS | Reg-Reg | 3 | 0 | 32/64-bit | 32 | 32/64-bit | Workstation, Embedded |
| SPARC | Reg-Reg | 3 | 0 | 32/64-bit | 24-32 | 32/64-bit | Workstation |
| TI C6000 | Reg-Reg | 3 | 0 | 32-bit | 32 | 32-bit | DSP |
| IBM 360 | Reg-Mem | 2 | 1 | 32-bit | 16 | 24/31/64 | Mainframe |
| x86 | Reg-Mem | 2 | 1 | 8/16/32/64-bit | 4/8/24 | 16/32/64 | Personal Computers |
| VAX | Mem-Mem | 3 | 3 | 32-bit | 16 | 32-bit | Minicomputer |
| Mot. 6800 | Accum. | 1 | 1/2 | 8-bit | 0 | 16-bit | Microcontroller |

## Why the Diversity in ISAs?
#### **Application Influenced ISA**

- Instructions for Applications
  - DSP instructions
- Compiler Technology has improved
  - SPARC Register Windows no longer needed
  - Compiler can register allocate effectively

#### **Technology Influenced ISA**

- Storage is expensive, tight encoding important
- Reduced Instruction Set Computer
  - Remove instructions until whole computer fits on die
- Multicore/Manycore
  - Transistors not turning into sequential performance

#### Recap

- Application
- Algorithm
- Programming Language
- Operating System/Virtual Machines
- Instruction Set Architecture (ISA)
- Microarchitecture
- Register-Transfer Level (RTL)
- Gates
- Circuits
- Devices
- Physics

Topics covered:

- ISA vs Micro-Architecture
- ISA Characteristics
  - Machine Models
  - Encoding
  - Data Types
  - Instructions
  - Addressing Modes

#### And in conclusion ...

- Computer Architecture >> ISAs and RTL
- Computer Architecture is about interaction of hardware and software, and design of appropriate abstraction layers
- Computer architecture is shaped by technology and applications
- Computer Science at the crossroads from sequential to parallel computing
- Salvation requires innovation in many fields, including computer architecture
- Read Chapter 1 & Appendix A for next time! (6<sup>th</sup> edition)

Next Lecture: RISC-V ISA, Datapath & Control (ISA and Micro-Architecture)
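As a small preview, the R-type layout from the RISC-V encoding slide can be turned into a few lines of Python. This is a minimal sketch, not a full assembler: the helper names `encode_r_type` and `decode_r_type` and the table `R_TYPE_FIELDS` are made up for illustration, and only the R-type format is handled. The opcode and funct values used for `add` and `sub` are the ones defined by the RV32I base ISA.

```python
# Field widths of the R-type format, listed from bit 31 down to bit 0.
R_TYPE_FIELDS = (
    ("funct7", 7),
    ("rs2", 5),
    ("rs1", 5),
    ("funct3", 3),
    ("rd", 5),
    ("opcode", 7),
)

OPCODE_OP = 0b0110011  # RV32I major opcode for register-register ALU ops


def encode_r_type(**fields):
    """Pack the six R-type fields into one 32-bit instruction word."""
    word = 0
    for name, width in R_TYPE_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_r_type(word):
    """Split a 32-bit word back into its R-type fields."""
    fields = {}
    shift = 32
    for name, width in R_TYPE_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# add x3, x1, x2  (funct7 = 0000000, funct3 = 000)
ADD_X3_X1_X2 = encode_r_type(funct7=0, rs2=2, rs1=1, funct3=0, rd=3,
                             opcode=OPCODE_OP)
# sub x3, x1, x2  (funct7 = 0100000, funct3 = 000)
SUB_X3_X1_X2 = encode_r_type(funct7=0b0100000, rs2=2, rs1=1, funct3=0, rd=3,
                             opcode=OPCODE_OP)

if __name__ == "__main__":
    print(f"add x3, x1, x2 -> 0x{ADD_X3_X1_X2:08x}")
    print(f"sub x3, x1, x2 -> 0x{SUB_X3_X1_X2:08x}")
    print(decode_r_type(ADD_X3_X1_X2))
```

Because every field sits at a fixed bit position, decoding is a handful of shifts and masks. This is the "easy to decode" property that the fixed-width entry on the ISA Encoding slide attributes to RISC architectures.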