# Ultra Low-Power VLSI with Fine Grain Runtime Power Gating

## Hiroshi Nakamura

nakamura@hal.rcast.u-tokyo.ac.jp (nakamura@acm.org, hiroshi@computer.org)

The University of Tokyo



IFIP-57th 2010/1/25 (H.Nakamura)

# Outline

- tradeoff between performance, power and dependability
- Outline
  - Background
  - Introduction of our Research Project
    - purpose: leakage reduction
    - implementation: fine-grain power-gating processor



# CO<sub>2</sub> Emissions from IT Equipment in Japan

Report from METI (Ministry of Economy, Trade, and Industry)



# Relationships of Performance and Power

- For a transistor:
  - Switching delay $\propto \frac{C V_{DD}}{(V_{DD} - V_{th})^{\alpha}}$
  - Dynamic power $P_{dyn} = C V_{DD}^2 f \beta$
- For a complete system, the relationship is not as simple because
  - Performance is limited by a bottleneck, while power is the sum over the whole system
  - Unhurried or idle parts can run slowly at low power
    → Lower power consumption without a performance penalty
  - DVFS (Dynamic Voltage/Frequency Scaling) addresses dynamic power
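
As a rough illustration of the two transistor-level relations above, the sketch below evaluates how frequency and dynamic power change when $V_{DD}$ is scaled down and the clock is slowed in proportion to the longer switching delay. The values of $V_{th}$, $\alpha$, the nominal $V_{DD}$, and the normalized $C$ and $\beta$ are assumptions chosen for illustration, not figures from this talk.

```python
def relative_delay(vdd, vth=0.3, alpha=1.3, c=1.0):
    """Switching delay up to a constant factor: C * Vdd / (Vdd - Vth)^alpha.

    vth, alpha, and c are illustrative assumptions.
    """
    if vdd <= vth:
        raise ValueError("vdd must exceed vth")
    return c * vdd / (vdd - vth) ** alpha


def dynamic_power(vdd, f, c=1.0, beta=1.0):
    """Dynamic power: C * Vdd^2 * f * beta (all quantities normalized)."""
    return c * vdd ** 2 * f * beta


def scale_point(vdd, vdd_nominal=1.2):
    """Lower Vdd and reduce f in proportion to the increase in delay.

    Returns (relative frequency, relative dynamic power) with respect to
    the assumed nominal operating point.
    """
    slowdown = relative_delay(vdd) / relative_delay(vdd_nominal)
    f_rel = 1.0 / slowdown
    p_rel = dynamic_power(vdd, f_rel) / dynamic_power(vdd_nominal, 1.0)
    return f_rel, p_rel


if __name__ == "__main__":
    for vdd in (1.2, 1.0, 0.8):
        f_rel, p_rel = scale_point(vdd)
        print(f"Vdd={vdd:.1f} V  f={f_rel:.2f}x  P_dyn={p_rel:.2f}x")
```

Under these assumptions dynamic power falls faster than frequency, which is why slowing down only the unhurried parts is attractive. The sketch says nothing about leakage.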

- What is effective for leakage power? → Power gating

[Figure: power versus 1/delay (performance); label: Leakage]

## **Power-Gating**

#### Power-gating (PG) technique

- Inserting sleep transistors between GND and logic blocks (or between Vdd and logic blocks)
  - Cut power-supply to logic blocks
  - Active/sleep mode controlled by sleep signals
- Sleep transistors : High-Vth transistors (slow but non-leaky)





# Introduction of Our Research Project

- Innovative Power Control for Ultra Low-Power and High-Performance System LSIs
  - 5-year project started in October 2006
  - Supported by JST (Japan Science and Technology Agency)
    CREST (Core Research for Evolutional Science and Technology)
- JST CREST research areas (approximately 30):
  - Dependable Operating Systems for Embedded Systems Aiming at Practical Applications *directed by* Dr. Mario Tokoro
  - Fundamental Technologies for Dependable VLSI System
  - Technology Innovation and Integration for Information Systems with Ultra Low Power *directed by* Prof. Takashi Nanya
    - 12 projects: my project is here


### Geyser: Low Power Processor through Fine-Grain Run-Time Power-Gating

- Target: Leakage Power
- Background: leakage reduction techniques so far
  - Standby time:
    - Power gating (coarse grain)
  - Runtime:
    - Cache decay, drowsy cache (coarse grain in time)
- Leakage in logic parts (ALU, multiplier, etc.) is becoming serious
  - Fast but leaky transistors are used
  - The active ratio of those parts is not necessarily high, but the active parts change frequently, that is, cycle by cycle

Objective: Reduce runtime leakage power of logic parts

Challenge: How to realize fine-grain power gating



# Instruction Pipeline with Power-Gating

- Geyser: MIPS-compatible processor with a 5-stage pipeline
- Straightforward PG (power gating)
  - Turn EX-units into active mode only when necessary
  - An EX-unit becomes active when an instruction that uses it enters the IF stage
  - The activated EX-unit returns to sleep mode after execution



# **Energy Overhead of Run-Time Power-Gating**



- The sleep period should be longer than the break-even time (BET)
  - Otherwise, total energy consumption increases
  - BET sets the smallest granularity for power gating
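
One simplified way to read the break-even condition: if entering and leaving sleep costs a fixed energy overhead, and sleeping saves a constant leakage power, then BET is the overhead divided by the saved power. The sketch below uses this first-order model. The model itself and the overhead and leakage numbers are illustrative assumptions, not measurements from Geyser.

```python
import math


def break_even_time(overhead_energy_j, leakage_saved_w):
    """Sleep time at which saved leakage energy equals the gating overhead."""
    if leakage_saved_w <= 0:
        raise ValueError("leakage_saved_w must be positive")
    return overhead_energy_j / leakage_saved_w


def net_energy_saved(sleep_time_s, overhead_energy_j, leakage_saved_w):
    """Net energy saved by one sleep interval; negative means a net loss."""
    return leakage_saved_w * sleep_time_s - overhead_energy_j


def break_even_cycles(overhead_energy_j, leakage_saved_w, clock_hz):
    """BET rounded up to whole clock cycles."""
    bet = break_even_time(overhead_energy_j, leakage_saved_w)
    return math.ceil(bet * clock_hz)


if __name__ == "__main__":
    overhead = 2e-12   # J per sleep/wake pair (hypothetical)
    saved = 1e-5       # W of leakage removed while asleep (hypothetical)
    clock = 50e6       # Hz
    bet = break_even_time(overhead, saved)
    print(f"BET = {bet * 1e9:.0f} ns, "
          f"{break_even_cycles(overhead, saved, clock)} cycles")
    for t in (0.5 * bet, 2.0 * bet):
        print(f"sleep {t * 1e9:.0f} ns -> net saving "
              f"{net_energy_saved(t, overhead, saved):+.2e} J")
```

A sleep interval shorter than BET gives a negative net saving in this model, which is the case the bullets above warn against.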

# Break Even Time of Each Unit



- BET shortens as the chip temperature rises
  - Leakage current depends heavily on temperature
- We need novel PG strategies that take BET into account
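
To see why BET shrinks with temperature, the sketch below combines a first-order BET model (overhead energy divided by leakage power saved) with an assumed exponential leakage model. The doubling interval, reference leakage, and overhead energy are hypothetical parameters, and the overhead is assumed to be independent of temperature. None of these values come from the measured chip.

```python
def leakage_power(temp_c, p_ref_w=1e-3, t_ref_c=25.0, doubling_c=20.0):
    """Assumed leakage model: power doubles every `doubling_c` degrees C."""
    return p_ref_w * 2.0 ** ((temp_c - t_ref_c) / doubling_c)


def bet_at(temp_c, overhead_energy_j=2e-12):
    """First-order BET at a given temperature (overhead held constant)."""
    return overhead_energy_j / leakage_power(temp_c)


if __name__ == "__main__":
    for temp in (25, 65, 100):
        print(f"{temp:3d} C  leakage={leakage_power(temp) * 1e3:.2f} mW  "
              f"BET={bet_at(temp) * 1e9:.2f} ns")
```

Because the saved leakage grows with temperature while the assumed overhead stays fixed, shorter idle periods become worth gating on a hot chip.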

# **Power Gating Strategy**

- Requirement: Power off EX-units for longer than BET
- PG strategies
  - Straightforward: EX-units are usually in sleep mode and are turned into active mode only when used
  - Cache-miss: EX-units are usually in active mode and are turned into sleep mode only on a cache miss
- Qualitative comparison:
  - Chance of power-off: Straightforward > Cache-miss
  - Average duration of power-off: Straightforward < Cache-miss, for frequently used EX-units
- The strategy for each EX-unit is controlled by the OS
  - Status registers in the Control Processor (CP0), modified by the OS
  - The strategy is under OS control and can be changed during execution
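
The qualitative comparison above can be explored with a toy model. In the sketch below, a per-cycle trace for one EX-unit uses `U` for a cycle in which the unit is used, `.` for an idle cycle, and `M` for a cache-miss stall cycle. Each sleep interval of `n` cycles is assumed to save `n` units of leakage energy and to cost `bet_cycles` units of overhead. The trace encoding, the cost model, and the `choose_strategy` helper are all hypothetical and do not describe the actual Geyser hardware or OS policy. Wake-up latency is ignored.

```python
import re


def sleep_intervals(trace, strategy):
    """Lengths (in cycles) of the sleep intervals of one EX-unit."""
    if strategy == "straightforward":
        pattern = r"[^U]+"   # asleep whenever the unit is not in use
    elif strategy == "cache-miss":
        pattern = r"M+"      # asleep only during cache-miss stalls
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [len(m.group()) for m in re.finditer(pattern, trace)]


def net_saving(trace, strategy, bet_cycles):
    """Net energy saved, in units of one cycle of leakage energy.

    An interval of n cycles saves n and costs bet_cycles of overhead,
    so only intervals longer than BET contribute positively.
    """
    return sum(n - bet_cycles for n in sleep_intervals(trace, strategy))


def choose_strategy(trace, bet_cycles):
    """Pick the strategy with the larger net saving (hypothetical policy)."""
    return max(("straightforward", "cache-miss"),
               key=lambda s: net_saving(trace, s, bet_cycles))


if __name__ == "__main__":
    bet = 5  # BET in cycles (hypothetical)
    traces = {
        "frequently used": "UU.U.UU.U" + "M" * 20 + "U.UU.U",
        "rarely used": "U" + "." * 30 + "M" * 20 + "." * 30 + "U",
    }
    for name, trace in traces.items():
        for strategy in ("straightforward", "cache-miss"):
            print(name, strategy, sleep_intervals(trace, strategy),
                  net_saving(trace, strategy, bet))
        print(name, "->", choose_strategy(trace, bet))
```

In this toy model the straightforward strategy powers off more often, but for a frequently used unit most of its sleep intervals are shorter than BET and lose energy, whereas the cache-miss strategy sleeps less often but for longer. Selecting per unit, as the OS-controlled status registers allow, lets each unit use whichever strategy suits its usage pattern.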

# Geyser-1: Prototype Chip

- Chip size: 2.1 mm × 4.2 mm
- 65nm CMOS (Fujitsu e-shuttle) through VDEC (VLSI Design and Education Center)
- Without cache
- Successfully in operation (50MHz)








# **Preliminary Evaluation & Summary**

Ikebuchi et al., ASSCC '09

### Real Measurement of Geyser-1 (dijkstra)



- Microprocessor with cycle-by-cycle power gating: successfully in operation
  - The first implementation in the world

- Power reduction ratio increases at higher temperature, as expected
- Performance and power: easy to demonstrate
