ISSN: 1682-3915

© Medwell Journals, 2014

## Power Optimization Technique for VLSI Circuits Using Fine Grain Clock Controller at Leaf Nodes

<sup>1</sup>V. Vijayakumari and <sup>2</sup>T. Joby Titus
<sup>1</sup>Sri Krishna College of Technology, Kovaipudur-641042, India
<sup>2</sup>Department of Electronics and Communication Engineering,
Sri Ramakrishna Institute of Technology, Coimbatore-641010, India

Abstract: With the dramatic increase in portable and battery-operated applications, low power consumption has become a necessity in order to prolong battery life. In Very Large Scale Integration (VLSI) architecture, power consumption is an important criterion that determines the cost effectiveness and size of the end product. Field Programmable Gate Arrays (FPGAs) are widely used VLSI circuits that can contain even a complex system on a single chip. Despite their design cost advantage, FPGAs impose large dynamic and static power consumption overheads. In this study, various design techniques for low power optimization are surveyed. Among them, the most effective reduction in dynamic power consumption is observed when fine grain clock gating is introduced at the leaf nodes of the FPGA architecture. Using this method, dynamic power consumption is reduced by about 30% at each leaf node of the clock network.

Key words: Clock gating, fine grain architecture, optical interconnects, VLSI, chip

#### INTRODUCTION

FPGAs are widely used to implement special purpose processors, but they impose large dynamic and static power consumption. The clock system, which consists of clock distribution networks and sequential elements, is one of the most power consuming components in FPGA devices. The best-known technique to reduce clock network power is clock gating. In FPGAs, customized clock networks can be implemented using programmable interconnects to reduce dynamic power consumption. A circulation clock network can reduce the dynamic power consumption of registers and gates, but the dynamic power consumption of the clock network itself cannot be reduced (Ishihara et al., 2011). Different methods to manage skew and skew variations within tree and non-tree clock distribution networks, together with metrics to determine the most power efficient technique for a given circuit, have been reviewed (Vaisband et al., 2011). Reducing the power consumption of flip flops reduces the total power consumption. The traditional approach to clock-gating optimization used in ASIC design and the design of a fully CMOS compatible optical clock distribution and recovery system in a 3.3 V, 0.35 $\mu$m CMOS process have been detailed (Rivoallon, 2013; Thangaraj et al., 2010).

As technology scales, implementing conventional clock distribution networks that meet low power and skew requirements is becoming more difficult. Electrical power amplification is more efficient than optical power amplification, and multilevel current-mode interconnect has been analyzed and compared with two reference voltage mode interconnects (Ranganathan and Jouppi, 2007; Venkatraman and Burleson, 2005). Fine grain clock interconnections are being proposed as an alternative and compared with optical interconnections. As the number of optical data interconnections increases, the technical challenges of providing an efficient realization of optical data interconnections also increase. With a single optical source, the technical problem reduces largely to splitting the optical clock beam into a multiplicity of optical clock beams and distributing the individual clocks to the portions of the system that require synchronized clocks. Electrical and optical interconnects have been compared for various design criteria based on predictions; simple replacement of an electrical system by an optical Clock Distribution Network (CDN) results in high clock skew, and efficient algorithms for the construction of timing based optical clock networks have also been analyzed (Chen et al., 2007; Tosik et al., 2007; Minz et al., 2007). A logic synthesis approach for domino/skewed logic styles based on Shannon expansion has been proposed that dynamically identifies idle parts of logic and applies clock gating (Banerjee et al., 2006). Because of the difficulties faced in optical interconnections, fine grain clock gating is analyzed in this study.

#### CLOCK GATING TECHNIQUES

A large portion of on-chip power is consumed by the clock system, which is made up of the clock distribution network and flip flops. Power consumption is determined by several factors including frequency f, supply voltage V, data activity $\alpha$, capacitance C, leakage and short circuit current. Dynamic power is also called switching power. Clock network power reduction can be obtained by the following approaches:

- Reducing interconnect resource usage on the clock network through placement techniques, thereby reducing capacitance and power
- Clock gating, where the clock is allowed to toggle only in the required portion of the clocked network
- Reducing clock load in the system by reducing the number of clocked transistors
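The dependence of switching power on the factors listed above is commonly modelled as $P_{dyn} = \alpha C V^2 f$. The following is a minimal sketch of that standard relation, not the power model used in this study; the function name and the operating point (10 pF, 5 V, 100 MHz) are hypothetical values chosen only for illustration. It shows that halving the activity $\alpha$, which is what clock gating aims at, halves the switching power.

```python
def dynamic_power(alpha: float, c_farads: float, vdd_volts: float, f_hz: float) -> float:
    """Switching power in watts, using the standard model P = alpha * C * V^2 * f."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


# Hypothetical operating point, not taken from the paper.
baseline = dynamic_power(alpha=1.0, c_farads=10e-12, vdd_volts=5.0, f_hz=100e6)
# Same node with the clock gated off half of the time (activity halved).
gated = dynamic_power(alpha=0.5, c_farads=10e-12, vdd_volts=5.0, f_hz=100e6)

print(f"baseline: {baseline * 1e3:.2f} mW, gated: {gated * 1e3:.2f} mW")
```

Because the voltage enters quadratically, supply scaling has a stronger effect than activity reduction, but clock gating can be applied without changing the supply.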

The clock network in an integrated circuit is generally designed to manage the skew between any two points in the device. A design with zero nominal skew can be achieved by employing the well-known H-tree structure. Compared to a grid topology, the tree topology is advantageous because it uses minimal routing resources to deliver the clock signal to all synchronous components. An FPGA clock network must balance the minimal skew requirement with sufficient flexibility to implement the clocking requirements of many different circuits. Thus, it dissipates significantly less power than the traditional clock network.

Most clock gating is done at Register Transfer Level (RTL). Clock gating can be grouped into three categories: system level, sequential level and combinational level. System level clock gating stops the clock for an entire block, effectively disabling its functionality. Sequential and combinational level clock gating selectively suspend clocking while the block continues to produce output. Combinational clock gating reduces power by disabling the clock on registers when their output is not changing. Fine grained clock gating targets the many small units among the clock sinks and aggressively saves their dynamic power, even for a few cycles. To reduce the switching power of logic blocks, the number of transitions needs to be reduced. Fine grain clock gating techniques focus on avoiding unnecessary transitions and concentrate on glitch reduction. Coarse grained clock gating saves power at a higher level of the clock tree by removing all clock switching from its downstream units (Fig. 1).
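The difference between coarse and fine grained gating can be illustrated by counting how many leaf nodes actually receive a clock in each cycle. The sketch below is a simplified model under stated assumptions: four leaves, a hypothetical activity trace, a single root-level gate for the coarse case and one gate per leaf for the fine case. None of these values come from the simulations in this study.

```python
from typing import List


def clocked_leaf_cycles(activity: List[List[bool]], mode: str) -> int:
    """Count leaf-cycles in which a leaf receives a clock.

    activity[t][i] is True when leaf i has useful work in cycle t.
    """
    total = 0
    for cycle in activity:
        if mode == "none":        # free-running clock: every leaf is clocked
            total += len(cycle)
        elif mode == "coarse":    # one gate at the root: all leaves or none
            total += len(cycle) if any(cycle) else 0
        elif mode == "fine":      # one gate per leaf: only busy leaves are clocked
            total += sum(cycle)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return total


# Hypothetical activity of four leaf nodes over four cycles.
activity = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, False],
    [False, False, False, True],
]

for mode in ("none", "coarse", "fine"):
    print(mode, clocked_leaf_cycles(activity, mode))
```

In this trace the coarse gate only helps in the one cycle where every leaf is idle, whereas per-leaf gating removes every unnecessary clock event, at the cost of one gate and one enable signal per leaf.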

A large part of on-chip power is consumed by clock drivers, so it is desirable to have fewer actively clocked loads. Low power in the clock network is obtained by reducing the clock capacitive load. Any reduction in local clock load also decreases the global power consumption.

To reduce the switching power of the clock network, the clock signal to the global and local clock networks is controlled by an enable signal. Reducing the clock activity saves power because, while the enable is inactive, switching of the global clock net does not propagate into the local clock nets.
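The effect of the enable signal on local clock activity can be sketched at signal level. The example below assumes simple AND-type gating (local clock = global clock AND enable) and an enable that changes only while the clock is low, so that no glitches are produced; the waveform values are hypothetical and the actual gating cell of Fig. 1 may differ.

```python
from typing import List


def count_toggles(signal: List[int]) -> int:
    """Number of 0->1 and 1->0 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a != b)


# Eight clock cycles sampled at half-cycle resolution (hypothetical waveform).
global_clk = [0, 1] * 8
# Enable is active for the first two and the last two cycles only.
enable = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Assumed AND-type gate between the global and the local clock net.
local_clk = [c & e for c, e in zip(global_clk, enable)]

print("global toggles:", count_toggles(global_clk))
print("local toggles: ", count_toggles(local_clk))
```

The global net keeps switching throughout, while the local net is quiet during the four cycles in which the enable is low, which is where the saving in local clock power comes from.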



Fig. 1: Schematic of simple clock gating circuit



Fig. 2: Schematic of clock interconnection

#### CONVENTIONAL CLOCK DISTRIBUTION SYSTEM

The clock signal has to be routed through more of the chip area than any other interconnect, and it also drives the largest number of transistors. The power and area required for clock distribution vary widely with technology. To reduce both clock skew and dynamic power, an H-tree clock distribution network is considered (Fig. 2).
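The reason an H-tree gives zero nominal skew is that every leaf is reached through the same wire length from the root. The sketch below builds an idealized binary H-tree geometrically and checks this property; the function name, the coordinates and the segment length are hypothetical, and buffer delays and wire parasitics are ignored.

```python
from typing import List, Tuple


def h_tree_leaves(x: float, y: float, half: float, levels: int,
                  path: float = 0.0, horizontal: bool = True
                  ) -> List[Tuple[float, float, float]]:
    """Return (x, y, wire_length_from_root) for every leaf of a binary H-tree."""
    if levels == 0:
        return [(x, y, path)]
    leaves = []
    for sign in (-1, 1):
        # Branch left/right on horizontal levels and down/up on vertical levels.
        nx = x + sign * half if horizontal else x
        ny = y if horizontal else y + sign * half
        # Segment length is halved after each completed "H" (every two levels).
        next_half = half if horizontal else half / 2
        leaves += h_tree_leaves(nx, ny, next_half, levels - 1,
                                path + half, not horizontal)
    return leaves


leaves = h_tree_leaves(0.0, 0.0, half=4.0, levels=4)
lengths = {length for _, _, length in leaves}
print(len(leaves), "leaves, distinct root-to-leaf wire lengths:", lengths)
```

With four branching levels the tree reaches 16 leaves, and the set of root-to-leaf lengths contains a single value, which corresponds to equal nominal clock arrival times under the idealized assumptions above.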

Various clock distributions are considered in order to provide an effective clock distribution system. The desired characteristics of a good clock distribution system are low jitter, low skew, low power and low use of metal resources. The power spent in clock distribution can be divided into two parts, i.e., the power spent in transmitting the clock signal over the global clock wiring and over the intermediate local clock wiring.

#### OPTICAL CLOCK INTERCONNECTS

Clock distribution is largely a power amplification problem. With technology scaling, device dimensions and clock period continuously decrease. The delay uncertainty caused by process variation reduces both performance and yield. It has become increasingly difficult for conventional copper based electrical interconnect to satisfy these requirements. The concept of on-chip optical interconnect was first introduced by Goodman in 1984. Since electrical to optical and optical to electrical conversion is required, optical interconnect is particularly attractive for global interconnects. The successful realization of on-chip optical interconnects, however, greatly depends upon the development of enhanced CMOS compatible optical devices. In clock distribution networks, process and environmental variations introduce skew and jitter. Optical scaling has immense potential to reduce clock skew delay, global clock network power and interference to and from neighboring electrical signals, and it offers high signal speed, independence of signal speed from waveguide variations and low heat generation. A comparison between electrical and optical networks is performed considering the design criteria of delay uncertainty, latency, power dissipation and bandwidth density (Fig. 3).

Fig. 3: Schematic of optical clock interconnection

Where global clock routing uses an optical clock network, a variety of techniques can reduce the worst case clock skew: in-path clock correction with clock retimers, detailed design-time clock buffer tuning to minimize clock skew, and the use of grids to distribute the clock and reduce skew in the local clock routing network. However, using these techniques increases the design complexity and cost.

#### PROPOSED FINE GRAIN OPTICAL CLOCK NETWORK

In this research, a novel clock tree network is proposed to improve the clock skew and reduce the power consumption. The proposed method constructs a buffered and gated binary clock tree in which the clock signal is distributed through global routing and through a Lookup table based clock network in local routing. The logic blocks of the entire CMOS processor are divided into four sets, and a fine grained global clock gating network is implemented in each set of logic blocks. Because a large part of on-chip power is consumed by clock drivers, the grained clock gating network achieves low power by reducing the number of clocked transistors. The proposed method utilizes a common clock source, a global clock controller and a Lookup table based local clock controller. This clock controller generates the clock enable signal that is used in clock gating.

For coarse grain clock gating to be effective, the sequential elements should be clustered such that a group of sequential elements can be gated by a single enable signal at each leaf node of the local routing network. With coarse grained clock gating, fewer transistors are used in each set of logic blocks. A Lookup table based grained technique is used to enable the clock signal. A Lookup table is a memory with a one-bit output that essentially implements a truth table, where each input combination generates a certain logic output. The input combination is referred to as an address. In the proposed method, the Lookup table function uses an even parity bit generator so that clock control is obtained at each node of local routing.
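The description of a Lookup table as an addressed one-bit memory can be made concrete with a short behavioural model. The sketch below is an illustration only: the class name `Lut2` and the address ordering (CLK as the most significant bit) are assumptions, and the stored contents are copied from the rows of Table 1 as printed, with `None` standing for the don't-care entry.

```python
from typing import Optional, Sequence


class Lut2:
    """Two-input, one-output lookup table: a 4 x 1-bit memory addressed by its inputs."""

    def __init__(self, contents: Sequence[Optional[int]]):
        if len(contents) != 4:
            raise ValueError("a 2-input LUT stores exactly 4 entries")
        self.contents = list(contents)

    def read(self, a: int, b: int) -> Optional[int]:
        # The input combination forms the address into the stored truth table.
        address = (a << 1) | b
        return self.contents[address]


# Entries for addresses 00, 01, 10, 11 (CLK, CLK enable), as listed in Table 1.
clock_controller = Lut2([None, 0, 0, 1])

for clk in (0, 1):
    for en in (0, 1):
        print(f"CLK={clk} EN={en} -> {clock_controller.read(clk, en)}")
```

With these contents the output clock is high only when both CLK and the enable are high; changing the four stored bits changes the gating function without altering the surrounding routing.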

#### POWER AND DELAY COMPARISONS

The power and delay for the electrical clock gating technique were simulated with $V_{\text{DD}}$ = 5 V. The power consumption of a simple electrical clock gating network in FPGA logic blocks was analyzed. The simulation results were obtained from M-power simulations in 0.35 $\mu$m CMOS technology at room temperature. Each Lookup table drives the logic blocks that are connected to a leaf node (Table 1).

The proposed method includes a two-input, one-output Lookup table in every leaf node of local routing. The optimization goal for the synthesis of a Lookup table circuit is typically the minimization of the total number of Lookup tables and of the number of Lookup table levels. Minimizing the number of Lookup tables in the circuit increases the size of the design that can fit into a fixed number of Lookup tables and also improves the performance of the circuit by reducing clock delay. The simulation is performed using 0.35 $\mu$m technology with 500 Monte Carlo runs. The power, delay and peak current for the electrical clock gating technique were simulated with $V_{\rm DD} = 5$ V. The power consumption of the simple electrical clock gating network in an FPGA block is identified as 7.117 mW.

Figure 4 shows the measured clock network power dissipation at each leaf node without Lookup table based clock gating. The clock network power at each leaf node is 7.117 mW and the maximum $I_{\rm dd}$ current is 6.644 mA.

A Lookup table based controller is added to each leaf node of local clock routing, which leads to a decrease in power dissipation. The Lookup table at each leaf node reduces switching transitions, which reduces the parasitic capacitance between nodes and interconnection and leads to a reduction in dynamic power dissipation (Fig. 5).

Table 1: Lookup table based clock controller

| CLK | CLK enable | Output CLK = Even parity bit |
|-----|------------|------------------------------|
| 0   | 0          | X                            |
| 0   | 1          | 0                            |
| 1   | 0          | 0                            |
| 1   | 1          | 1                            |

Fig. 4: Average power dissipation of clock network without grained architecture

Fig. 5: Maximum  $I_{dd}$  current output of electrical clock network without gating

The Lookup table based clock controller reduces the $I_{dd}$ current from 7.087 to 4.491 mA at each leaf node of the clock network, which leads to an overall reduction in dynamic power consumption (Fig. 6 and Table 2).
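The relative savings implied by these figures can be checked with a short calculation. The sketch below uses only the $V_{DD}$ = 5 V values reported in Table 2 (7.117 mW and 5.052 mW for power, 7.087 mA and 4.491 mA for current); the helper function name is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Values at V_DD = 5 V from Table 2 (without gating -> with grained gating).
power_reduction = percent_reduction(7.117, 5.052)    # mW
current_reduction = percent_reduction(7.087, 4.491)  # mA

print(f"power: {power_reduction:.1f} %, I_dd: {current_reduction:.1f} %")
```

The power figure works out to roughly 29%, in line with the approximately 30% reduction per leaf node stated in the abstract, while the maximum supply current falls by roughly 37%.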

The design flow includes two parts: fine grained clock gating for global routing and coarse grain clock gating for local routing, i.e., at the leaf nodes of the clock tree. In order to evaluate the effectiveness of the method, researchers compare the simulation results of the non-gated and the Lookup table based gated clock tree networks. The layout of the grained clock tree is shown in Fig. 9.



Fig. 6: Average power dissipation of grained clock network



Fig. 7: Maximum  $I_{dd}$  current output of electrical clock network with grained architecture

Table 2: Comparison results for dynamic power dissipation and maximum leakage current of clock network and Lookup table based clock network

| V<sub>DD</sub> (V) | Maximum I<sub>dd</sub> with grained gating (mA) | Maximum I<sub>dd</sub> without gating (mA) | Power without gating (mW) | Power with grained gating (mW) |
|---------------------|----------------------------|-------------------|-------------------|------------------------|
| 0.0                 | 0.000                      | 0.007             | 0.001             | 0.849                  |
| 0.5                 | 0.036                      | 0.083             | 0.010             | 0.012                  |
| 1.0                 | 2.174                      | 1.962             | 0.328             | 0.484                  |
| 1.5                 | 6.135                      | 5.112             | 0.849             | 1.273                  |
| 2.0                 | 11.094                     | 8.349             | 1.394             | 1.782                  |
| 2.5                 | 2.676                      | 5.048             | 4.093             | 2.962                  |
| 3.0                 | 3.102                      | 5.180             | 4.849             | 3.468                  |
| 3.5                 | 3.515                      | 5.699             | 5.513             | 3.913                  |
| 4.0                 | 3.864                      | 6.197             | 6.112             | 4.321                  |
| 4.5                 | 4.184                      | 6.644             | 6.626             | 4.693                  |
| 5.0                 | 4.491                      | 7.087             | 7.117             | 5.052                  |



Fig. 8: Comparison results for power dissipation and maximum current of clock network and Lookup table based clock network



Fig. 9: Layout of grained clock network

#### CONCLUSION

A Lookup table based grained clock gating method, which uses fine grain clock gating for global clock routing and coarse grain clock gating for local routing in 0.35 $\mu$m CMOS compatible devices, is presented in this study. Based on this fine grain gating technique, electrical and optical on-chip interconnects are compared for various delay and power optimization criteria. It can be concluded that Lookup table based grained clock gating can be used for high speed applications with a reduction in power consumption.

#### REFERENCES

- Banerjee, N., K. Roy, H. Mahmoodi and S. Bhunia, 2006. Low power synthesis of dynamic logic circuits using fine-grained clock gating. Proceedings of the Conference on Design, Automation and Test in Europe, March 6-10, 2006, Munich, Germany, pp: 862-867.

- Chen, G., H. Chen, M. Haurylau, N.A. Nelson, D.H. Albonesi, P.M. Fauchet and E.G. Friedman, 2007. Predictions of CMOS compatible on-chip optical interconnect. Integr. VLSI J., 40: 434-446.
- Ishihara, S., M. Hariyama and M. Kameyama, 2011. A low-power FPGA based on autonomous fine-grain power gating. IEEE Trans. Very Large Scale Integr. Syst., 19: 1394-1406.
- Minz, J.R., S. Thyagara and S.K. Lim, 2007. Optical routing for 3-D system-on-package. IEEE Trans. Components Packaging Technol., 30: 805-812.
- Ranganathan, N. and N.P. Jouppi, 2007. Evaluating the potential of future on-chip clock distribution using optical interconnects. Hewlett-Packard Development Company, Technical Reports, HPL-2007-163. http://www.hpl.hp.com/techreports/2007/HPL-2007-163.html.

- Rivoallon, F., 2013. Reducing switching power with intelligent clock gating. http://japan.zylinks.com/support/documentation/white\_papers/wp370\_Intelligent Clock Gating.pdf.
- Thangaraj, C., R. Pownall, P. Nikkel, G. Yuan, K.L. Lear and T. Chen, 2010. Fully CMOS-compatible on-chip optical clock distribution and recovery. IEEE Trans. Very Large Scale Integr. Syst., 18: 1385-1398.
- Tosik, G., Z. Lisik and F. Gaffiot, 2007. Optical interconnections in future VLSI systems. J. Telecommun. Inform. Technol., 3: 105-108.
- Vaisband, I., E.G. Friedman, R. Ginosar and A. Kolodny, 2011. Low power clock network design. J. Low Power Electron. Appl., 1: 219-246.
- Venkatraman, V. and W. Burleson, 2005. Robust multilevel current-mode on-chip interconnect signaling in the presence of process variations. Proceedings of the 6th International Symposium on Quality of Electronic Design, March 21-23, 2005, San Jose, CA., USA., pp: 522-527.