ISSN: 1816-949X © Medwell Journals, 2017 # FPGA Based NoC with Deadlock Free Routing in Mesh Networks Using Hexagonal Nodes <sup>1</sup>Shilpa K. Gowda, <sup>2</sup>K.R. Rekha and <sup>2</sup>K.R. Nataraj <sup>1</sup>Jain University, Bangalore, India <sup>2</sup>Department of Electrical and Computer Engineering (ECE), Institute of Technology (SJBIT), Bangalore, India Abstract: With the increase in demand for on chip networking, it has become very much necessary to design an architecture that will provide the solution to communication interfaces, low chip cost, guarantee the flexibility and high quality of service to the network. To address the above-mentioned requirements we proposed an architecture that is dynamically configure itself with respect to hardware modules such as routers, packet based switch and data packet size by changing the communication conditions and its requirements at rum time. The design is based on FPGA based digital design and incorporates routing algorithm such as x-y routing. We have built a novel hexagonal node pattern to improve the communication performance. The developed node has 6 ports architecture that is far superior to the persisting node architecture. The structure also has capability to increase communication capacity by 50%. The proposed architecture is capable of overcoming the drawbacks of popular interconnection using bus-based design which is most commonly used in FPGA based architecture developed on dynamic reconfiguration. It has been observed that the node utilizes very less area of just 15% of the available area on chip. We have also achieved reasonable low latency of just 14 clocks for the data to appear at the output node once applied at the input of the architecture. The design is have a high data throughput due to low time period of just 10 nsec. The overall design of the network is much more efficient as it has deadlock free architecture with ads up to the advantages of the design. As there is lot of on-chip space available the design can further be modified for checking different routing algorithm such as shortest path to make the network more efficient also the design can be improved by testing the same for more numbers of nodes. Key words: On chip-network, FPGA, NoC, router, dynamic reconfiguration, dead lock ## INTRODUCTION The most efficient way of interconnection with the existing core in the SoC is based on the bus topology. As we all know it is a method in which we can connect numerous cores with a single bus based on sharing basis. This method allows connecting numerous devices or cores with each other in the most efficient way. The connection technique is simple and the numbers of devices can be increased with very less efforts. However, as the number of devices and cores go on increasing the delay in communication. Delivery time also increases and availability of the bus for the use by all the devices is lowered. This limitation gives rise to a problem which needs to be addressed. The solution proposed for the same was point to point communication. Now in this method as the number of devices goes in increasing the requirements for the interconnection also increases more wiring is need Fig. 1: Bus-based systems on chip and design is getting more complex and non reusable. It has also been noticed that it sometimes develops bottleneck. If we have a look in to the recent trends of implementation we can notice that there are mainly two most commonly used interconnection techniques for SoC's. These two techniques are shown in Fig. 1 and 2. Fig. 2: Network based system on chip As we can see from Fig. 1, one is most familiar technique call as bus based interconnect the biggest advantage of the technique is that once the bus allocation is done to the core the operation is very fast and data transfer is great with no deadlock or live lock as such it can be very useful when the no. of devices or core are less in number or bus allocation or request is not very often by the cores and the bus is available to all the cores at any instance of time. The greatest drawback of the system come in to picture when number of devices goes on increasing or the request for the bus-based transfer is more. The allocation time for request is more and the client or the receiver has to wait for longer time resulting in deadlock or livelock. If we have a close looks at the other method named as network based system we can understand that delivery of the packet is not as fast as the shared bus based system due to number of intermediate router comes in to the picture. But the biggest advantage of this system is that several messages can be transmitted to the destination simultaneously. This is practically not possible in the previous technique thus the overall time for the data transmission for the entire network is very low as compared to the previous system. Hence, workings on network based interconnect for the core communication is more beneficial though there are some design complexities and scalable issues related to the same. There are two main reasons for the research in the field of NoC to be lacking and is way behind as compared to the other domains. First and foremost, the high complexity of the applications that use of the NoC makes the design more challenging and complicated. Second reason for research is that research areas that are quite traditional like physical design in this area the design constraints are static if NoC research is considered it requires complete information about the dynamic behavior of the system just by using detailed simulation or prototyping it is very hard to obtain the dynamic properties of the design (Agarwal, 1991). It has been studied for several years that there are numerous challenges in designing SoC network. It is believed that the engineers will find out the solution to the persisting problem. It has also been observed that there are various methodologies used to design network on chip however layered micro network design is one of the promising methodology for designing a complex SoC architecture in the near future (Andreasson and Kumar, 2005). In this research, it has been shown that switching networks can be implemented using a novel architecture design based on interconnection using system level architecture. It has been observed by the researcher that the performance of this design is far better than the traditional architecture. The design has been implemented through functional model verification and physical implementation on hardware of important parts of the architecture. With the help of various results generated during the research it is clearly indicated that the proposed methodology has an efficient throughput, latency and area requirements systems-on-chip architecture that can be used in future (Anjan and Pinkston, 1995). In this research work researcher has explained about a novel architecture designed for the TAM. It makes itself useful in NoC communication as a traditional SoC TAM. In this architecture each core has been selecting the wires for carrying the test data and also the core has capability of selecting a path for the data to travel. It has been analysed that DfT is also presented for discussion in the present study. The architecture is developed in such a way that the static manner routing capability can be used for configuration of input buffer which will be having capacity of changing dynamically as the test is in progress by applying very less overhead of just 9.6% of the size of the actual router. Time reduction has been obtained due to the new development and is around 6 and 55% (Arns, 1998). An researcher in this study has proposed a SoC application that uses a unique NoC design that has capability of improving the performance of date in communication system. As per the researcher a virtual channel router designed in this study if compared with the traditional router has capability to share the physical links available between the resources. This sharing will enhance the performance of the system. The architecture is based on FPGA based system and hence has tremendous advantages of FPGA such as low cost, scalability, less time to market and high speed operations (Ascia et al., 2004). An intensive study and first time ever literature review has been given in this study for the impact of RRA architecture used in router. The performance evaluation has been done and comparative analysis has been presented in the study. From various results obtained during the research shows that one of the biggest bottleneck in building block of router is the arbiter and has maximum impact on 4 input LUT architecture FPGA. Ascia et al. (2006) in this study has proposed probabilistic distance based arbitration based technique for providing equality of service. It is also useful in providing support location for task placement i NoC. It is very much possible to find an age on packet using PDBA along with nonlinear weight metric. Banker et al. and Benini and Bertozzi (2005) in this study, researcher has come up with an analytic model that can accurately service the estimation of time for round robin arbitration in any given NoC. It has been observed by the researcher that there is lack of accuracy in the analysis of performance for the NoC based architecture using existing analytic model, it is especially for the popular scheme mentioned above. Bertozzi et al. (2005), Su and Shin (1996) in Noc designs the major building block is routing algorithm and it is one of the most complex researches in network layer, protocol stack based design approach for these routing algorithm can be adapted and also can be implemented on different layers such as physical layer, data link layer, network layer and transport layer. Two routing algorithm namely xy routing and OE routing has been designed and tested on a 3×3 mesh topology NoC with 2 dimension, the simulation has been carried out on NIRGAM simulator platform. These algorithm has been compared in terms of latency per packet and throughput metrics. It has been observed that for XY routing algorithm, latency in X-direction channel averagely lager than channel latency Y-direction. Also, it has been observed that channel throughput of X-direction is average of all square with Y-direction throughput (Collins, 2003). In this study, it has been proposed by the researcher two exclusive architectures for combined allocation of check at the input check and check at the output of the architecture. VC is removed based on the venue at which the free output is not generated by the requests. Further investigation has been done on output free VC check based on interlocks. Based on the results and observations three unique interlock-free allocators are assigned that are check, O-Check 1 and 2 (Kim et al., 2004). Researcher in this study, considers that design of NoC based on routing algorithm can be developed based on protocol stack that has functional layers of OSI Model. It has been demonstrated using NIRGAM simulator platform that CBR traffic can be routed in 3 ×3 mesh topology with 2 dimensions NoC. Two different routing algorithms firstly XY routing algorithm and secondly OE routing algorithm are also, simulated using NIRGAM for analysis of three different parameters mainly based on overall average latency (Dally and Aoki, 1993). NoC characteristics and problem area: While working on NoC we have to take care of some of the basic constraints that may not be necessarily to be considered for developing an off chip network. Some constrains such as area on chip or the resources on chip are mandatorily to be considered. On chip resources are mainly used for computational or communication purpose. It becomes a great challenge for a network engineer to design a system on chip working with limitation of the chip designing. Though there are limitation of designing of on chip network there persists many advantages which motivates the engineer to take up the challenge and develop NoC. On chip network are usually designed for some dedicated operation for an application in a system to make the overall system more efficient, compact and less power utilization. While developing the network it is very much important to consider the design should not have any deadlock or livelock or starvation issues. Deadlock is the process in packets is involved in a infinite wait loops that cannot be resolved. This situation generally occur when the system not having any facility of priority. If two or more packet arrive at the node and request for the access then the node gets confused and is not in a position to take a decision for whom to allot the access. And the packet goes in an infinite loop that cannot be resolved. Livelock is the process packet keep on moving in the network and is never delivered to the destination. This situation occurs in a network with priority node at the site of delivery. If the design is developed in an order that the packet to the node will be delivered as per the priority then the packet with the least priority keep moving in the network for indefinite time as it do not get access to the destination node as other packet are at priority of delivery. Starvation is the process in which the packet is never getting an opportunity to use the network for the reaching to the destination as it is waiting for the grant of the network. This situation generally occurs when the packets with low priority are not allowed for going in network by the high priority packet. In this study, we will be considering only with the issue of deadlock and the way to eliminate the same in newly developed hexagonal based network. Figure 3 Shows the basic flow of the data between four nodes that are intended to communicate with each other. The network is Fig. 3: Flow diagram for deadlock representation designed based on simple concept of providing the access to each based on the request at that instance of time. If the packet 1 wants to move towards node 3 the route for him through node 2 which is all requesting to communicate to node 3 simultaneously node 3 and four needs to transfer the date to node 1 at the same time. At this situation all the node receive request from the neighboring node. The network developed above is in position to cater only one request at a time and cannot handle multiple request and hence enters in a confusion state what should be done and all the nodes are at the sage of deadlock we no decision to be take in terms of which packet to be processed. The only solution to such situation is to make use of routing algorithm with priority for the packet to be processed. Various routing algorithm are available to cater such situation and avoid deadlock in the network. In our design it is mandatory to use the routing technique as we have 6 possible input to come at the node for the communication. If such situation arises the whole network will fail and application will collapse. #### MATERIALS AND METHODS **Proposed hexagonal node pattern:** The proposed hexagonal node pattern Packet-Switched Network-on-Chip (PS-NoC) is shown in Fig. 4. The hexagonal node pattern is used in packet switching NoC to improve the communication between the internal routers. Compared to the normal node, a hexagonal node avoids the bus based interconnection module limitations. The complete hexagonal node module is used in most of the dynamic reconfigurable NoC's. The generally routing algorithm is used to select the packets in any of the east, west, south and north directions. Here, we are using XY routing algorithm and its flow is as shown in the Fig. 5. Fig. 4: Hexagonal node pattern packet-switched network-on-Chip (PS-NOC) Fig. 5: Detailed internal architecture of single hexagonal node pattern The XY algorithm is distributive-deterministic routing algorithm which is used to avoid the network problems such as deadlock live-lock and congestion as discussed in the previous section. It transmits the packet either through y-axis direction or x-axis direction. If we use this deterministic algorithm, it provides the traffic free network to improve the high throughput, low latency of the router design (Dally and Seitz, 1987; Flich *et al.*, 2007; Frazzetta *et al.*, 2008) (Fig. 6). The flow of XY-routing algorithm is as follows: For XY algorithm here we are introduces z-axis also for hexagonal node structure to improve the communication between the node structure. Firstly, it checks the current address Z is matches with destination address z of the router packets if it matches then it will check x-axis direction and then y-axis direction if all the destination and current address are matched then packet output will Fig. 6: Flow of XY-routing algorithm be generated in local output. If it is not matched then checks others ports. If current address y is greater than the destination address y, packet will be allocated to out1 else out2. If it is not matched then it checks others ports. If current address x is greater than the destination address x, packet will be allocated to out3 else out4. If it is not matched then it checks z-axis. If current address z is greater than the destination address z, packet will be allocated to out5 else out6. If the requests are generated by two or more direction then the arbiter comes in to the picture in order to sort the request and allot the passage. The design is made in such a way that it has got a rotating priority for the generated requested to avoid the situations of deadlock and live lock. The basic flow diagram of the routing algorithm. #### RESULTS AND DISCUSSION The complete hexagonal packet switched NOC is designed and implemented on Xilinx 14.7 ISE and | /Hexagonal_NOC_tb/dk | 0 | nnnna | mh | noor | MM | | MM | unn | M | | MM | mm | MM | mm | noo | ato | ndonno | |--------------------------------|-----------------------------------------|--------------------------------------------------|--------|--------|-------|-----------|------|----------|-------|----------------|-------|----------------|-------|----------|-------|--------|-----------------------------------------| | /Hexagonal_NOC_tb/rst | | | 4 | | | | | | | | | | | | | | | | /Hexagonal_NOC_tb/packet_in0 | aaaabbbbccc0 | )))))(((()) | | aattot | ccc() | (000000 | 0000 | | | | | | | | | | | | /Hexagonal_NOC_tb/packet_in1 | 000000000000000000000000000000000000000 | )))))((()) | | | | ffeetoo | ((0) | 00000 | 00000 | | | | | | | | | | Hexagonal_NOC_tb/packet_in2 | 00000000000 | 00000000 | | | | | | cfaaboo. | ccc0 | 000000 | ))))) | | | | | | | | /Hexagonal_NOC_tb/packet_in3 | 00000000000 | )))))(((()) | | | | | | | | fleatoot | cccO | 00000 | ))))) | | | | | | /Hexagonal_NOC_tb/packet_in4 | 000000000000000000000000000000000000000 | )))))(((() | | | | | | | | | | ccaabbb | cccO | (((((( | 0000 | | | | /Hexagonal_NOC_tb/packet_in5 | 00000000000 | ))))((()) | | | | | | | | | | | | aa0ftoot | cccl) | (0) | 000000 | | /Hexagonal_NOC_tb/packet_in6 | 000000000000000000000000000000000000000 | )))))((()) | | | | | | | | | | | | | | afti | bbccc0 | | /Hexagonal_NOC_tb/packet_out0 | 00000000 | -000000 | | 88880 | bb | (000) | ))) | | | | | | | | | | | | /Hexagonal_NOC_tb/packet_out1 | 00000000 | 000000 | | | | ffaato | b | (000) | )))) | | | | | | | | | | /Hexagonal_NOC_tb/packet_out2 | 00000000 | 000000 | | | | | | cfaab | ċδ | (0000 | 000 | | | | | | | | /Hexagonal_NOC_tb/packet_out3 | 00000000 | 0000000 | | | | | | | | flaab | 00 | ())))) | 000 | | | | | | /Hexagonal_NOC_tb/packet_out4 | 00000000 | 000000 | | | | | | | | | | (ccaab | ίò | ())))) | 000 | | | | /Hexagonal_NOC_tb/packet_out5 | 00000000 | 000000 | | | | | | | | | | | | aa0fo | άò | 00 | 00000 | | /Hexagonal_NOC_tb/packet_out6 | 00000000 | 000000 | | | | | | | | | | | | | | 38 | fabb | | Non | 1100 ns | in minimin in i | 100 ns | | 20 | ns<br>Ins | 3) | ns an | 4) | inninin<br>Ins | 50 | inninin<br>Ins | 60 | ns | 70 | | 80 | | Cursor 2 | 715 ns | | H | | | | | | 1.62 | ИН | | | | | | 715 rs | | | Cursor 3 | 100.201 ns | | 00.201 | ns | | | | | | | | | | | | | | | Hevannal NOC thluttharket in 3 | 00000000 | *************************************** | | | | | | | | 1111000 | | | | 0000000 | | | *************************************** | Fig. 7: Simulation results of single hexagonal node 6 Table 1: The proposed hexagonal NoC design summary using different FPGA's-Spartan-3, Viterx-5 and Artix-7 | | XC3S400-5 | 5VLX110T-3 | 7A100T-3 | |--------------------------------|-----------|------------|----------| | Logic utilizations | PQ208 | FF1136 | CSG324 | | No. of slice registers | 930 | 124 | 910 | | No. of slice LUTs | 1224 | 1338 | 998 | | No. of fully used LUT-FF pairs | 1702 | 1192 | 875 | | No. of bonded IOBs | 562 | 562 | 485 | | No. of BUFG/BUFGCTRLs | 1 | 1 | 1 | Table 2: The proposed hexagonal NOC timing analysis using different FPGA's-Spartan-3,Viterx-5 and Artix-7 | | XC3S400-5 | 5VLX110T-3 | 7A100T-3 | |-------------------------|-----------|------------|----------| | Timing parameters | PQ208 | FF1136 | CSG324 | | Minimum period (nsec) | 9.639 | 2.297 | 1.89 | | Maximum frequency (MHz) | 130.901 | 435.265 | 528.996 | | Setup time (nsec) | 10.183 | 2.752 | 1.711 | | Hold time (nsec) | 6.216 | 2.783 | 0.645 | Simulated using modelsim 6.3f. For implementation we are using Artix 7 device: 100T, Speed: 3 and Package: CSG 324. Figure 7 shows the simulation results of single hexagonal node6. It contains all the 7.48 bit inputs including local packet and 6.48 bit outputs and single 32 bit local packet output. We can see the time taken to process one input and its. Table 1 and 2 show the complete hexagonal PS-NOC (including all nodes) design utilization using different FPGA's and timing analysis of design, respectively. ### CONCLUSION In this study, we are concluding that the newly developed hexagonal node pattern based packet switched NoC can be efficiently used as a NoC design with all the functionality of NoC. The designed developed is deadlock free and has high throughput. We have also seen from various results of synthesis report and timing report that the design is area efficient with little overhead in terms of slices on chip. The overall throughput is improved with the node having more ports as compared to traditional node with only four direction input and output port. The results are analyzed and design utilization and timing of single and complete hexagonal PS-NoC with different FPGA's. The slice utilization is improved over 40% when compared to normal NOC design architecture. #### REFERENCES - Agarwal, A., 1991. Limits on interconnection network performance. IEEE. Trans. Parallel Distrib. Syst., 2: 398-412. - Andreasson, D. and S. Kumar, 2005. Slack-time aware routing in NoC systems. Proceedings of the 2005 IEEE International Symposium on Circuits and Systems (ISCAS'05), May 23-26, 2005, IEEE, Kobe, Japan, ISBN:0-7803-8834-8, pp. 2353-2356. - Anjan, K.V. and T.M. Pinkston, 1995. DISHA: A deadlock recovery scheme for fully adaptive routing. Proceedings of the 9th International Symposium on Parallel Processing, April 25-28, 1995, IEEE, Santa Barbara, California, ISBN:0-8186-7074-6, pp. 537-543. - Arns, R.G., 1998. The other transistor: Early history of the metal-oxide semiconductor field-effect transistor. Eng. Sci. Educ. J., 7: 233-240. - Ascia, G., V. Catania and M. Palesi, 2004. Multi-objective mapping for mesh-based NoC architectures. Proceedings of the 2nd IEEE-ACM-IFIP International Conference on Hardware-Software Codesign and System Synthesis, September 08-10, 2004, ACM, Stockholm, Sweden, ISBN: 1-58113-937-3, pp: 182-187. - Ascia, G., V. Catania, M. Palesi and D. Patti, 2006. Neighbors-on-path: A new selection strategy for onchip networks. Proceedings of the 2006 IEEE-ACM-IFIP Workshop on Embedded Systems for Real Time Multimedia, October 26-27, 2006, ACM, Washington, DC, USA., ISBN: 0-7803-9783-5, pp: 79-84. - Benini, L. and D. Bertozzi, 2005. Network-on-chip architectures and design methods. IEE. Proc. Comput. Digital Tech., 152: 261-272. - Bertozzi, D., A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou, L. Benini and G. De Micheli, 2005. NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans. Parallel Distrib. Syst., 16: 113-129. - Collins, L., 2003. Chip makers hit heat barrier. IEE. Rev., 49: 22-23. - Dally, W.J. and C.L. Seitz, 1987. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Trans. Comput., 36: 547-553. - Dally, W.J. and H. Aoki, 1993. Deadlock-free adaptive routing in multi-computer networks using virtual channels. IEEE. Trans. Parallel Distrib. Syst., 4: 466-475. - Flich, J., A. Mejia, P.L Opez and J. Duato, 2007. Region-based routing: An efficient routing mechanism to tackle unreliable hardware in network on chips. Proceedings of the 1st International Symposium on Networks-on-Chip (NOCS'07), May 9, 2007, IEEE Computer Society, Washington, DC., USA., ISBN:0-7695-2773-6, pp: 1-12. - Frazzetta, D., G. Dimartino, M. Palesi, S. Kumar and V. Catania, 2008. Efficient application specific routing algorithms for NoC systems utilizing partially faulty links. Proceedings of the 11th Euro-micro Conference on Digital System Design Architectures Methods and Tools (DSD'08), September 3-5, 2008, IEEE, Parma, Italy, ISBN:978-0-7695-3277-6, pp. 18-25. - Kim, D., M. Kim and G.E. Sobelman, 2004. CDMA-based network-on-chip architecture. Proceedings of the 2004 IEEE Asia-Pacific Conference on Circuits and Systems Vol. 1, December 6-9, 2004, IEEE, Tainan, Taiwan, ISBN:0-7803-8660-4, pp: 137-140. - Su, C.C. and K.G. Shin, 1996. Adaptive fault-tolerant deadlock-free routing in meshes and Hypercubes. IEEE. Trans. Comput., 45: 666-683.