# A Survey on FPGA Implementation of Medical Image Registration 

${ }^{1}$ C. John Moses and ${ }^{2}$ D. Selvathi<br>${ }^{1}$ Department of Electronics and Communication Engineering, St. Xavier's Catholic College of Engineering, Nagercoil, India<br>${ }^{2}$ Department of Electronics and Communication Engineering, Mepco Schlenk Engineering College, Sivakasi, India


#### Abstract

Medical image registration is the process of aligning two or more images that represent the same anatomy at different times, from different viewing angles or using different sensors. It geometrically aligns two images, the reference image and the floating image. FPGA (Field Programmable Gate Array) technology is used to improve performance while providing programmability and dynamic reconfigurability. This study evaluates different designs and implementations of image registration on FPGAs. It focuses on the performance, computational complexity, area and power consumption of different algorithms on FPGAs. The goal of this evaluation is not to determine an overall best method but to present a comprehensive catalogue of methods in a uniform terminology, to define general properties and requirements of local techniques and to enable the reader to select the method that is optimal for a specified application in medical imaging.


Key words: FPGA, image registration, pipelining, reconfigurable, mutual information

## INTRODUCTION

Medical image analysis uses image registration as a preprocessing step for diagnosis, disease monitoring, treatment planning and image guided surgery (Mani and Arivazhagan, 2013; Varnavas et al., 2013; Pluim and Fitzpatrick, 2003). Image Registration (IR) is the method of establishing the point by point correspondence between two images. The images may be obtained with different sensors (multi-modality) or with the same sensor at different times (mono-modality) (Goshtasby, 2005). Medical IR is used to follow changes in anatomy or to combine corresponding structural and functional information. Medical imaging is about establishing the shape, structure, size and spatial relationships of anatomical structures within the patient, together with spatial information about function and any pathology or other abnormality. IR can be used to align several images from the same individual (intrasubject registration) and to compare images obtained from different subjects (intersubject registration) (Hajnal et al., 2001).

The IR algorithms can be implemented in VLSI (Very Large Scale Integration) technology for real-time operation. Several VLSI architectures have been proposed by different researchers. One such work is a VLSI architecture for image registration (Gupta and Gupta, 2007), which presents an efficient VLSI architecture for real-time implementation of image registration schemes using an Application Specific Integrated Circuit (ASIC) to speed up the process. However, an ASIC is not reconfigurable.

An FPGA is a device built from an array of configurable look-up tables and interconnects that allows digital logic to be implemented at lower cost and with shorter design time than ASICs, owing to its reconfigurable computing characteristic. The phrase reconfigurable computing refers to a way of performing computations characterized by the capacity to modify the hardware architecture during algorithm execution. The advantages of targeting an FPGA are that it is small, fairly inexpensive, offers a parallel architecture and serves as an intermediate step between DSP (Digital Signal Processor) and ASIC solutions. Of these, reconfigurability is something that other hardware devices cannot provide and it has been demonstrated to be useful in several applications (White, 2008). FPGAs are used to implement real-time schemes and real-time image registration is necessary in the medical field to facilitate image-guided treatment and pre-operative treatment planning. Furthermore, hardware implementation is one approach to speeding up applications relative to software implementation. The performance of a hardware design on an FPGA device depends on both the intrinsic parallelism of the design and the characteristics of the FPGA device itself and the surrounding data interface (Sen et al., 2006).
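To make the notion of a configurable look-up table concrete, the following minimal sketch models a 4-input LUT as a 16 bit configuration word. The function names `make_lut` and `lut_read` are hypothetical and chosen only for illustration; no vendor tool or device primitive is being described.

```python
def make_lut(truth_function, inputs=4):
    """Return the configuration word of a LUT that implements truth_function."""
    config = 0
    for address in range(2 ** inputs):
        # Bit k of the address is the value of input k.
        bits = [(address >> k) & 1 for k in range(inputs)]
        if truth_function(*bits):
            config |= 1 << address
    return config


def lut_read(config, *bits):
    """Evaluate the LUT: the inputs form an address into the configuration word."""
    address = sum(bit << k for k, bit in enumerate(bits))
    return (config >> address) & 1


# The same "hardware" (lut_read) computes different logic once reconfigured.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
majority = make_lut(lambda a, b, c, d: a + b + c + d >= 3)
print(hex(xor4), hex(majority))
```

Changing only the configuration word changes the logic function, which is the sense in which the same fabric can be reused for different algorithms.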

## LITERATURE REVIEW

Image registration in medical imaging is used to combine or compare images obtained from a variety of modalities such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT) and ultrasound. Common medical applications of image registration are multimodality fusion of anatomical (CT or MRI) and functional (PET or SPECT) images for accurate localization of active tumors as well as delineation of their shape and size, registration of serial images for monitoring the progression or regression of a disease and for postoperative follow-up, and brain atlas registration, in which a brain image of a given patient is morphed into a predefined template to recognize and label specific regions of the brain. The registration of single modality images permits monitoring of changes over time while the registration of multimodality images combines the complementary structural and functional information concerning a certain organ (Hajnal et al., 2001).

The acceleration of a fluid registration algorithm on multi-FPGA platforms is proposed to improve speed (Cong et al., 2011). This research accelerates a PDE (Partial Differential Equation) based non-parametric registration called fluid registration. Fluid registration regularizes the deformation using a fluid PDE and permits registration of large deformations. The fluid regularizer guarantees that the transform function is smooth. This registration scheme consists of four parts: the transform, the interpolator, the metric and the optimizer on the metric. The transform specifies the kind of mapping from coordinates in the reference image to coordinates in the target image space. The interpolator samples or scales the image so as to obtain the pixel value at non-integer coordinates. The metric is the objective to optimize. Typical similarity metrics for the two images include the Sum of Squared Differences (SSD). The architecture was implemented on the multi-FPGA platform Convey HC-1. The design was described in synthesizable C code and then translated to Verilog RTL by means of AutoPilot version 2010.a.3. Xilinx ISE 11.5 was used to generate the final bit stream. This architecture offers about a 35x speedup compared with a single-threaded CPU (Central Processing Unit) implementation and a 9x speedup compared with a 4-thread CPU implementation. This work was also implemented on a Graphics Processing Unit (GPU). The Xilinx power analyzer reports that each FPGA design consumes 22 W, so the 4-FPGA design consumes 88 W, whereas the total power of the Tesla GPU card is around 200 W. Thus, the FPGA implementation achieves slightly better performance while consuming less than half the power of the Tesla GPU card.
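The four parts named above (transform, interpolator, metric and optimizer) can be illustrated with a deliberately small sketch. It is not the fluid registration of Cong et al. (2011): the assumptions here are a one-dimensional signal, a pure translation as the transform, linear interpolation, the SSD metric and an exhaustive optimizer, and all function names are hypothetical.

```python
import math


def linear_interp(signal, x):
    """Interpolator: sample `signal` at a non-integer coordinate (clamped at the borders)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(math.floor(x)), len(signal) - 2)
    frac = x - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]


def ssd(reference, floating, shift):
    """Metric: sum of squared differences under the transform x -> x + shift."""
    return sum((r - linear_interp(floating, x + shift)) ** 2
               for x, r in enumerate(reference))


def register_translation(reference, floating, candidates):
    """Optimizer: exhaustive search for the candidate shift with the smallest SSD."""
    return min(candidates, key=lambda s: ssd(reference, floating, s))


# Synthetic data: the same Gaussian profile, displaced by 2.5 samples.
floating = [math.exp(-((x - 20.0) / 4.0) ** 2) for x in range(64)]
reference = [math.exp(-((x - 17.5) / 4.0) ** 2) for x in range(64)]
candidates = [k * 0.5 for k in range(-10, 11)]
best_shift = register_translation(reference, floating, candidates)
print(best_shift)
```

In a hardware design each of these four roles becomes a separate pipeline stage, which is why the metric evaluation usually dominates the cost.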

A parameterized hardware design on reconfigurable computers, with image registration as a case study, is reported by Huang et al. (2009). This report presents hardware implementations of an exhaustive search algorithm and a Discrete Wavelet Transform (DWT) based search algorithm. Both algorithms were implemented on the Cray XD1 reconfigurable computer using a Xilinx Virtex-2P50FF1152-7 FPGA device. On the Cray XD1 platform, each FPGA device is connected to four local SRAM modules of 4 MB each. Every local memory module has separate read and write ports connected to the FPGA device and can accept a read or write transaction every clock cycle. The maximum operating clock rate for user logic is 200 MHz. All the hardware modules are fully pipelined in this design. Compared with the software implementation running on a single 2.4 GHz AMD Opteron microprocessor, the hardware implementation on a single FPGA device is in the order of 10x faster for the exhaustive search and 2x faster for the DWT based search. The speedup of the hardware exhaustive search over the software version is linearly proportional to the number of local memory banks. Embedded image registration using an FPGA is reported by White (2008).

This study focuses on the overlap between two image registration algorithms: the first recovers only the affine (i.e., rotation, skew and translation with 6 degrees of freedom) parameters of the homography between the images while the second extends that method to recover the projective (i.e., affine plus the line at infinity, representing all 8 degrees of freedom) parameters of the homography. The choice of these two schemes is motivated by the overall objective of full projective parameter estimation while allowing for the more straightforward early implementation of affine parameter estimation. The large overlap between these two schemes allows the majority of the architecture to be reused when moving between these two phases of the research. The FPGA board used for this architecture was the Xilinx ML506, which hosts a Xilinx Virtex 5-SX50 FPGA and is designed for state-of-the-art embedded image processing applications.

Multi-objective optimization of FPGA-based medical image registration is reported by Dandekar et al. (2008). This research notes that many image similarity measures, for instance the sum of squared differences and cross correlation, have been used. It also states that Mutual Information (MI) has recently emerged as the preferred similarity measure. MI-based image registration is valuable in multimodality image registration. Still, this form of registration requires thousands of iterations, depending on image complexity and the degree of initial misalignment between images. The voxel counter calculates the address corresponding to each voxel in the sub volume in $z$-$y$-$x$ order. The coordinate transformation is achieved by a simple matrix multiplication. Partial Volume (PV) interpolation offers smooth changes in the histogram values with small changes in transformation. The multi-objective optimization systematically maximizes accuracy for a given hardware cost constraint or minimizes hardware resources to meet the accuracy requirements of a medical application. A partial search algorithm explores only a portion of the complete design space. Random search involves randomly generating a fixed number of feasible solutions. The Evolutionary Algorithm (EA) based search offers an improved coverage rate, which translates into a better range and diversity of solutions compared with either partial or random searches. It offers better insight into the sensitivity of image registration accuracy to a variety of design parameters. Furthermore, this report presents an FPGA-based architecture for accelerated calculation of MI that is capable of computing MI 40 times faster than a software implementation.
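As an illustration of the quantity these architectures accelerate, the sketch below computes MI from a joint (mutual) histogram of two intensity lists. It is a software sketch under stated assumptions: nearest-bin counting is used instead of PV interpolation, the bin count of 8 is arbitrary and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(reference, floating, bins=8, max_value=256):
    """MI in bits of two equally sized intensity lists, via a joint histogram."""
    if len(reference) != len(floating):
        raise ValueError("images must have the same number of voxels")
    width = max_value // bins
    joint = Counter((r // width, f // width) for r, f in zip(reference, floating))
    n = len(reference)
    hist_ref = Counter()
    hist_flt = Counter()
    for (i, j), count in joint.items():
        hist_ref[i] += count
        hist_flt[j] += count
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log2(p(i,j) / (p(i) * p(j))), written with raw counts.
        mi += (count / n) * math.log2(count * n / (hist_ref[i] * hist_flt[j]))
    return mi


# Synthetic "multimodality" pair: the second image has inverted contrast.
reference = [(37 * k) % 256 for k in range(4096)]
aligned = [255 - v for v in reference]
misaligned = aligned[1:] + aligned[:1]  # same data, displaced by one voxel
mi_aligned = mutual_information(reference, aligned)
mi_misaligned = mutual_information(reference, misaligned)
print(mi_aligned, mi_misaligned)
```

The inverted-contrast pair still reaches the maximum MI when aligned, whereas an SSD metric would report a large error, which is one reason MI is favoured for multimodality registration.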

A model-based mapping of reconfigurable image registration on FPGA platforms is reported (Sen et al., 2008). This research presents dynamically reconfigurable image registration that can retune its parallel processing structure adaptively, based on relevant characteristics of the input images. Dataflow modeling exposes high-level application structure that is helpful for analysis, verification and optimization. Synchronous Dataflow (SDF) modeling can offer guarantees on buffer sizes and produce provably safe schedules. Cyclo-Static Dataflow (CSDF) offers more flexibility but still does not permit data-dependent production or consumption patterns. Homogeneous parameterized dataflow permits efficient scheduling and resource allocation for actors in addition to verification of bounded memory requirements and deadlock-free operation. Mutual information methods have proven to be robust and useful for multimodal images. The goal of MI-based IR is to find the optimal transformation. The amount of required external memory rises with increasing numbers of parallel data paths because of multiple identical copies of the Mutual Histogram (MH) memory module. As the number of data paths rises, power consumption also rises. These results were obtained using the Quartus II synthesis tools from Altera for the Stratix II family of FPGAs (Stratix II EP2S15F484C5).
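The buffer-size guarantee of SDF can be shown on the smallest possible graph. The sketch below assumes a hypothetical two-actor chain in which a producer emits a fixed number of tokens per firing and a consumer removes a fixed number; it is not the dataflow model of Sen et al. (2008), only an illustration of why fixed rates make the schedule and the peak buffer occupancy computable before execution.

```python
from math import gcd


def sdf_repetitions(produce, consume):
    """Balance equation q_prod * produce == q_cons * consume, smallest integer solution."""
    g = gcd(produce, consume)
    return consume // g, produce // g


def simulate_schedule(produce, consume):
    """Run one schedule period; return (peak buffer tokens, leftover tokens, consumer firings)."""
    q_prod, q_cons = sdf_repetitions(produce, consume)
    tokens = peak = fired_cons = 0
    for _ in range(q_prod):
        tokens += produce                 # producer fires once
        peak = max(peak, tokens)
        while tokens >= consume and fired_cons < q_cons:
            tokens -= consume             # consumer fires as soon as it can
            fired_cons += 1
    return peak, tokens, fired_cons


print(sdf_repetitions(3, 2))
print(simulate_schedule(3, 2))
```

Because the period ends with an empty buffer, the same schedule can repeat indefinitely within the computed bound, which is the property that lets a hardware designer size on-chip memories statically.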

A further acceleration study is a taxonomy for medical image registration acceleration techniques (Plishker et al., 2007a, b). It presents approaches at different levels which can lead to over 100x speedup on an eight-node heterogeneous cluster. Acceleration techniques in general may alter functionality or rely on parallelism for performance improvements. Optimization-level parallelism describes an algorithm that can run in parallel with an iteration as the basic unit of image registration. In volume-level parallelism, computational components operate on whole volumes. Voxel-level parallelism expresses parallelism in terms of single voxels. Lower levels of parallelism have tended to offer more opportunities for acceleration. Hardware platforms are well suited to process-level parallelism. This work uses a sub-volume-level technique to distribute the gradient computation evenly across nodes in a small cluster and an operation-level technique on an FPGA for the transformation and similarity calculation of a voxel. The PC cluster and uniprocessor results were produced on a 3 GHz Intel Xeon. The FPGA accelerator board featured an Altera Stratix EP1S40 FPGA.

FPGA-accelerated deformable image registration for improved target delineation during CT-guided interventions is also reported (Plishker et al., 2007a). Computed Tomography (CT) technology offers the opportunity to obtain the necessary imaging speed, coverage of the operative field (about $4-8 \mathrm{~cm}$) and high-resolution (up to 0.625 mm) intra-procedural imaging. Registration of a sub volume through the hierarchical refinement procedure is based on maximization of MI, which is a statistical measure. Accurate and high-speed multimodality deformable image registration can facilitate its integration into the IGI (image guided interventions) workflow. However, the lengthy computation times of these schemes have prevented their use in the clinical workflow. This article presents an FPGA-based architecture for accelerated implementation of Mutual Information (MI) based deformable registration. Such integration can improve intra-procedural target delineation, which may lead to improved procedural outcomes. A significant step towards this is the acceleration of deformable image registration. The aforesaid algorithm uses MI as a measure of image similarity. MI-based image registration can be thought of as an optimization problem of finding the best alignment between two images. The reported architecture was implemented using an Altera Stratix II EP2S180F1508C4 FPGA in a PCI prototyping board (DN7000K10PCI). The architecture was designed using the VHSIC Hardware Description Language (VHDL) and synthesized using Altera Quartus II 6.1. The design achieved a maximum internal frequency of 200 MHz with a 100 MHz RI (Reference Image) processing rate. The coordinate transform, PV interpolation and MH (Mutual Histogram) calculation operations were implemented using the four Look Up Table (LUT), first-order polynomial configuration and employed a 32 bit fixed point representation. The reported architecture was targeted at accelerating the computation of MI for a hierarchical volume subdivision based deformable registration scheme. Throughout the execution of this scheme, MI must be repeatedly calculated under a candidate transform for every sub volume at every level of subdivision. The MI calculation time for an ICT (intra-procedural contrast CT)-preCT (pre-procedural contrast-enhanced CT) image pair with dimensions of $256 \times 256 \times 256$ was 225.42 msec whereas the time for the software implementation was 9410 msec. The robustness, accuracy and speed offered by the reported solution, in conjunction with its compact implementation, make it ideally suited for clinical deployment.
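The two MI calculation times quoted above imply a speedup and a voxel throughput that can be checked with a few lines of arithmetic. The sketch takes the hardware time as 225.42 msec (the value listed in Table 1) and the software time as 9410 msec; the variable names are illustrative only.

```python
voxels = 256 ** 3                  # voxels in one 256 x 256 x 256 volume
t_fpga_ms = 225.42                 # reported MI calculation time on the FPGA
t_software_ms = 9410.0             # reported MI calculation time in software

speedup = t_software_ms / t_fpga_ms
throughput = voxels / (t_fpga_ms / 1000.0)   # voxels processed per second

print(f"speedup: {speedup:.1f}x")
print(f"throughput: {throughput / 1e6:.1f} million voxels per sec")
```

The resulting speedup of roughly 42x is in line with the 40x figure reported for MI computation, and the throughput stays below the stated 100 MHz reference image processing rate, as expected.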

A heterogeneous medical image registration acceleration platform is proposed by Plishker et al. (2007b), which exploits parallelism through multiple FPGAs. This research shows that an 8-node heterogeneous platform can produce up to a 190x speedup over a high-performance, general-purpose uniprocessor.

Hardware implementation of hierarchical volume subdivision-based elastic registration is reported by Dandekar et al. (2006). This implementation deals with an MI-based elastic registration algorithm that uses volume subdivision. Elastic image registration uses a non-linear, continuous transformation and hence is better suited for recovering realistic tissue deformations. MI-based elastic registration has been shown to be effective in multi-modality image registration due to the robustness of the similarity measure. This algorithm has been used and validated in the context of whole body PET-CT registration and ultra low-dose CT-guided interventions. This study also reports an FPGA-based architecture optimized for accelerating Free-Form Deformation (FFD) based image registration. The reported architecture was implemented on an Altera Stratix EP1S40 FPGA in a PCI prototyping board. The design achieved a maximum internal frequency of 200 MHz with a 50 MHz reference image processing rate. Reference and floating images were stored in two separate standard PC100 Synchronous Dynamic Random Access Memory (SDRAM) modules. Entropy computation was implemented using the 4-LUT, first-order polynomial configuration. Mutual information was calculated using 32 bit fixed point numbers. With a wider reference image RAM bus or Double Data Rate (DDR) SDRAMs, the expected processing speed of the system is 100 million reference image voxels per sec. This processing speed translates to an image registration time of about 1 min for an image size of $256 \times 256 \times 256$ with 5 levels of subdivision and 50 optimization iterations at each level.

This architecture achieved a speedup of over 100 for elastic registration against the corresponding software implementation on a 3.2 GHz Xeon workstation with 1.5 GB of 266 MHz RAM.
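The 4-LUT, first-order polynomial configuration used for the entropy computation is not detailed in this survey. One plausible reading, assumed here, is a four-entry table of first-order polynomials (slope and intercept) that approximates the logarithm needed for entropy. The sketch below follows that assumption in floating point for clarity, whereas the reported designs use 32 bit fixed point; all names are hypothetical.

```python
import math

SEGMENTS = 4
# Each entry holds (slope, intercept) of a chord of log2(m) on one quarter of [1, 2).
TABLE = []
for k in range(SEGMENTS):
    m0 = 1.0 + k / SEGMENTS
    m1 = 1.0 + (k + 1) / SEGMENTS
    slope = (math.log2(m1) - math.log2(m0)) / (m1 - m0)
    TABLE.append((slope, math.log2(m0) - slope * m0))


def log2_lut(x):
    """Approximate log2(x) for x > 0: exponent by normalisation, mantissa from the table."""
    mantissa, exponent = math.frexp(x)     # x == mantissa * 2**exponent, mantissa in [0.5, 1)
    m = mantissa * 2.0                     # rescale the mantissa to [1, 2)
    k = min(int((m - 1.0) * SEGMENTS), SEGMENTS - 1)
    slope, intercept = TABLE[k]
    return (exponent - 1) + slope * m + intercept


def entropy_bits(counts):
    """Entropy of a histogram, using the table-based logarithm."""
    total = sum(counts)
    return -sum((c / total) * log2_lut(c / total) for c in counts if c > 0)


worst = max(abs(log2_lut(i / 1000.0) - math.log2(i / 1000.0)) for i in range(1, 4001))
print(worst, entropy_bits([5] * 8))
```

With only four segments the approximation error of the logarithm stays near 0.01, which suggests why such a small table can be adequate when the optimizer only needs to compare MI values.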

Another implementation is a real-time registration and display of confocal microscope imagery for multiple-band analysis (Budge et al., 2004). It proposes an affine transform correction: to register two linearized images it is necessary to perform rotation, scaling and translation on one of the images to match it to the other image. This design combines a single Xilinx XC2V6000 FPGA with six 32 bit wide buses connected to external memory banks. Every memory bank is large enough to hold two full frame buffers. One of the memory banks is reserved to hold correction offsets to remove fixed pattern noise and another is reserved for future expansion of the scheme. This implementation requires no multipliers and a total of 14 adders to work out the warp and tilt correction.
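An affine correction can be computed without multipliers because the source coordinate changes by a constant increment from one pixel to the next in raster order. The sketch below illustrates that general idea only; it is not the 14-adder design of Budge et al. (2004), and the coefficient values, the fixed-point scale of 256 and the function name are assumptions made for the example.

```python
def affine_scan_additions(width, height, a, b, c, d, tx, ty):
    """Yield the source coordinate of every destination pixel using additions only.

    Moving one pixel to the right adds (a, c); moving one row down adds (b, d).
    """
    row_x, row_y = tx, ty
    for _ in range(height):
        x, y = row_x, row_y
        for _ in range(width):
            yield x, y
            x += a
            y += c
        row_x += b
        row_y += d


SCALE = 256  # coefficients are integers in units of 1/256 pixel (fixed point)
coeffs = dict(a=251, b=-50, c=50, d=251, tx=3 * SCALE, ty=-2 * SCALE)

incremental = list(affine_scan_additions(4, 3, **coeffs))
# Reference result computed with explicit multiplications for comparison.
direct = [(coeffs["a"] * u + coeffs["b"] * v + coeffs["tx"],
           coeffs["c"] * u + coeffs["d"] * v + coeffs["ty"])
          for v in range(3) for u in range(4)]
print(incremental == direct)
```

Because integer additions are exact, the incremental scan reproduces the multiplied result for every pixel, so the multipliers can be traded for a handful of accumulators.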

Another study is the FPGA-based computation of free-form deformations in medical image registration (Jiang et al., 2003). This research presents algorithms for developing FPGA-based systems that compute Free-Form Deformations (FFDs) in medical IR. FFDs are a suitable tool for deforming three-dimensional objects. This research also states that fixed point multiplication needs more hardware components than floating-point multiplication. The design was built using a pipelined structure. The area cost of this design using the fixed point representation is higher than with the floating-point representation, whilst the clock speed for the fixed point representation is slightly faster. For a 2-dimensional image of resolution 256 by 256 and a two-pipeline implementation on a Xilinx Virtex II XC2V6000 device, the computation time is $(1024 \times 1024 \times 48) /(85 \times 10^{6} \times 9) \approx 66 \mathrm{~msec}$, which corresponds to a frame rate of approximately 15 frames per sec.
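The computation time and frame rate quoted above can be reproduced directly. The survey does not state what the factors 48 and 9 represent, so the sketch keeps neutral variable names and only reads the $85 \times 10^{6}$ term as a rate per second, which is an assumption.

```python
numerator = 1024 * 1024 * 48       # work term as given in the text
denominator = 85e6 * 9             # rate term as given, assumed to be per second

time_s = numerator / denominator
frames_per_sec = 1.0 / time_s

print(f"{time_s * 1000:.1f} msec per frame, {frames_per_sec:.1f} frames per sec")
```

The result is just under 66 msec per frame, consistent with the stated rate of approximately 15 frames per sec.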

A further high performance FPGA implementation is an MFNN for image registration (Puranik and Gharpure, 2002). It proposes image registration using the standard Sequential Scan and Detect Algorithm (SSDA) with Artificial Neural Networks (ANNs) for template search. A template library consisting of registration marks is produced from a standard image. The Multilayer Feedforward Neural Network (MFNN) is trained on the template library with the back propagation paradigm. For the template search, the MFNN is run in recall mode. The image is scanned sequentially, extracting an image fragment each time for detection of the registration mark. The network output is '1' to indicate the presence of the registration mark and otherwise remains '0'. The hardware realization of the ANN requires a floating-point multiply and accumulate unit. The main difficulty in FPGA implementation of the neural network lies in realizing the floating point multiplier and the activation function, owing to their complexity and the limited resources. This architecture simplifies the MFNN model in terms of synaptic weight values and activation function. For the image registration scheme, a network with 257 inputs, 4 hidden and 1 output neurodes is proposed. A pipelined arrangement that inputs the data one column at a time is adopted to realize the network in the FPGA. The hidden neurode architecture consists of a Weight Memory Bank (WMB), column data and a multiplier. The multiplier outputs are accumulated in four accumulators corresponding to the four hidden neurodes. The accumulator output is compared with a threshold to create the neurode output. The four outputs obtained from the hidden layer are multiplied by the associated weight values in parallel, added together and compared with a threshold to produce the output. The ($\mathrm{x}, \mathrm{y}$) coordinates of the image fragment for which a pattern match is generated are stored in two mod-256 counters together with the total number of matches. The modified MFNN architecture is modeled in VHDL (Very High Speed Integrated Circuit Hardware Description Language) and implemented in a Xilinx FPGA XL4085. From the results obtained, this report says that the modifications do not change the system performance in terms of registration accuracy and detection rate. At the same time, the hardware MFNN increases the speed. The system can be operated at a speed of 1 MHz for one image fragment.

An architecture of a medical image registration system based on multiple-valued logic is presented by Hata et al. (2000). Registration is described as the process of establishing correspondence between every point in two images of the same scene. In the intensity-based image registration process, it is time consuming to compare two images and to assess their dissimilarity for every voxel. Also, in intensity-based registration it is difficult to define and detect landmarks. This report states that the intensity of the medical images of a modality depends on the manufacturer of the scanner or its manufacturing number and on the scanning parameters. The intensity value for all voxels of all intracranial structures ranged between 0 and 255. The research also concludes that the processing time for an image pair can be reduced by using multiple-valued logic hardware.
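Returning to the MFNN of Puranik and Gharpure (2002), the recall path with four hidden accumulators and hard thresholds can be sketched as follows. This is a toy illustration, not the published network: the image fragment is only 2 by 2, the weights and thresholds are hand-picked rather than trained, and the function name `mfnn_recall` is hypothetical.

```python
def mfnn_recall(columns, hidden_weights, hidden_thresholds, output_weights, output_threshold):
    """Thresholded feed-forward recall, consuming the fragment one column at a time."""
    accumulators = [0] * len(hidden_weights)
    index = 0
    for column in columns:                       # one column enters the pipeline per step
        for pixel in column:
            for h, weights in enumerate(hidden_weights):
                accumulators[h] += weights[index] * pixel
            index += 1
    # Hard threshold replaces a sigmoid-style activation function.
    hidden = [1 if acc >= thr else 0 for acc, thr in zip(accumulators, hidden_thresholds)]
    total = sum(w * h for w, h in zip(output_weights, hidden))
    return 1 if total >= output_threshold else 0


# Hand-picked (untrained) weights that respond to a diagonal 2 x 2 mark.
hidden_weights = [[1, -1, -1, 1], [1, 0, 0, 0], [0, 0, 0, 1], [-1, 1, 1, -1]]
hidden_thresholds = [2, 1, 1, 1]
output_weights = [1, 1, 1, -1]
output_threshold = 3

mark = [[1, 0], [0, 1]]        # columns of the fragment containing the mark
other = [[0, 1], [1, 0]]       # columns of a fragment without the mark
found = mfnn_recall(mark, hidden_weights, hidden_thresholds, output_weights, output_threshold)
missed = mfnn_recall(other, hidden_weights, hidden_thresholds, output_weights, output_threshold)
print(found, missed)
```

With integer weights and threshold comparisons, the only arithmetic left is multiply and accumulate, which suggests how the simplified model avoids a hardware activation function.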

## PERFORMANCE ANALYSIS

Different image registration schemes may use different image sizes and thus may require varying tracking ranges and varying amounts of memory. This survey aims to identify algorithms used to implement medical IR on different FPGA platforms. Based on the comparative study, it analyses the different types of hardware structure of IR algorithms, the performance of each hardware structure, the area required and the power and memory usage. The detailed comparison between several algorithms is shown in Table 1.

Table 1: Analysis of image registration architecture

| Algorithms | Devices | Hardware structure | Performance (speed) | No. of LUTs/Area required | Power | Memory usage |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Deformation using a fluid PDE equation (Cong et al., 2011) | Multiple FPGAs (4) | NA | 35x speedup | NA | 22 W per FPGA | NA |
| Exhaustive search (ES) and DWT based search algorithms (Huang et al., 2009) | Xilinx Virtex-2P50FF1152-7 | Pipelined | ES: 10x; DWT: 2x better than software implementation | NA | NA | 4 SRAM (4 MB each) |
| Affine transform (White, 2008) | Xilinx Virtex 5-SX50 | NA | NA | NA | NA | NA |
| Mutual information (Dandekar et al., 2008) | NA | NA | 40 times faster than software implementation | NA | NA | NA |
| Mutual information (Sen et al., 2008) | Altera Stratix II EP2S15F484C5 | Parallel data path | High | NA | High | High |
| Similarity measure (Dandekar et al., 2006) | Altera Stratix EP1S40 | Parallelism | 100x | NA | NA | NA |
| Intensity based and mutual information (Plishker et al., 2007a) | Altera Stratix II EP2S180F1508C4 | NA | 225.42 msec | 4 | NA | NA |
| Multiprocessing (Plishker et al., 2007b) | NA | Parallelism on components | 190x | NA | NA | NA |
| MI based elastic IR (Dandekar et al., 2006) | Altera Stratix EP1S40 | NA | 1 min for image registration | 4 (for entropy measure) | NA | NA |
| Affine transform (Budge et al., 2004) | Xilinx XC2V6000 | NA | Low | Less | NA | NA |
| Free-form deformation (Jiang et al., 2003) | Xilinx Virtex II XC2V6000 | Pipeline | High | NA | NA | NA |
| Standard sequential scan and detect ANN (Puranik and Gharpure, 2002) | Xilinx FPGA XL4085 | Pipelined | High | NA | NA | NA |
| Intensity based (Hata et al., 2000) | NA | Multiple valued logic | High | NA | NA | NA |

## CONCLUSION

This research reviewed the various techniques used in real-time and reconfigurable medical image registration algorithms. It evaluated different FPGA implementations of medical image registration algorithms. For each technique, the survey tries to identify the performance of the algorithm in terms of speed and accuracy and also its computational complexity. As a major design goal of VLSI technology is small area, this research also identifies the memory requirements of the various algorithms and, in particular, evaluates the chip area of FPGA-based applications in medical image processing. The research found that the pipeline technique is commonly used in the surveyed algorithms to improve the performance of image registration.

Also, the analysis found that when the parallelism is increased, the area requirement also increases. The various proposals report that hardware-based, i.e., FPGA, implementations are fast compared with CPU or software-based implementations. One of the important challenges for future research is how to reduce chip area while retaining sufficient speed and reasonable power consumption. Further work is to be carried out to reduce the chip area by removing redundant arithmetic elements.

## REFERENCES

Budge, S.E., A.M. Mayampurath and J.C. Solinsky, 2004. Real-time registration and display of confocal microscope imagery for multiple-band analysis. Proceedings of the 38th Asilomar Conference Record Signals, Systems and Computers, November 7-10, 2004, Pacific Grove, CA., pp: 1535-1539.
Cong, J., M. Huang and Y. Zou, 2011. Accelerating fluid registration algorithm on Multi-FPGA platforms. Proceedings of the IEEE 21st International Conference on Field Programmable Logic and Applications, September 5-7, 2011, Chania, Greece, pp: 50-57.
Dandekar, O., V. Walimbe and R. Shekhar, 2006. Hardware implementation of hierarchical volume subdivision-based elastic registration. Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, August 30-September 3, 2006, New York, pp: 1425-1428.
Dandekar, O., W. Plishker and R. Shekhar, 2008. Multiobjective optimization of FPGA-based medical image registration. Proceedings of the IEEE 16th International Symposium on Field Programmable Custom Computing Machines, April 14-15, 2008, Palo Alto, CA., pp: 183-192.
Goshtasby, A.A., 2005. 2-D and 3-D Image Registration: For Medical, Remote Sensing and Industrial Applications. John Wiley and Sons Inc., Hoboken, New Jersey, ISBN-13: 9780471649540, Pages: 284.
Gupta, N. and N. Gupta, 2007. A VLSI architecture for image registration in real time. IEEE Trans. Very Large Scale Integr. VLSI Syst., 15: 981-989.
Hajnal, J.V., D.L.G. Hill and D.J. Hawkes, 2001. Medical Image Registration. CRC Press, Boca Raton, Fla, USA.
Hata, Y., S. Kobashi, N. Kamiura, Y. Kitamura and T. Yanagida, 2000. On an architecture of medical image registration system based on multiple-valued logic. Proceedings of the 30th IEEE International Symposium on Multiple-Valued Logic, May 23-25, 2000, Portland, Oregon, USA., pp: 273-278.
Huang, M., O. Serres, T. El-Ghazawi and G. Newby, 2009. Parameterized hardware design on reconfigurable computers: An image registration case study. Proceedings of the 5th Southern Conference on Programmable Logic, April 1-3, 2009, Sao Carlos, pp: 71-76.
Jiang, J., W. Luk and D. Rueckert, 2003. FPGA-based computation of free-form deformations in medical image registration. Proceedings of the IEEE International Conference on Field-Programmable Technology, December 15-17, 2003, England, pp: 234-241.
Mani, V.R.S. and S. Arivazhagan, 2013. Survey of medical image registration. J. Bio-Med. Eng. Technol., 1: 8-25.
Plishker, W., O. Dandekar, S. Bhattacharyya and R. Shekhar, 2007a. A taxonomy for medical image registration acceleration techniques. Proceedings of the IEEE Life Science Systems and Applications Workshop, November 8-9, 2007, Bethesda, MD., pp: 160-163.
Plishker, W., O. Dandekar, S. Bhattacharyya and R. Shekhar, 2007b. Towards a heterogeneous medical image registration acceleration platform. Proceedings of the IEEE Biomedical Circuits and Systems Conference, November 27-30, 2007, Montreal, Que, pp: 231-234.
Pluim, J.P.W. and J.M. Fitzpatrick, 2003. Image registration. IEEE Trans. Med. Imag., 22: 1341-1343.

Puranik, M.S. and D.C. Gharpure, 2002. FPGA implementation of MFNN for image registration. Proceedings of the IEEE International Conference on Field-Programmable Technology, December 16-18, 2002, Hong Kong, China, pp: 364-367.
Sen, M., Y. Hemaraj, S.S. Bhattacharyya and R. Shekhar, 2006. Reconfigurable image registration on FPGA platforms. Proceedings of the IEEE Conference, BioCAS Biomedical Circuits and Systems, November 29-December 1, 2006, London, pp: 154-157.
Sen, M., Y. Hemaraj, W. Plishker, R. Shekhar and S.S. Bhattacharyya, 2008. Model-based mapping of reconfigurable image registration on FPGA platforms. J. Real-Time Image Proc., 3: 149-162.

Varnavas, A., T. Carrell and G. Penney, 2013. Increasing the automation of a 2D-3D registration system. IEEE Trans. Med. Imag., 32: 387-399.
White, B., 2008. Using FPGAs to perform embedded image registration. Ph.D. Thesis, University of Central Florida, Orlando, Florida.

