INTRODUCTION OF INTERNET COMPUTING:
Current trends in high-performance computing (HPC) are increasingly moving towards heterogeneous platforms, i.e. systems made of different computational units, like general-purpose CPUs, digital signal processors (DSPs), graphics processing units (GPUs), co-processors, and custom acceleration logic, enabling significant benefits in terms of both power and performance.
While HPC today covers disparate applications [27, 7, 6], historically it has never extensively relied on FPGAs, mostly because of their reduced support for floating-point arithmetic. On the other hand, FPGAs and special-purpose hardware in general, e.g. used for arithmetic operations requiring specialized circuit solutions in various areas [11, 10, 15, 14], provide a huge potential for improved power efficiency compared to software-programmable platforms.
Furthermore, while numerous approaches exist for somewhat raising the level of abstraction for hardware design [18, 16], developing an FPGA-based hardware accelerator is still challenging from a software programmer's perspective. Consequently, high-performance platforms mostly rely on general-purpose compute units such as CPUs and/or GPUs.
However, pure general-purpose hardware is affected by inherently limited power efficiency, i.e., low GFLOPS-per-Watt. Architectural customization can play a key role here, as it enables unprecedented levels of power efficiency compared to CPUs/GPUs. This is the essential reason why very recent trends are putting more emphasis on the potential role of FPGAs.
In fact, recent FPGA families, such as the Xilinx Virtex-7 or the Altera Stratix 5, have innovative features, providing significantly reduced power, high speed, lower cost, and reconfigurability. Due to these changes, in very recent years many innovative companies, including Convey, Maxeler, SRC, and Nimbix, have introduced FPGA-based heterogeneous platforms used in a large range of HPC applications, e.g. multimedia, bioinformatics, security-related processing, etc. [27, 25], with speedups in the range of 10x to 100x.
This paper explores the adoption of a deeply customizable scratchpad memory system for FPGA-oriented accelerator designs. At the heart of the proposed architecture is a multi-bank parallel access memory system for GPU-like processors. The proposed architecture enables a dynamic bank remapping hardware mechanism, allowing data to be redistributed across banks according to the specific access pattern of the kernel being executed, minimizing the number of conflicts and thereby improving the ultimate performance of the accelerated application.
In particular, relying on an advanced configurable crossbar, a hardware-supported remapping mechanism, and extensive parameterization, the proposed architecture can enable highly parallel accesses matching the potential of current HPC-oriented FPGA technologies. The paper describes the main insights behind dynamic bank remapping as well as the key role that scratchpad memory might play for hardware-accelerated computing applications.
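To illustrate why the address-to-bank mapping matters for a multi-bank memory, the following minimal Python sketch models bank conflicts in software. It is not the hardware mechanism proposed in this paper: the bank count (NUM_BANKS), the two mapping functions (plain interleaving and a row-skewed variant), and the helper names are all assumptions chosen for illustration. The model assumes each bank serves one word per cycle, so a parallel access takes as many cycles as the most heavily loaded bank receives requests.

```python
from collections import Counter

NUM_BANKS = 8  # hypothetical number of scratchpad banks


def interleaved_bank(addr, num_banks=NUM_BANKS):
    """Plain low-order interleaving: consecutive words go to consecutive banks."""
    return addr % num_banks


def skewed_bank(addr, num_banks=NUM_BANKS):
    """Skewed mapping: each group of num_banks words is rotated by its row index."""
    return (addr + addr // num_banks) % num_banks


def access_cycles(addresses, bank_of):
    """Cycles to serve one parallel access, assuming one word per bank per cycle."""
    per_bank = Counter(bank_of(a) for a in addresses)
    return max(per_bank.values())


def pick_mapping(addresses, mappings):
    """Name of the mapping with the fewest cycles for this access pattern."""
    return min(mappings, key=lambda name: access_cycles(addresses, mappings[name]))


MAPPINGS = {"interleaved": interleaved_bank, "skewed": skewed_bank}

# Access patterns over an 8x8 row-major matrix of words (illustrative only).
column_access = [i * NUM_BANKS for i in range(NUM_BANKS)]          # stride 8
diagonal_access = [i * NUM_BANKS + i for i in range(NUM_BANKS)]    # stride 9

print(access_cycles(column_access, interleaved_bank))    # 8: all words hit bank 0
print(access_cycles(column_access, skewed_bank))         # 1: conflict-free
print(access_cycles(diagonal_access, interleaved_bank))  # 1: conflict-free
print(access_cycles(diagonal_access, skewed_bank))       # 2: two words per bank
print(pick_mapping(column_access, MAPPINGS))             # skewed
print(pick_mapping(diagonal_access, MAPPINGS))           # interleaved
```

In this toy model no single static mapping is best for both patterns: the column access favors the skewed mapping, while the diagonal access favors plain interleaving. This is the kind of pattern dependence that motivates selecting the mapping according to the kernel being executed.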