Professor Mike Flynn, Chairman of Maxeler, will deliver a talk on FPGA accelerated processing, titled “Accelerating computations in very large applications using data flow based accelerators” at a Computational Science Research Center colloquium, San Diego, Feb 11, 2011.
For many high-performance computing applications, the alternative to the multicore rack is an accelerator attached to each multicore node. Several classes of accelerator exist: GPGPUs, specialized processors, and FPGAs.
At Maxeler we’ve found that FPGA array technology wins out on performance for most relevant applications. Given the initial area-time-power disadvantage of the FPGA compared to (say) a custom designed adder, this is a surprising result. The sheer magnitude of the available FPGA parallelism overcomes the initial disadvantage.
For very large applications (more than 10^20 operations per run, or continuously running), we first identify the locus of dynamic activity (loosely termed the "kernels"). This is assigned to the accelerator. Next, where possible, the relevant program is configured as a streaming computation, with a static instruction graph activated by data streams. Using FPGA technology, this data-flow graph can be configured as a synchronous data-flow machine that executes the computation. The array is synchronized to accept a new set of input arguments each cycle, spanning a pipeline of up to 500 stages.
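To make the execution model concrete, here is a minimal software sketch of a synchronous data-flow pipeline that accepts one new input per cycle and, once full, emits one result per cycle. The class and method names are illustrative only, not Maxeler's toolchain API, and the per-stage functions stand in for the fixed arithmetic units of the configured graph:

```python
class Pipeline:
    """Sketch of a static instruction graph run as a synchronous
    data-flow pipeline: each stage is a fixed operation, and values
    advance one stage per cycle through pipeline registers."""

    def __init__(self, stages):
        self.stages = stages              # fixed per-stage operations
        self.regs = [None] * len(stages)  # pipeline registers

    def cycle(self, new_input):
        """Accept one new input; return one finished result once the
        pipeline has filled (None during the fill latency)."""
        out = self.regs[-1]
        # Shift values toward the output, applying each stage's op.
        for i in range(len(self.stages) - 1, 0, -1):
            v = self.regs[i - 1]
            self.regs[i] = None if v is None else self.stages[i](v)
        self.regs[0] = self.stages[0](new_input)
        return out


# A 3-stage example: x -> ((x + 1) * 2) - 3, i.e. 2x - 1.
p = Pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
results = [p.cycle(x) for x in [1, 2, 3, 4, 5, 6]]
```

After the three-cycle fill latency, the pipeline delivers one result per cycle; on the FPGA the same principle applies across hundreds of stages, which is where the throughput comes from despite each individual unit being slower than custom logic.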
As an example, we consider modeling problems in geophysics. In a typical problem we realize a 2,000-node array on 2 FPGAs, achieving a 50-100x speedup over a conventional multicore server.