Massively Parallel Processors for DSP, Part 1
By Jennifer White & Jeff Bier, 7/3/2007
Though they aren’t covered in this article, FPGAs can be thought of as the most fine-grained multiprocessors, with gate-level programmability. A step up in granularity is seen in MathStar’s MOA2400D chip, which is based on the company’s “FPOA” (or “field-programmable object array”) architecture. This chip contains an array of 400 functional units, which include ALUs, multiply-accumulate (“MAC”) units, and register files. MathStar claims that its medium-grained approach provides much of the flexibility of FPGAs while offering a simpler programming approach and higher clock speeds (up to 1 GHz, vs. approximately 300-500 MHz for high-performance DSP-oriented FPGAs). Each functional unit is individually configured using SystemC code; functional units exchange data via a synchronous interconnect. Unlike in FPGAs, the clock speed of the chip doesn’t depend on the functionality being implemented.
At the other end of the granularity spectrum lies IBM’s Cell processor, which incorporates eight “synergistic processing elements” (“SPEs”), each of which is a complex, 32-bit superscalar processor with SIMD capabilities for accelerating DSP algorithms. These processors are controlled by the “POWER Processing Element” (“PPE”), a separate 64-bit superscalar CPU backed by its own cache. The PPE is responsible for running the operating system and coordinating the activities of the SPEs, which essentially act as co-processors. Cell was designed to function as a high-performance programmable processor for gaming applications; its top clock speed is about 3 GHz.
Massively parallel processors are inherently complex. The choice between a large number of simple processing elements and a small number of complex processors is, in effect, a choice between different types of complexity. Simple processing elements (like those found in MathStar’s FPOA) have a limited repertoire of capabilities, and so tend to be straightforward to use, at least on a per-element basis. But it takes many of them working together to achieve high performance, and that’s where the complexity arises.
With fewer, more-complex processing elements, such as those found in the IBM Cell, partitioning the workload and coordinating the activities of the processing elements is less daunting (though it can still be quite challenging), but getting the most out of each processing element can be harder. Processor-based chips tend to use software development tools that are similar to those used for single processors; finer-granularity chips (such as FPGAs) often use very different toolchains and development paradigms.
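To make the idea of coarse-grained workload partitioning concrete, here is a minimal sketch in Python of splitting one FIR filter across a handful of workers. It is illustrative only: the function names (`fir_block`, `partitioned_fir`) are hypothetical, Python threads merely stand in for processing elements, and nothing here reflects the actual Cell or FPOA toolchains. Each worker computes a contiguous block of outputs and reads the few input samples just before its block, so the blocks can be processed independently and concatenated afterward.

```python
from concurrent.futures import ThreadPoolExecutor


def fir_block(samples, taps, start, stop):
    """Compute FIR outputs y[start:stop], where y[n] = sum_k taps[k] * x[n-k]."""
    out = []
    for n in range(start, stop):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before x[0] are treated as zero
                acc += h * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


def partitioned_fir(samples, taps, num_workers):
    """Split the output range into contiguous blocks, one task per block.

    Hypothetical helper: a thread pool stands in for a small number of
    processing elements. Each block reads up to len(taps) - 1 samples
    that precede its own range, so no worker depends on another's output.
    """
    n = len(samples)
    block = max(1, -(-n // num_workers))  # ceiling division, at least 1
    ranges = [(s, min(s + block, n)) for s in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(lambda r: fir_block(samples, taps, *r), ranges)
        return [y for part in parts for y in part]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    h = [0.5, 0.5]  # two-tap moving average
    print(partitioned_fir(x, h, num_workers=3))
```

With only a few workers, the partitioning logic stays this simple; the hard part shifts to making `fir_block` itself run efficiently on each processing element. With hundreds of fine-grained elements, the balance tips the other way: each element's job is trivial, but the mapping and data movement between them dominate the design effort.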