Evaluating the DSP Capabilities of the Cortex-R4
By Jennifer White & Jeff Bier, 12/6/2007

The Cortex-R4 as a Signal Processing Engine

The Cortex-R4 targets applications that include moderate signal processing requirements, and the core includes hardware and instructions that improve its performance on this type of processing. For example, the Cortex-R4 supports SIMD (single instruction, multiple data) instructions that enable it to perform two 16-bit multiply-accumulate operations (MACs) per cycle. MAC operations are heavily used in many common signal processing algorithms, such as filters and FFTs.
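To illustrate why a dual 16-bit MAC matters for a filter inner loop, here is a minimal Python model of the idea. It is a sketch, not Cortex-R4 code: the helper names (`pack16`, `dual_mac`, `fir_dot_simd`) are hypothetical, the "steps" counter only stands in for MAC instructions issued, and accumulator overflow is not modeled.

```python
def to_int16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pack16(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def dual_mac(acc, xw, yw):
    """One modeled SIMD step: two 16x16 products added to the accumulator."""
    lo = to_int16(xw) * to_int16(yw)
    hi = to_int16(xw >> 16) * to_int16(yw >> 16)
    return acc + lo + hi


def fir_dot_scalar(samples, coeffs):
    """Dot product with one MAC per step (the non-SIMD baseline)."""
    acc, steps = 0, 0
    for s, c in zip(samples, coeffs):
        acc += s * c
        steps += 1
    return acc, steps


def fir_dot_simd(samples, coeffs):
    """Dot product with two MACs per step; assumes an even number of taps."""
    acc, steps = 0, 0
    for i in range(0, len(samples), 2):
        xw = pack16(samples[i], samples[i + 1])
        yw = pack16(coeffs[i], coeffs[i + 1])
        acc = dual_mac(acc, xw, yw)
        steps += 1
    return acc, steps


if __name__ == "__main__":
    x = [1, -2, 3, -4]   # illustrative 16-bit samples
    h = [5, 6, -7, 8]    # illustrative 16-bit coefficients
    print(fir_dot_scalar(x, h))  # same sum, four MAC steps
    print(fir_dot_simd(x, h))    # same sum, two MAC steps
```

Both functions produce the same sum; the SIMD-style version simply needs half as many MAC steps, which is the effect the article attributes to the dual-MAC instructions.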

To assess the Cortex-R4’s signal processing capabilities and compare its performance to that of other processors, BDTI benchmarked the Cortex-R4 using the BDTI DSP Kernel Benchmarks, a suite of 12 key DSP algorithms such as FIR filters, FFTs, and a Viterbi decoder. These benchmarks are hand-optimized for each processor, typically in assembly language, and verified by BDTI. The BDTI DSP Kernel Benchmarks have been implemented on a wide variety of processor cores and chips, providing a range of comparison data for evaluating new processors.

BDTI uses processors’ results on the DSP Kernel Benchmarks to generate an overall signal processing speed metric, the BDTImark2000. (When the benchmark performance is verified using a simulator rather than hardware, this metric is called the BDTIsimMark2000.) The BDTImark2000 metric combines the number of cycles required to execute each benchmark with the processor’s instruction cycle rate (i.e., its clock speed) to determine the amount of time the processor requires to execute the benchmarks. For off-the-shelf chips, we use the fastest clock speed at which the chip is currently shipping. For licensable cores, the clock speed depends on how the core is fabricated. To enable apples-to-apples comparisons, BDTI typically uses clock speeds for cores fabricated in a TSMC 130 nm process, under worst-case conditions. ARM has not reported this data for all of its cores, so BDTI has used alternate clock speeds in some cases, as noted in the table above.
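The cycles-to-time step described above is simple arithmetic: execution time is the cycle count divided by the clock rate. The sketch below shows only that step. The function name and the cycle counts are hypothetical, and the way BDTI combines per-benchmark times into a single BDTImark2000 score is not described here, so it is not modeled.

```python
def execution_time_us(cycles, clock_mhz):
    """Time in microseconds to run `cycles` cycles at `clock_mhz` MHz.

    One MHz is one cycle per microsecond, so cycles / MHz gives microseconds.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return cycles / clock_mhz


if __name__ == "__main__":
    # Hypothetical cycle counts for illustration only, not BDTI results.
    hypothetical_cycles = {"fir_filter": 3000, "fft": 12000}
    for name, cycles in hypothetical_cycles.items():
        print(name, execution_time_us(cycles, clock_mhz=375), "us")
```

The same cycle count yields a different time at a different clock speed, which is why the clock speed assumed for a licensable core has such a direct effect on its score.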

In Figure 1, we present BDTIsimMark2000 scores for selected ARM cores, alongside BDTImark2000 scores for two off-the-shelf DSP processor chips for comparison.

r4_figure1.gif

* TSMC CL013G, Artisan SAGE-X, worst-case conditions
** 90 nm, non-BDTI conditions, clock speed not BDTI certified 
*** Estimated clock speed for Cortex-A8 in TI OMAP3430 implementation, clock speed not BDTI certified
Figure 1. BDTImark2000 scores for selected cores and chips. The BDTImark2000 is a composite DSP speed metric based on processors’ results on the BDTI DSP Kernel Benchmarks. A higher score indicates a faster processor. ARM has not provided clock speeds for the Cortex-R4 and Cortex-A8 that conform to BDTI’s uniform conditions for cores; therefore, the results for these two cores should not be compared to results for non-ARM cores.

As shown in Figure 1, the Cortex-R4 and ARM11 have similar signal processing performance. (For a full analysis of the ARM11’s signal processing performance, see “Can the ARM11 Handle DSP?”) The Cortex-R4 is not intended to replace the ARM11; rather, ARM positions the Cortex-R4 as a higher-performance replacement for the ARM9E. Compared to that processor, the Cortex-R4 is nearly three times as fast.  Some of the speed increase is due to the Cortex-R4’s more powerful architecture (we’ll discuss this more later), and some is due to its faster clock speed.

At the clock speeds shown above, the Cortex-R4’s signal processing speed is similar to that of the Texas Instruments TMS320C55x, a widely used, mid-range DSP chip. At this level of performance, the Cortex-R4 may be able to subsume the processing typically allocated to a low-cost DSP processor. At 450 MHz, the Cortex-A8 with NEON signal processing extensions is more than twice as fast as the 375 MHz Cortex-R4. (The 450 MHz clock speed used here to calculate benchmark results for the Cortex-A8 is the estimated speed of the core as fabricated in Texas Instruments’ OMAP3430 chip.)

From the data presented in Figure 1, it’s clear that clock rate accounts for only part of the signal processing speed differences among processors. The other factor is the processors’ architectural “power,” that is, how much work each processor can accomplish in each clock cycle. In the next section, we’ll look at some of the architectural differences that contribute to the performance numbers shown above.
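The split between clock rate and per-cycle work can be made concrete with a small calculation. Since execution time is cycles divided by clock rate, the speed ratio between two processors factors into a clock ratio times a cycle-count ratio. The sketch below assumes a hypothetical helper name and made-up inputs; none of the numbers are BDTI measurements.

```python
def speedup_breakdown(clock_a_mhz, cycles_a, clock_b_mhz, cycles_b):
    """Split processor A's speedup over B into clock and per-cycle factors.

    Returns (clock_ratio, per_cycle_ratio, total_speedup), where
    total_speedup = clock_ratio * per_cycle_ratio.
    """
    clock_ratio = clock_a_mhz / clock_b_mhz   # faster clock
    per_cycle_ratio = cycles_b / cycles_a     # fewer cycles for the same work
    return clock_ratio, per_cycle_ratio, clock_ratio * per_cycle_ratio


if __name__ == "__main__":
    # Hypothetical inputs: A runs at 400 MHz and needs 2000 cycles,
    # B runs at 200 MHz and needs 3000 cycles for the same benchmark.
    clock, arch, total = speedup_breakdown(400, 2000, 200, 3000)
    print(f"clock factor {clock}, per-cycle factor {arch}, total {total}")
```

With these made-up inputs, the clock accounts for a 2x factor and the per-cycle efficiency for a further 1.5x, so neither factor alone explains the overall difference.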

Copyright 2006-2008 by BDTI