By BDTI, 12/6/2007
In 2004, ARM announced its newest generation of licensable cores, called the “Cortex” family. Cortex cores span a wide range of performance levels, with Cortex-M series cores at the low end, Cortex-R series cores providing mid-range performance, and Cortex-A series applications processors offering the highest performance. The first Cortex core to be announced was the Cortex-M3, and since then ARM has announced several others, including the Cortex-A8 and A9, the Cortex-M1, and the Cortex-R4.
The Cortex-R4 targets moderately demanding applications such as hard disk drives, inkjet printers, automotive safety systems, and wireless modems. It is marketed as a higher-performance replacement for the older ARM9E core. BDTI recently completed a benchmark analysis of the ARM Cortex-R4 core and is now releasing the first independent signal processing benchmark results for this processor. In this article, we’ll take a look at its benchmark results and compare its performance to that of other ARM cores (including the ARM11, another moderate-performance core) and selected competitors.
Table 1 summarizes key attributes of selected ARM processor cores.
| | ARM9E | ARM11 | Cortex-R4 | Cortex-A8 w/NEON* |
|---|---|---|---|---|
| Typical clock rate* | 265 MHz (130 nm) | 335 MHz (130 nm) | 375 MHz (90 nm) | 450 MHz–1100 MHz (65 nm) |
| Instruction sets | ARMv5E, Thumb | ARMv6, Thumb, Thumb2 | ARMv7, Thumb, Thumb2 | ARMv7, Thumb, Thumb2, NEON |
| Issue width | Single issue | Single issue | Dual issue (superscalar) | Dual issue (superscalar) |
| Pipeline stages | 5 | 8 | 8 | 13 + 10 (NEON) |
| DSP/media instructions | Minor | Minor | Minor | Extensive (NEON) |
| Per-cycle multiply-accumulate throughput (fixed-point) | 1 × 32-bit, 1 × 16-bit | 1 × 32-bit, 2 × 16-bit | 1 × 32-bit, 2 × 16-bit | 2 × 32-bit, 4 × 16-bit, 8 × 8-bit, Float: 2 × 32-bit |
| Data bus | 32-bit | 64-bit | 64-bit | 64-/128-bit |
| Branch prediction | No | Yes | Yes | Yes |
Table 1. Characteristics of selected ARM cores.
*Clock speed data provided by ARM, not verified by BDTI. Clock speeds for ARM9E and ARM11 are worst-case speeds in a TSMC CL013G process and ARM Artisan SAGE-X library. Clock speed for Cortex-R4 is worst-case for a 90 nm CLN90G Artisan Advantage implementation. High-end clock speed for Cortex-A8 is based on a custom implementation.
As shown in Table 1, the Cortex-R4 is a superscalar core that can issue and execute up to two instructions per cycle. Like the Cortex-A8, it supports the ARMv7 instruction set architecture and the Thumb2 compressed instruction set, but the Cortex-R4 does not support the NEON signal processing extensions. As a result, its signal processing capabilities and features are much more limited than those of the Cortex-A8.
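To put the Table 1 figures in perspective, the clock rates and per-cycle 16-bit multiply-accumulate (MAC) throughputs can be combined into a rough peak MAC rate for each core. The following is a minimal sketch, not a BDTI benchmark result: it assumes the ARM-supplied clock rates from Table 1 (which are quoted for different fabrication processes and are therefore not directly comparable), and it assumes the MAC units are kept busy on every cycle, which real code rarely achieves. The names `CORES`, `peak_mmacs`, and `PEAK` are illustrative only.

```python
# Rough peak 16-bit fixed-point MAC rates derived from Table 1.
# These are theoretical upper bounds, NOT measured benchmark results.
# Assumption: clock rates are the ARM-supplied figures from Table 1,
# which were quoted for different process nodes (130 nm, 90 nm, 65 nm).

# name: (clock rate in MHz, 16-bit MACs per cycle)
CORES = {
    "ARM9E": (265, 1),
    "ARM11": (335, 2),
    "Cortex-R4": (375, 2),
    "Cortex-A8 w/NEON (low end)": (450, 4),
    "Cortex-A8 w/NEON (high end)": (1100, 4),
}


def peak_mmacs(clock_mhz, macs_per_cycle):
    """Return peak millions of MACs per second: clock (MHz) x MACs per cycle."""
    return clock_mhz * macs_per_cycle


# Peak rate for every core listed above, in millions of 16-bit MACs per second.
PEAK = {name: peak_mmacs(*spec) for name, spec in CORES.items()}

if __name__ == "__main__":
    for name, rate in PEAK.items():
        print(f"{name:30s} {rate:5d} M MACs/s")
```

Under these assumptions the Cortex-R4 peaks at 750 million 16-bit MACs per second, compared with 670 million for the ARM11 and 265 million for the ARM9E, while the Cortex-A8 with NEON ranges from 1,800 million to 4,400 million. Peak MAC rate ignores memory bandwidth, instruction scheduling, and loop overhead, so it is only a rough first-order guide to signal processing performance.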
|