By BDTI, 11/7/2005
Last week, processor core licensor ARC
introduced a multimedia subsystem for its ARC 700 family of
configurable cores. This multimedia subsystem extends the ARC 700 CPU
with new instructions, a powerful SIMD engine, memory, a DMA
controller, and video decoding software. The SIMD engine is the most
notable feature of this subsystem. This engine features a 128-bit-wide
data path that can perform up to sixteen 8-bit operations, eight 16-bit
operations or four 32-bit operations per cycle. In comparison, ARC’s
existing DSP extensions can perform a maximum of one 32-bit or two
16-bit operations per cycle.
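The lane counts above follow directly from dividing the 128-bit data path by the element width. The short Python sketch below works through that arithmetic and compares it with the peak figures quoted for ARC's existing DSP extensions. The function and variable names are illustrative assumptions, not ARC terminology, and the ratios are peak operation counts, not measured performance.

```python
# Width of the SIMD data path as quoted in the article.
SIMD_WIDTH_BITS = 128


def lanes_per_cycle(element_bits, width_bits=SIMD_WIDTH_BITS):
    """Return how many elements of the given size fit in one SIMD word."""
    if element_bits <= 0 or width_bits % element_bits != 0:
        raise ValueError("element size must evenly divide the data-path width")
    return width_bits // element_bits


# Peak operations per cycle for each element size mentioned in the article.
simd_ops = {bits: lanes_per_cycle(bits) for bits in (8, 16, 32)}

# Peak operations per cycle the article quotes for the existing DSP extensions.
dsp_ops = {16: 2, 32: 1}

# Ratio of peak operation counts (hypothetical best case, not a benchmark).
peak_ratio = {bits: simd_ops[bits] // dsp_ops[bits] for bits in dsp_ops}

for bits, lanes in simd_ops.items():
    print(f"{bits:2d}-bit elements: {lanes} operations per cycle")
```

On these peak numbers alone, the SIMD engine offers four times the per-cycle operation count of the DSP extensions at both 16 and 32 bits; real speedups depend on how well an algorithm maps onto the lanes.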
The basic features of ARC’s SIMD engine are remarkably similar to those of ARM’s NEON extensions. (For more on NEON, see the October 2004 edition of Inside DSP.)
For example, the ARM NEON extensions can also perform up to sixteen
8-bit operations, eight 16-bit operations or four 32-bit operations per
cycle.
However, the ARC SIMD engine differs from
NEON in two respects. First, the ARC SIMD engine offers two operating
modes. In the first mode, the SIMD engine draws instructions and data
from the ARC 700 pipeline. This mode resembles the operation of the ARM
NEON extensions. In the second mode, the SIMD engine operates from its
own private instruction and data memories. This mode allows the SIMD
engine to operate in parallel with the CPU. ARM does not currently
offer similar functionality.
The ARC SIMD engine is also more specialized than the NEON extensions.
For example, the ARC SIMD engine offers specialized instructions to
accelerate the deblocking filters in H.264 and VC-1 video codecs. The
ARM NEON extensions do not offer similarly specialized instructions. As
with the specialized instructions found in DSPs, it’s likely that ARC’s
specialized instructions will boost performance at the cost of
increased programming complexity.
In addition to the SIMD extensions, ARC’s multimedia subsystem adds
entropy decoding instructions to the ARC 700 CPU, which are useful for
the variable length decoding tasks found in video decoders. The
multimedia subsystem also includes a new DMA engine. Among other
capabilities, the DMA engine can load data from external RAM directly
into the SIMD engine’s private data memory.
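To illustrate the kind of variable length decoding mentioned above, the sketch below decodes unsigned Exp-Golomb codes, one of the variable-length codes used for H.264 syntax elements. It is a plain software model written for clarity: the announcement does not say which codes ARC's entropy decoding instructions handle, and the function names and the bit-string input format are assumptions made for this example.

```python
def read_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb code from a string of '0'/'1' characters.

    Returns a (value, next_position) tuple.
    """
    # Count the leading zeros; their number sets the length of the code.
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1

    # The code word is the terminating '1' plus `zeros` further bits.
    start = pos + zeros
    end = start + zeros + 1
    if end > len(bits):
        raise ValueError("bitstream ended in the middle of a code")

    # Interpret the code word as a binary number and subtract one.
    return int(bits[start:end], 2) - 1, end


def decode_all(bits):
    """Decode a bit string made up entirely of back-to-back codes."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = read_ue(bits, pos)
        values.append(value)
    return values


# Codes for the values 0, 1, 2 and 3, concatenated into one stream.
stream = "".join(["1", "010", "011", "00100"])
print(decode_all(stream))
```

Each code's length depends on the bits that precede it, so the loop is inherently serial and branch-heavy. That is the sort of work a SIMD data path handles poorly, which is a plausible reason to add dedicated CPU instructions for it.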
According to ARC, the multimedia subsystem is capable of operating at
the same clock speed as the ARC 700 cores. For example, ARC says the
subsystem can achieve a worst-case speed of 533 MHz when implemented in
TSMC’s 0.13-micron LVLK-OD process.
ARC expects most licensees to use the multimedia subsystem in
lower-speed designs. For example, ARC projects that a multimedia
subsystem running at 166 MHz will be able to decode H.264 baseline
profile video at D1 resolution and 30 frames per second. (ARC has not
yet completed development of this decoder, so this claim remains
unproven.) According to ARC, the multimedia subsystem can achieve
this 166 MHz clock speed in the low-cost TSMC CL013G process. In this
same process, the area for the subsystem is 2.36 mm² (including an ARC
725D core, but not any memory).
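To put the 166 MHz projection in perspective, the rough calculation below estimates how many clock cycles the subsystem would have per macroblock and per pixel. It assumes D1 means 720x480 pixels (the article does not give the frame dimensions) and uses H.264's 16x16-pixel macroblocks; the function name is illustrative.

```python
CLOCK_HZ = 166_000_000      # projected clock speed from the article
FRAME_WIDTH = 720           # assumption: D1 taken as 720x480
FRAME_HEIGHT = 480
FRAMES_PER_SECOND = 30      # frame rate from the article
MACROBLOCK_SIZE = 16        # H.264 macroblocks are 16x16 pixels


def cycle_budget(clock_hz, width, height, fps, mb_size=MACROBLOCK_SIZE):
    """Return the average clock cycles available per macroblock and per pixel."""
    macroblocks_per_frame = (width // mb_size) * (height // mb_size)
    macroblocks_per_second = macroblocks_per_frame * fps
    pixels_per_second = width * height * fps
    return {
        "macroblocks_per_frame": macroblocks_per_frame,
        "cycles_per_macroblock": clock_hz / macroblocks_per_second,
        "cycles_per_pixel": clock_hz / pixels_per_second,
    }


budget = cycle_budget(CLOCK_HZ, FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND)
print(f"Macroblocks per frame: {budget['macroblocks_per_frame']}")
print(f"Cycles per macroblock: {budget['cycles_per_macroblock']:.0f}")
print(f"Cycles per pixel:      {budget['cycles_per_pixel']:.1f}")
```

Under these assumptions the decoder would have roughly 4,100 cycles per macroblock, or about 16 cycles per pixel, to cover entropy decoding, prediction, inverse transform and deblocking. The SIMD engine and the CPU can work in parallel, so this is a budget per clock domain rather than a strict serial limit.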
If ARC’s performance figures are correct, it has created a solution
that combines efficiency and flexibility. In terms of performance and
die area, the ARC multimedia subsystem appears to be competitive with
highly specialized video decoding engines. Yet ARC’s multimedia
subsystem—particularly the SIMD engine—is relatively flexible and
useful for tasks other than video decoding. Combining efficiency and
flexibility is no easy feat, so ARC’s claims are likely to be greeted
with skepticism. Before it can win customers, ARC will need to provide
strong evidence to support its bold claims.
ARC expects to make the multimedia subsystem available to customers early next year.