In most embedded applications, system developers care about
high-level system attributes such as low cost, long battery life, and
high throughput. They generally don’t care how a solution delivers
these top-level attributes; they care only about how well it does so.
Therefore, benchmarks should be built from the top down,
based on application requirements—and not from the bottom up, based on
preconceived notions of what sorts of hardware will be used in the
application.
More generally, benchmark designers should avoid making assumptions
about the hardware whenever possible. In many embedded
applications—such as the signal processing applications my company
focuses on—system designers have great latitude to select among very
different kinds of processing engines. A benchmark designed with one
kind of engine in mind is unlikely to give valid results for the
others. A benchmark that makes unnecessary assumptions about the
hardware will therefore have limited utility.
In fact, the need for flexibility extends beyond accommodating all
of the hardware options. To be truly relevant, a benchmark must also
accommodate the full range of implementation techniques used in the
application. For example, developers of signal processing applications
almost never build an entire application with plain C code. Even the
best compilers sometimes produce very inefficient code, and when they
do, a skilled programmer can often make vast improvements with modest
effort—perhaps by modifying the C code, or perhaps by replacing
portions of it with assembly code. As a result, a benchmarking approach
that relies on plain C code is unlikely to produce useful performance
data for signal processing applications.
Another problem with using C—or any other programming language, for
that matter—is that this approach isn’t appropriate for solutions like
FPGAs, which do not run “software” in the traditional sense. Of course,
there are many applications where the best approach is to use an FPGA
or other solution that doesn’t rely on software. Hence, benchmarks
should not narrowly specify the implementation methodology, because
doing so risks excluding relevant hardware options.
In summary, setting out to create benchmarks for a specific
implementation approach—such as using multi-threaded processors or
using plain C code—is going about things the wrong way. Instead,
benchmarks should model the application requirements, and leave the
implementation approach (multiprocessing, multithreading,
reconfigurable hardware, hardwired solutions, or what have you) to the
benchmark implementer—just as actual applications allow for many
different implementation approaches.
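
To make the idea concrete, here is a minimal sketch of a benchmark defined by requirements rather than by implementation. Everything in it is hypothetical and chosen only for illustration: the names BenchmarkSpec, conforms, and run_benchmark, the two-tap averaging workload, and the use of wall-clock time as the reported metric. The specification fixes only the inputs, the expected outputs, and an error tolerance. The implementation is an opaque callable, which here stands in for any solution; for hardware that runs no software, the outputs would instead be captured from the device.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical requirements-level spec: what to compute, not how."""
    name: str
    inputs: Sequence[float]
    expected: Sequence[float]
    tolerance: float  # maximum allowed absolute error per output sample


def conforms(spec: BenchmarkSpec, outputs: Sequence[float]) -> bool:
    """Check outputs against the spec without inspecting the implementation."""
    if len(outputs) != len(spec.expected):
        return False
    return all(abs(o - e) <= spec.tolerance
               for o, e in zip(outputs, spec.expected))


def run_benchmark(spec: BenchmarkSpec,
                  implementation: Callable[[Sequence[float]], List[float]]) -> dict:
    """Run any implementation of the spec; report validity and elapsed time.

    Wall-clock time is only a placeholder metric (an assumption of this
    sketch); cost or energy could be reported the same way.
    """
    start = time.perf_counter()
    outputs = implementation(spec.inputs)
    elapsed = time.perf_counter() - start
    return {"name": spec.name,
            "valid": conforms(spec, outputs),
            "seconds": elapsed}


# Illustrative workload: y[n] = (x[n] + x[n-1]) / 2, with x[-1] = 0.
SPEC = BenchmarkSpec(
    name="two_tap_average",
    inputs=[1.0, 2.0, 3.0, 4.0],
    expected=[0.5, 1.5, 2.5, 3.5],
    tolerance=1e-9,
)


def padded_form(samples: Sequence[float]) -> List[float]:
    """One implementation: prepend the zero initial sample, then average pairs."""
    padded = [0.0] + list(samples)
    return [(padded[i] + padded[i + 1]) / 2 for i in range(len(samples))]


def delay_line_form(samples: Sequence[float]) -> List[float]:
    """A structurally different implementation: keep one sample of state."""
    previous = 0.0
    out = []
    for x in samples:
        out.append(0.5 * (x + previous))
        previous = x
    return out


if __name__ == "__main__":
    for impl in (padded_form, delay_line_form):
        print(impl.__name__, run_benchmark(SPEC, impl))
```

Both implementations are accepted because the specification constrains only the results. Nothing in BenchmarkSpec says whether the work is done in a loop, in parallel, or in hardware, so that choice stays with the benchmark implementer.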
Benchmark developers who find themselves designing benchmarks to
show off particular features of particular kinds of hardware should ask
themselves whether the results are really going to be meaningful. And
system designers need to understand the design of a benchmark before
accepting and using the results that the benchmark produces. If a
benchmark assumes a specific hardware feature or a specific
implementation methodology, system designers should proceed with
caution.