By BDTI, 1/3/2007
Compiling digital signal processing application code is not a push-button process—at least, not unless you're willing to settle for inefficient code. Signal processing algorithms (and the processors commonly used to run them) have specialized characteristics, and compilers usually can't generate efficient code for them without some level of programmer intervention.
Learning how to coax efficient signal processing object code out of a compiler is an important skill, and can reduce (or eliminate) the amount of time you'll spend optimizing at the assembly level. In this article, we'll explain how to get the best performance out of whatever compiler you're using, and how to avoid getting blindsided by common compiler pitfalls.
Learning by disassembling
A useful tool for understanding compilers' strengths and weaknesses is the disassembler. This tool takes object code and generates the corresponding sequence of assembly language instructions, allowing you to see exactly how the compiler implemented your code. You'll be able to tell whether it did a good job of using specialized processor features and parallelism, and whether the resulting code looks more or less as you expected.
You'll often find some surprising results; it's not uncommon to find compilers generating incorrect code or overlooking seemingly obvious optimizations. In some cases, the assembler even alters hand-written assembly code, which you may not realize unless you use a disassembler to view the final code. This can happen if, for example, you unknowingly use a pseudo-instruction that expands into a sequence of multiple native instructions.
DSP processors (and many general-purpose processors) have specialized hardware or instructions to speed up common signal processing algorithms (such as filters and FFTs). These include, for example, single-cycle multiply-accumulates (MACs), specialized addressing modes (such as modulo and bit-reversed addressing), zero-overhead loops, and saturation.
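Modulo addressing exists mainly to support circular buffers such as filter delay lines. Below is a minimal sketch of such a buffer in portable C; the names `delay_line`, `delay_push`, `delay_tap` and the length `DELAY_LEN` are hypothetical. Whether a particular compiler maps the `%` wrap-around onto the processor's modulo-addressing hardware is exactly the kind of thing to confirm in the disassembly. A power-of-two length at least lets an optimizing compiler typically reduce the unsigned `%` to a cheap bit mask.

```c
#include <stdint.h>

/* Hypothetical delay-line length; a power of two keeps the wrap cheap. */
#define DELAY_LEN 8u

typedef struct {
    int16_t  buf[DELAY_LEN];
    unsigned head;              /* index of the most recently written sample */
} delay_line;

/* Write a new sample, advancing the write index modulo DELAY_LEN. */
void delay_push(delay_line *d, int16_t x)
{
    d->head = (d->head + 1u) % DELAY_LEN;
    d->buf[d->head] = x;
}

/* Read the sample written k pushes ago (k = 0 is the newest).
   Adding DELAY_LEN before subtracting keeps the unsigned index non-negative. */
int16_t delay_tap(const delay_line *d, unsigned k)
{
    return d->buf[(d->head + DELAY_LEN - (k % DELAY_LEN)) % DELAY_LEN];
}
```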
If you're compiling signal processing application code, you'll need to figure out which (if any) of these instructions and hardware the compiler is capable of using, and under what circumstances. This will allow you to write your C code in a way that helps the compiler recognize opportunities to use specialized hardware features.
You can experiment with the C code and use the disassembler to observe the effect on the compiler's ability to create efficient object code. Each compiler has its own quirks, and it's worth the effort to spend some time learning how to help it do a good job.
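As a starting point for such an experiment, the loop below is a dot product (the core of an FIR filter) written so that each iteration is one 16 x 16-bit multiply feeding a 32-bit accumulator. It is a sketch that assumes `<stdint.h>` is available; `dot_q15` is a hypothetical name. Compile it, disassemble the object file, and check whether the inner loop became a single MAC inside a zero-overhead loop. Then try variations, such as a `short` loop counter or a wider accumulator, and compare the listings.

```c
#include <stdint.h>

/* Dot product of n 16-bit samples x[] and coefficients h[].
   Each 16 x 16-bit product is widened and summed into a 32-bit accumulator,
   the shape of a multiply-accumulate (MAC) loop. There is no overflow guard:
   this sketch assumes the caller keeps the running sum within 32 bits. */
int32_t dot_q15(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc = 0;
    int i;                              /* plain int: the natural counting type */
    for (i = 0; i < n; i++) {
        acc += (int32_t)x[i] * h[i];    /* widen before multiplying */
    }
    return acc;
}
```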
Be careful with data types
If you're defining data types in C (rather than in assembly) it's important to understand how the compiler will implement them on your target processor, because this can have a significant effect on the efficiency of the compiled code.
The C standard defines several data types, but for the most part it specifies only their minimum ranges, not their exact sizes, so the sizes differ from processor to processor. From a code performance perspective, the key point is that the size the compiler uses won't necessarily match the native data word width of the processor. If you use the wrong data type in C, you may incur a large penalty in the compiled code. If your processor natively supports only 16-bit integers, for example, you don't want the data in your inner loop declared as a 64-bit double.
The C integer types are as follows:
- int is the primary data type for indexing and counting. The standard intends it to have the natural word size of the processor, and it must be at least 16 bits.
- long provides at least 32 bits (that's mandated by the C standard). On most processors whose native word is narrower than 32 bits, "long" arithmetic requires library support.
- long long provides at least 64 bits. This format is not supported on most 16-bit processors.
- short is 16 bits on many processors, but not all.
- char is the smallest addressable unit. Many C programs assume that a char is 8 bits, which can be problematic because on many DSP processors it is not. Also, note that the sizeof operator in C returns size in units of char, which, again, may not be 8 bits.
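Because these sizes vary, it's worth checking them on your own toolchain rather than assuming. A minimal sketch: `CHAR_BIT` from `<limits.h>` gives the width of a char in bits, and since `sizeof` counts in chars, multiplying the two gives each type's width in bits. `BITS_OF` and `print_type_sizes` are hypothetical names, and the `long long` line assumes a C99-capable compiler.

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a char */
#include <stdio.h>

/* sizeof counts in units of char (sizeof(char) is 1 by definition),
   so multiply by CHAR_BIT to get a width in bits. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Print the width of each integer type on the current target. */
void print_type_sizes(void)
{
    printf("char:      %u bits\n", (unsigned)BITS_OF(char));
    printf("short:     %u bits\n", (unsigned)BITS_OF(short));
    printf("int:       %u bits\n", (unsigned)BITS_OF(int));
    printf("long:      %u bits\n", (unsigned)BITS_OF(long));
    printf("long long: %u bits\n", (unsigned)BITS_OF(long long));
}
```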
Table 1 and Table 2 show char and int sizes for a selection of DSP processors.
| char size (bits) | Processor |
|---|---|
| 8 | ADI Blackfin, TI 'C6x |
| 16 | ADI '21xx, TI 'C54, 'C55 |
| 24 | Freescale 56x |
| 32 | ADI SHARC, TigerSHARC |
Table 1. char data type sizes on common DSP processors.
| int size (bits) | Processor |
|---|---|
| 16 | ADI '21xx, TI 'C54, 'C55 |
| 24 | Freescale 56x |
| 32 | ADI Blackfin, TI 'C6x |
| 32 | ADI SHARC, TigerSHARC |
Table 2. int data type sizes on common DSP processors.
To further complicate matters, signal processing code that's implemented on fixed-point processors typically relies heavily on fractional data types, such as Q.15, in which a 16-bit word represents a fractional value from -1 up to, but not including, +1 (the largest value is 1 - 2^-15). DSP processors are designed for efficient operations on fractional data, but ANSI C has no fractional types. If you stick with ANSI C, you're likely to implement fractional arithmetic with integer data types and shifts, and the code the compiler generates for such constructs can be extremely inefficient. To address this issue, many DSP processor compilers support fractional data types via C-language extensions (discussed further below).
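To make the cost concrete, here is a sketch of a Q.15 multiply in plain ANSI-style C using only integer types and shifts, assuming `<stdint.h>` is available. The names `q15_t`, `sat16` and `q15_mul` are hypothetical. On a processor with fractional-multiply and saturation hardware this may correspond to a single instruction; whether your compiler recognizes this multi-statement idiom is something the disassembly will tell you.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* hypothetical name: a Q.15 value in a 16-bit word */

/* Saturate a 32-bit value to the 16-bit range. */
q15_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (q15_t)v;
}

/* Q.15 x Q.15 multiply. The 32-bit product is in Q.30, so shifting right by
   15 returns it to Q.15 (truncating, not rounding). The only overflow case,
   -1 x -1 = +1, saturates to 0x7FFF, the largest Q.15 value.
   Assumes >> on a negative value is an arithmetic shift, as on common
   compilers; the C standard leaves this implementation-defined. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * b;   /* Q.30 product */
    return sat16(p >> 15);
}
```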
Signal processing algorithms are often initially developed using floating-point data types, and then ported to fixed-point processors. If you specify a floating-point data type and the target processor doesn't natively support floating-point operations (as is true of most DSP processors), then the compiled code will emulate floating-point math in software—which is extremely slow.
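One way to contain that cost when porting is to keep floating-point math out of the inner loops and confine it to one-time work, such as converting coefficients that were designed in floating point into Q.15. A sketch assuming `<stdint.h>`; `float_to_q15` and `q15_to_float` are hypothetical helper names, and the round-to-nearest and saturation policy shown is one reasonable choice, not the only one.

```c
#include <stdint.h>

/* Convert a floating-point value in [-1, 1) to Q.15, rounding to nearest
   and saturating out-of-range inputs. Intended for one-time setup work such
   as coefficient conversion, not for an inner loop on a processor without
   floating-point hardware (where each call would be emulated in software). */
int16_t float_to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    /* Round half away from zero without calling the math library;
       the cast then truncates toward zero. */
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Inverse conversion, useful for checking results against a
   floating-point reference model. */
double q15_to_float(int16_t q)
{
    return (double)q / 32768.0;
}
```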