
Single precision floating point dot product benchmark results
The test uses floating point input, floating point output, and floating
point taps.  Each test is run with 256 taps over 40e6 input samples.
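
As a rough illustration of what is being measured, the following is a
minimal Python sketch of a dot product kernel and a timing loop around
it.  It is not the benchmark program itself: the function names, the
test data, and the reduced default sizes are assumptions made for the
example, and pure Python will be far slower than the C and SIMD
versions reported below.

```python
import time


def dot_product(samples, taps):
    """Return the sum of samples[i] * taps[i] over all taps."""
    acc = 0.0
    for s, t in zip(samples, taps):
        acc += s * t
    return acc


def measure_taps_per_sec(ntaps=256, n_outputs=2000):
    """Time n_outputs dot products of length ntaps and return taps/sec.

    The sizes are hypothetical and much smaller than the 40e6 input
    samples used for the results in this file.
    """
    taps = [float(i % 7) for i in range(ntaps)]
    samples = [float(i % 5) for i in range(ntaps + n_outputs)]
    start = time.perf_counter()
    for k in range(n_outputs):
        # One output sample costs ntaps multiply-accumulates.
        # The slice copy is included in the timing in this sketch.
        dot_product(samples[k:k + ntaps], taps)
    elapsed = time.perf_counter() - start
    return ntaps * n_outputs / elapsed


if __name__ == "__main__":
    print("taps/sec: %.3e" % measure_taps_per_sec())
```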


4-18-2002

Athlon MP 1800+ (1.5 GHz) running uniprocessor:

    description		giga taps/sec	cycles/tap
    ===========		============	============
    unrolled C		0.847		1.77
    SSE simple		1.01		1.48
    SSE unrolled	1.07		1.40
    3DNow! simple	1.25		1.20
    3DNow! unrolled	1.4		1.07


Pentium 4 (1.7 GHz):

    description		giga taps/sec	cycles/tap
    ===========		============	============
    unrolled C		0.631		2.7
    SSE simple		1.28		1.32
    SSE unrolled	1.7		1.0

8-8-2002

Pentium 4 (2.0 GHz):

    Results scale linearly with clock speed, as expected.
    SSE unrolled gets 1 cycle per tap.

Pentium III (600 MHz, Coppermine):

     description        giga taps/sec   cycles/tap
     ===========        =============   ==========
     Standard C         0.218           2.75
     Unrolled SSE       0.355           1.7


The giga taps/sec column measures absolute performance.  Bigger is
better.

Cycles/tap is the processor clock speed (in Hz) divided by taps/sec.
It is a normalized figure of merit.  Smaller is better.
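
The cycles/tap definition can be checked against the tables with a few
lines of Python.  This is only a sketch; the function name is made up
for the example, and the clock speeds are the nominal figures given in
the headings above.

```python
def cycles_per_tap(clock_hz, taps_per_sec):
    """Processor clock speed divided by taps/sec."""
    return clock_hz / taps_per_sec


if __name__ == "__main__":
    # Athlon MP 1800+ (1.5 GHz), unrolled C: 0.847 giga taps/sec
    print(round(cycles_per_tap(1.5e9, 0.847e9), 2))
    # Pentium 4 (1.7 GHz), SSE unrolled: 1.7 giga taps/sec
    print(round(cycles_per_tap(1.7e9, 1.7e9), 2))
```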


To put these numbers in perspective, assume you've got a 56 tap
root-raised-cosine filter and a sample rate of, say, 2 * 10.76e6 =
21.52e6 samples/sec.

This works out to 21.52e6 samples/sec * 56 taps = 1.2 giga taps/sec.
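
The same arithmetic, written as a small Python sketch that also
compares the requirement with a measured rate.  The function names and
the "headroom" ratio are assumptions introduced for the example; the
measured rates are taken from the Athlon MP table above.

```python
def required_taps_per_sec(sample_rate_hz, ntaps):
    """Each input sample costs ntaps multiply-accumulates."""
    return sample_rate_hz * ntaps


def headroom(measured_taps_per_sec, required):
    """Ratio above 1.0 means the measured rate can keep up."""
    return measured_taps_per_sec / required


if __name__ == "__main__":
    sample_rate = 2 * 10.76e6          # 21.52e6 samples/sec
    need = required_taps_per_sec(sample_rate, 56)
    print("required: %.3e taps/sec" % need)
    # Athlon MP 1800+: 3DNow! unrolled (1.4) and SSE unrolled (1.07)
    for name, giga in (("3DNow! unrolled", 1.4), ("SSE unrolled", 1.07)):
        print("%-16s headroom %.2f" % (name, headroom(giga * 1e9, need)))
```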

# ----------------------------------------------------------------
# complex dot product (SCC: short input, complex output, complex taps)

Athlon MP 1800+ (1.5 GHz) running uniprocessor (21 Feb 2003)

[eb@hanbo examples]$ ./benchmark_dotprod_SCC
   generic: taps:  256  input: 4e+07  cpu: 63.660  taps/sec:  1.609e+08  
 3DNow!Ext: taps:  256  input: 4e+07  cpu: 18.230  taps/sec:  5.617e+08  
    3DNow!: taps:  256  input: 4e+07  cpu: 32.560  taps/sec:  3.145e+08  
       SSE: taps:  256  input: 4e+07  cpu: 32.000  taps/sec:    3.2e+08  


Pentium 4 (1.7 GHz)  (21 Feb 2003)

[eb@grinder examples]$ ./benchmark_dotprod_SCC
   generic: taps:  256  input: 4e+07  cpu: 94.620  taps/sec:  1.082e+08  
       SSE: taps:  256  input: 4e+07  cpu: 45.050  taps/sec:  2.273e+08  

# ----------------------------------------------------------------
# complex dot product (FCC: floating point input, complex output, complex taps)

Athlon MP 1800+ (1.5 GHz) running uniprocessor (21 Feb 2003)

[eb@hanbo examples]$ ./benchmark_dotprod_FCC
   generic: taps:  256  input: 4e+07  cpu: 61.800  taps/sec:  1.657e+08  


Pentium 4 (1.7 GHz)  (21 Feb 2003)

[eb@grinder examples]$ ./benchmark_dotprod_FCC
   generic: taps:  256  input: 4e+07  cpu: 77.010  taps/sec:   1.33e+08


# ----------------------------------------------------------------
# floating point dot product (FFF)

Athlon MP 1800+ (1.5 GHz) running uniprocessor (21 Feb 2003)

[eb@hanbo examples]$ ./benchmark_dotprod
   generic: taps:  256  input: 4e+07  cpu: 14.880  taps/sec:  6.882e+08  
    3DNow!: taps:  256  input: 4e+07  cpu:  7.390  taps/sec:  1.386e+09  
       SSE: taps:  256  input: 4e+07  cpu:  9.670  taps/sec:  1.059e+09  


Pentium 4 (1.7 GHz)  (21 Feb 2003)

[eb@grinder examples]$ ./benchmark_dotprod
   generic: taps:  256  input: 4e+07  cpu: 16.310  taps/sec:  6.278e+08  
       SSE: taps:  256  input: 4e+07  cpu:  5.990  taps/sec:   1.71e+09  
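
The taps/sec figures in the benchmark output lines above appear to be
taps * input / cpu, assuming the cpu field is CPU time in seconds
(that reading is inferred from the numbers, not stated by the
program).  A minimal Python sketch that parses one output line and
recomputes the rate; the regular expression and function name are
made up for this example.

```python
import re

# Matches lines such as:
#    generic: taps:  256  input: 4e+07  cpu: 63.660  taps/sec:  1.609e+08
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+):\s+taps:\s*(?P<taps>\d+)\s+input:\s*(?P<input>\S+)"
    r"\s+cpu:\s*(?P<cpu>\S+)\s+taps/sec:\s*(?P<rate>\S+)\s*$"
)


def check_line(line):
    """Return (name, reported taps/sec, recomputed taps/sec)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("not a benchmark output line: %r" % line)
    taps = int(m.group("taps"))
    n_input = float(m.group("input"))
    cpu_seconds = float(m.group("cpu"))   # assumed to be seconds
    reported = float(m.group("rate"))
    recomputed = taps * n_input / cpu_seconds
    return m.group("name"), reported, recomputed


if __name__ == "__main__":
    line = "       SSE: taps:  256  input: 4e+07  cpu:  5.990  taps/sec:   1.71e+09  "
    name, reported, recomputed = check_line(line)
    print("%s reported %.3e recomputed %.3e" % (name, reported, recomputed))
```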
