
zlib optimized with OpenIMPACT



These files serve as drop-in replacements for libz.a on a Linux IA-64 box. They have been built with OpenIMPACT, an optimizing C compiler. (OpenIMPACT has not yet been released.)

Please report problems or results back to the Gelato mailing list, or directly to the maintainer (cernekee at crhc dot uiuc dot edu).

zlib-1.1.4.tar.gz - Original zlib 1.1.4 source code (official release).
gzio.patch - Patch for the buffer overflow in gzio.c.
libz-nocspec.a - zlib without control speculation; works on all kernels, with or without the general speculation patch.
libz-cspec.a - zlib with control speculation. This library requires a kernel built with the general speculation patch.

Benchmarking and Compilation:

gcc (baseline) - Compiled with gcc-2.96 -O3
eccprof - Compiled with Intel's ecc-7.0 -O3 and profile-guided optimization
oicc_nocspec - Compiled with oicc -O3 --no-control-speculation and profile-guided optimization
oicc_cspec - Compiled with oicc -O3 and profile-guided optimization
The CVS version of OpenIMPACT (oicc) from 03/01/2003 was used to build these files.

The benchmarks were run on an otherwise idle 900 MHz zx2000 (Itanium 2, 4 GB RAM) running Linux kernel 2.4.20 with the control speculation patch.

The inputs used for benchmarks and regression tests were 3-6 megabyte files containing code, ASCII text, zeroes, and random numbers.
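Inputs of the "zero" and "random numbers" kinds can be generated with standard tools, and a regression test can check that data survives a compress/decompress round trip unchanged. This is a minimal sketch, not the test harness actually used here: the helper names make_inputs and roundtrip_ok, the file names, and the 4 MiB size are illustrative assumptions. It assumes the compressor, like gzip and minigzip, compresses standard input to standard output when given no file arguments and decompresses with -d.

```shell
#!/bin/sh
# Sketch only: file names and the 4 MiB size are illustrative assumptions,
# not the exact inputs used for the benchmarks.

# Create inputs resembling the "zero" and "urandom" benchmark cases.
make_inputs() {
  dir=$1
  head -c 4194304 /dev/zero > "$dir/zero"
  head -c 4194304 /dev/urandom > "$dir/urandom"
}

# Succeed if compressing $2 with $1 and then decompressing with "$1 -d"
# reproduces the original file byte for byte.
roundtrip_ok() {
  prog=$1
  input=$2
  "$prog" < "$input" | "$prog" -d | cmp -s - "$input"
}
```

Once the minigzip binaries from the example below are built, roundtrip_ok ./minigzip-cspec "$f" should succeed for every input file $f, and likewise for minigzip-nocspec.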

Performance:

minigzip benchmark results (gcc baseline = 1.00x):

Input         gcc (baseline)  eccprof  oicc (spec)  oicc (nospec)  Debian testing stock
ia32binaries  1.00x           1.39x    1.73x        1.59x          1.00x
ia32libs      1.00x           1.42x    1.80x        1.63x          1.00x
ia64binaries  1.00x           1.37x    1.69x        1.54x          1.00x
ia64libs      1.00x           1.38x    1.66x        1.54x          1.00x
shakespeare   1.00x           1.38x    1.69x        1.60x          1.00x
urandom       1.00x           1.30x    1.47x        1.37x          1.00x
zero          1.00x           1.13x    1.13x        1.04x          1.00x
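Each figure in the table is expressed relative to the gcc build. Assuming a figure is the gcc baseline's execution time divided by the build's execution time on the same input (so values above 1.00x mean faster), it can be computed from two measured times as follows. Both the formula and the helper name ratio are assumptions about how these figures were derived, not something this page states.

```shell
# Hypothetical helper: print baseline_time / build_time as a ratio such as "1.50x".
# Assumes both arguments are execution times in seconds on the same input.
ratio() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2fx\n", base / t }'
}
```

For instance, ratio 3.0 2.0 prints 1.50x, which corresponds to a build taking 2.0 s where the baseline takes 3.0 s (illustrative timings, not measured ones).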

Notes:
ia32libs and ia32binaries: oicc's performance advantage on these tests is largely due to fewer NOPs per cycle. Because oicc distinguishes between operations that must execute in sequence and those that are independent, it can schedule independent operations in parallel more often than compilers that perform less thorough data-flow analysis.
zero: the gain here was mostly due to fewer NOPs and slightly better branch prediction. There is still some room for improvement, though, as gcc's binary incurred fewer data-cache-related stalls.

Example:

# untar files

tar zxvf zlib-1.1.4.tar.gz
tar zxvf gel-chatr-0.0.tgz
cp libz-*.a zlib-1.1.4/
cd zlib-1.1.4

# build gcc -O3 version

make CFLAGS="-DHAVE_UNISTD_H -DUSE_MMAP -O3" minigzip
mv minigzip minigzip-gcc

# link the same minigzip.o against the OpenIMPACT-built libraries

gcc minigzip.o -o minigzip-cspec -L. -lz-cspec
gcc minigzip.o -o minigzip-nocspec -L. -lz-nocspec
../chatr/chatr -r minigzip-cspec

# dry run to copy the benchmark data into the buffer cache

./minigzip-gcc < /tmp/linux-2.4.20/vmlinux > /dev/null

# measure execution time

time ./minigzip-gcc < /tmp/linux-2.4.20/vmlinux > /dev/null
time ./minigzip-nocspec < /tmp/linux-2.4.20/vmlinux > /dev/null
time ./minigzip-cspec < /tmp/linux-2.4.20/vmlinux > /dev/null
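A single time run can be affected by noise. The sketch below repeats each measurement and keeps the fastest run. The best_time name, the default of five runs, and the use of GNU date +%s.%N for sub-second timestamps are assumptions, not part of the original procedure.

```shell
# Hypothetical helper: run "$1 < $2 > /dev/null" $3 times (default 5) and
# print the fastest wall-clock time in seconds. Requires GNU date for %N.
best_time() {
  prog=$1
  input=$2
  runs=${3:-5}
  best=
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s.%N)
    "$prog" < "$input" > /dev/null
    end=$(date +%s.%N)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.6f", e - s }')
    # Keep this run if it is the first one or faster than the best so far.
    if [ -z "$best" ] || awk -v a="$elapsed" -v b="$best" 'BEGIN { exit !(a < b) }'; then
      best=$elapsed
    fi
    i=$((i + 1))
  done
  echo "$best"
}
```

For example, best_time ./minigzip-cspec /tmp/linux-2.4.20/vmlinux would report the fastest of five compressions of the same kernel image used above.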
