Date: 01/28/2002
From: David Jeske
Subject: DB performance

I just finished reading an article about how Intel's C/C++ compiler beats gcc. It reminded me of a statement I made long ago which is more true now than ever:
The software which generates binary instructions for a processor should always be written by the hardware designers.
I came to this conclusion while working at 3Dfx. At the time, AMD had someone working on site to help us optimize Glide for AMD processors. He was an excellent fellow, and he really made Glide crank on AMD processors. While he was there, he spent a day giving all interested parties an "optimizing for AMD K6" seminar. I attended and learned more intricacies of the AMD processor than I would ever make use of.
At one point, he was explaining a special technique to optimize the placement of instructions in memory. In the AMD K6 processor core, several previously microcoded instructions were given fast-paths which sent them through as a single micro-op. However, due to some limitations of their instruction decoder, the instructions could only go through the fast path if they did not fall on a cacheline boundary.
That's a pretty technical description, but basically, if certain machine instructions were placed at certain memory locations, they ran slower than if they were placed at other memory locations. He suggested looking at the binary output of the assembler and, whenever one of these instructions landed in the 'wrong place', adding nop instructions before it to push it to a location where it could run fast.
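To make the padding trick concrete, here is a rough sketch in Python. It is only an illustration, not anything AMD handed out: the 32-byte line size, the instruction names and sizes, and the is_fast_path flag are all assumptions, and I am reading "fall on a cacheline boundary" as "the instruction's bytes span two cache lines". A real tool would take sizes from the assembler's output and the list of fast-path instructions from the vendor's optimization guide.

```python
CACHE_LINE = 32  # assumed cache-line size in bytes; adjust for the target CPU


def crosses_boundary(offset, size, line=CACHE_LINE):
    """Return True if the bytes [offset, offset + size) span two cache lines."""
    return offset // line != (offset + size - 1) // line


def pad_with_nops(instructions, line=CACHE_LINE):
    """Lay out instructions, padding fast-path ones off cache-line boundaries.

    instructions: list of (name, size_in_bytes, is_fast_path) tuples
                  (a hypothetical format; sizes are assumed to be <= line).
    Returns a list of (offset, name, size) with one-byte 'nop' entries added.
    """
    layout = []
    offset = 0
    for name, size, is_fast_path in instructions:
        if is_fast_path and crosses_boundary(offset, size, line):
            # Push the instruction to the start of the next cache line.
            pad = line - offset % line
            for _ in range(pad):
                layout.append((offset, "nop", 1))
                offset += 1
        layout.append((offset, name, size))
        offset += size
    return layout


if __name__ == "__main__":
    # Made-up code stream: 30 bytes of ordinary code, then a 4-byte
    # fast-path instruction that would otherwise occupy bytes 30..33.
    for entry in pad_with_nops([("other", 30, False), ("fastop", 4, True)]):
        print(entry)
    # (0, 'other', 30)
    # (30, 'nop', 1)
    # (31, 'nop', 1)
    # (32, 'fastop', 4)
```

Note that the padding itself costs bytes and decode slots, which is exactly the kind of trade-off only someone who knows the processor's internals can judge well.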
At that point, I thought to myself about all the x86 code in the world which was being generated by compilers that didn't observe this rule; all the Windows benchmarks which would be 2% slower on AMD processors, because of the lack of this single optimization. It was then that it dawned on me that we in the software industry have no business shipping optimized binaries to customers. It should be the processor designers themselves who write the rules about how instructions are generated for a particular CPU, or at least someone who is intimately familiar with the architecture. Since most software today is run on hardware that did not even exist when that software was compiled, we had better find a way to stop shipping the final compiled binaries.
Today, almost three years later, people write articles, and are possibly even surprised, when Intel writes a compiler that beats other compilers. I answer this quandary by pointing out, "Of course Intel can write a better compiler; they designed the darn thing." The real question in my mind is why we don't have a popular software system in place so they can ALWAYS be the people generating the machine instructions.