GCC Myths and Facts (some notes on GCC 3; very useful for compiler optimization)
by Joao Seabra, in Editorials - Saturday, February 15th 2003 00:00 PDT
Since my good old Pentium 166 days, I've liked to search for the best optimizations possible so programs can take the maximum advantage of hardware/CPU cycles. If I have a nice piece of hardware, why not run it at its full power, using every little feature? Shouldn't we all try to get the best results from the money invested in our machines?
Copyright notice: All reader-contributed material on freshmeat.net is the property and responsibility of its author; for reprint rights, please contact the author directly.
This article is written for the average desktop Linux user and with the x86 architecture and C/C++ in mind, but some of its content can be applied to all architectures and languages.
GCC 3 Improvements
GCC 3 is the biggest step forward since GCC 2 and represents more than ten years of work, including two years of intensive development. It has major benefits over its predecessor, including:
Target Improvements
* A new x86 backend, generating much-improved code.
* Support for a generic i386-elf target.
* A new option to emit x86 assembly code using an Intel-style syntax.
* Better code generated for floating-point-to-integer conversions, leading to better performance in many 3D applications.
Language Improvements
* A new C++ ABI. On the IA-64 platform, GCC is capable of interoperating with other IA-64 compilers.
* A significant reduction in the size of symbol and debugging information (thanks to the new ABI).
* A new C++ support library and many C++ bugfixes, vastly improving conformance to the ISO C++ standard.
* A new inliner for C++.
* A rewritten C preprocessor, integrated into the C, C++, and Objective-C compilers, with many improvements, including ISO C99 support and improvements to dependency generation.
General Optimizations
* Infrastructure for profile-driven optimizations.
* Support for data prefetching.
* Support for SSE, SSE2, 3DNOW!, and MMX instructions.
* A basic block reordering pass.
* New tail call and sibling call elimination optimizations.
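Data prefetching is exposed to C code through the __builtin_prefetch builtin (GCC 3.1 and later). The following is a minimal sketch, assuming a simple loop that streams through an array once; the function name and the prefetch distance of 16 elements are hypothetical values chosen for illustration and should be tuned by measurement.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch.
   The best distance depends on the CPU and the memory system. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting the CPU to start loading elements the
   loop will need PREFETCH_DISTANCE iterations later. The prefetch is
   only a hint: it never changes the result. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE],
                               0,   /* 0 = prefetch for reading */
                               0);  /* 0 = no temporal locality: each element is read once */
        total += a[i];
    }
    return total;
}
```

The rw and locality arguments must be compile-time constants. On targets without a prefetch instruction, GCC generates no code for the hint, so the function stays portable.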
Why do some programmers and users fail to take advantage of these amazing new features? I admit that some of them are still "experimental", but not all of them. Perhaps the PGCC (Pentium compiler group) project gave rise to several misunderstandings which persist today. (PGCC offered several Pentium-specific optimizations. I looked at it when it first started, but benchmarks showed that the improvement was only about 2%-5% over GCC 2.7.2.3.)
We should clear the air about the GCC misconceptions. Let's start with the most loved and hated optimization: -Ox.
Myths
I use -O69 because it is faster than -O3.
This is wrong!
The highest optimization is -O3.
From the GCC 3.2.1 manual:
-O3 Optimize yet more. -O3 turns on all optimizations
specified by -O2 and also turns on the
-finline-functions and -frename-registers options.
The most skeptical can verify this in gcc/toplev.c:
  /* Scan to see what optimization level has been specified.
     That will determine the default value of many flags.  */
  -snip-
  if (optimize >= 3)
    {
      flag_inline_functions = 1;
      flag_rename_registers = 1;
    }
If you are using GCC, there's no point in using anything higher than -O3.
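To see why -O69 buys nothing, here is a simplified model of the toplev.c logic quoted above. It is a sketch, not GCC's real code: the struct opt_flags type and the flags_for_option function are hypothetical, and only numeric levels (plus a bare "-O") are handled.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a few of GCC's global flag variables. */
struct opt_flags {
    int optimize;          /* numeric -O level */
    int inline_functions;  /* models flag_inline_functions */
    int rename_registers;  /* models flag_rename_registers */
};

/* Parse an option of the form "-O<digits>" ("-O" alone means level 1)
   and set the level-dependent flags. Like the excerpt above, this only
   tests "optimize >= 3", so every level above 3 ends up with exactly
   the same flags as -O3. */
struct opt_flags flags_for_option(const char *arg)
{
    struct opt_flags f = {0, 0, 0};
    const char *digits = arg + 2;   /* assumes arg starts with "-O" */

    f.optimize = (*digits == '\0') ? 1 : atoi(digits);

    if (f.optimize >= 3) {
        f.inline_functions = 1;
        f.rename_registers = 1;
    }
    return f;
}
```

In this model, flags_for_option("-O69") records a level of 69 but enables the same two extra flags as flags_for_option("-O3"); nothing checks for a level of 4 or more.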
-O2 turns on loop unrolling.
In the GCC manpage, it's clearly written that:
-O2 turns on all optional optimizations except for loop unrolling [...]
Skeptics: check toplev.c.
So when you use -O2, which optimizations are you using?
The -O2 flag turns on the following flags:
* -O1, which turns on:
    o defer pop (see -fno-defer-pop)
    o -fthread-jumps
    o -fdelayed-branch (on, but specific machines may handle it differently)
    o -fomit-frame-pointer (only on if the machine can debug without a frame pointer; otherwise, you need to specify it explicitly)
    o guess-branch-prob (see -fno-guess-branch-prob)
    o cprop-registers (see -fno-cprop-registers)
* -foptimize-sibling-calls
* -fcse-follow-jumps
* -fcse-skip-blocks
* -fgcse
* -fexpensive-optimizations
* -fstrength-reduce
* -frerun-cse-after-loop
* -frerun-loop-opt
* -fcaller-saves
* -fforce-mem
* peephole2 (a machine-dependent option; see -fno-peephole2)
* -fschedule-insns (if supported by the target machine)
* -fregmove
* -fstrict-aliasing
* -fdelete-null-pointer-checks
* -freorder-blocks
There's no point in using -O2 -fstrength-reduce, etc., since -O2 implies all of these.
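One flag from the -O2 list whose effect is easy to see from C source is -foptimize-sibling-calls. Below is a small sketch (the function names are hypothetical): the recursive call is the very last thing the function does, so GCC can replace it with a jump when this optimization is on.

```c
/* Tail-recursive sum of 1..n. Because the recursive call is in tail
   position, -foptimize-sibling-calls (enabled by -O2) lets GCC turn it
   into a jump, so the recursion runs in constant stack space instead
   of using one stack frame per level. Without the optimization, a very
   large n could overflow the stack. */
static unsigned long long sum_to_acc(unsigned long long n,
                                     unsigned long long acc)
{
    if (n == 0)
        return acc;
    return sum_to_acc(n - 1, acc + n);   /* tail call */
}

unsigned long long sum_to(unsigned long long n)
{
    return sum_to_acc(n, 0);
}
```

Compiling this with gcc -O2 -S and reading the assembly should typically show no recursive call to sum_to_acc; at -O0 the call remains.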
Facts
The truth about -O*
This leaves us with -O3, which is the same as -O2 and:
* -finline-functions
* -frename-registers
-finline-functions is useful in some cases (mainly with C++), and you can control the maximum size of functions that get inlined with -finline-limit (600 by default). Unfortunately, if you set a high limit, you will probably get an out-of-memory error at compile time. This option needs a huge amount of memory, makes compilation slower, and makes the binary bigger. Sometimes you will see a gain, and sometimes you won't.
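As an illustration of what inlining does, here is a small helper that is a typical -finline-functions candidate, next to the loop written out by hand the way it looks after inlining. All names are hypothetical; the point is that both versions compute the same thing, but the second one has no call overhead.

```c
/* A tiny helper: its body is cheaper than the cost of calling it. */
static int square(int x)
{
    return x * x;
}

/* Calls square() once per element. When GCC inlines it, the call
   disappears and the loop body becomes "total += v[i] * v[i]". */
long sum_of_squares(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}

/* The same loop written as it looks after inlining. */
long sum_of_squares_inlined(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}
```

Compile with gcc -O3 -S (or -O2 -finline-functions) and inspect the assembly for sum_of_squares: once square() is inlined, no call instruction remains. -finline-limit=n raises or lowers the size threshold for such decisions.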
-frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers. It can, however, make debugging impossible, since variables will no longer stay in a "home register". Since i386 is not a register-rich architecture, I don't think this will have much impact.
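The idea of a false dependence can be shown with a source-level analogy. This is only an analogy: -frename-registers works on machine registers after allocation, not on C variables, and the functions below are hypothetical. Reusing one temporary for two unrelated values plays the role of two values that ended up in the same register.

```c
/* One temporary reused for two unrelated values. If t were a single
   machine register, the second write to t could not happen until the
   first value had been read: a false (anti) dependence, even though
   the two computations share no data. */
int shared_temp(int a, int b, int c, int d)
{
    int t, x, y;
    t = a * b;     /* first value lives in t */
    x = t + 1;     /* last read of the first value */
    t = c * d;     /* overwrites t, so it must wait for the read above */
    y = t + 2;
    return x + y;
}

/* After "renaming": each value gets its own temporary, so the two
   multiplications are independent and can be scheduled side by side. */
int renamed_temps(int a, int b, int c, int d)
{
    int t1 = a * b;
    int t2 = c * d;
    return (t1 + 1) + (t2 + 2);
}
```

Both functions return the same result; renaming only removes an ordering constraint, which is why it needs spare registers to be useful.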
A higher -O does not always mean improved performance. -O3 increases code size, which can introduce cache penalties and make the program slower than at -O2. However, -O2 is almost always faster than -O.
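Since the only reliable answer is to measure, here is a minimal timing sketch. The workload function is a hypothetical stand-in for your program's hot code, and clock() measures processor time with limited resolution, so use an iteration count large enough to run for at least a second.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical workload; replace it with the hot code of your program. */
unsigned long workload(size_t iterations)
{
    unsigned long h = 5381;
    for (size_t i = 0; i < iterations; i++)
        h = h * 33u + (unsigned long)(i & 0xff);
    return h;
}

/* Run the workload and return the processor time it used, in seconds.
   The result is stored through *out so the compiler cannot discard
   the work as dead code. */
double time_workload(size_t iterations, unsigned long *out)
{
    clock_t start = clock();
    *out = workload(iterations);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same file twice (for example gcc -O2 bench.c -o bench-O2 and gcc -O3 bench.c -o bench-O3, with a small main() that calls time_workload), then compare the timings and the binary sizes reported by size bench-O2 bench-O3.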
-march and -mcpu
With GCC 3, you can specify the type of processor you're using with -march or -mcpu. Although they look the same, they're not: one specifies the architecture, and the other the CPU. The available options are:
* i386
* i486
* i586
* i686
* pentium
* pentium-mmx
* pentiumpro
* pentium2
* pentium3
* pentium4
* k6
* k6-2
* k6-3
* athlon
* athlon-tbird
* athlon-4
* athlon-xp
* athlon-mp
-march implies -mcpu, so when you use -march, there's no need to use -mcpu.
-mcpu generates code tuned for the specified CPU, but it does not alter the ABI or the set of available instructions, so you can still run the resulting binary on other CPUs.
When you use -march, you generate code for the specified machine type, and the instructions it supports will be used (it turns on flags like MMX/3DNow!, etc.), which means that you probably cannot run the binary on other machine types.
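One way to see the difference from inside a program is through the macros GCC predefines for instruction-set extensions. A sketch, assuming a current GCC (where -mtune has replaced the old x86 -mcpu spelling): on 32-bit x86, -march=pentium4 defines __MMX__, __SSE__ and __SSE2__, while a tuning option alone defines none of them. The enum and function names are hypothetical.

```c
/* Bit values for the instruction-set extensions checked below. */
enum {
    ISA_MMX   = 1 << 0,
    ISA_SSE   = 1 << 1,
    ISA_SSE2  = 1 << 2,
    ISA_3DNOW = 1 << 3
};

/* Report which extensions this file was compiled to use. The answer is
   fixed at compile time by the predefined macros; it says nothing about
   the CPU the program actually runs on. */
int compiled_isa_extensions(void)
{
    int mask = 0;
#ifdef __MMX__
    mask |= ISA_MMX;
#endif
#ifdef __SSE__
    mask |= ISA_SSE;
#endif
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __3dNOW__
    mask |= ISA_3DNOW;
#endif
    return mask;
}
```

Compiling the same file with -march=athlon-xp and with only a tuning option, and comparing the returned masks, shows that only -march changes the instruction set. gcc -march=athlon-xp -dM -E - < /dev/null lists every predefined macro for that setting.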
Conclusion
Fine-tune your Makefile, remove those redundant options, and take a look at the GCC manpage. I bet you will save yourself a lot of time. There's probably a bug somewhere that can be smashed by turning off some of GCC's default flags.
This article discusses only a few of GCC's features, but I won't broaden its scope. I just want to try to clarify some of the myths and misunderstandings. There's a lot left to say, but nothing that can't be found in the Fine Manual, HOWTOs, or around the Internet. If you have patience, a look at the GCC sources can be very rewarding.
When you're coding a program, you'll inevitably run into bugs. Occasionally, you'll find one that's GCC's fault. When you do, stop to think about the time and effort that's gone into the compiler project and all that it's given you. You might think twice before simply flaming GCC.
Interesting Links
* http://www.gnu.org/software/gcc/
* http://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/
* http://gcc.gnu.org/onlinedocs/gcc/Gcov-and-Optimization.html
* http://www.redhat.com/software/gnupro/technical/gnupro_gcc.html
* http://www.freshmeat.net/projects/prelink/
* http://www.tldp.org/HOWTO/GCC-HOWTO/index.html (last updated in May of 1999)