Posted on 2006-4-19 19:40:47
Post by darkise
extern "C"
tells the compiler that the compiled library must be compatible with C.
https://secure.wikimedia.org/wikipedia/en/wiki/Name_mangling
Name mangling in C++
C++ compilers are the most widespread, and yet least standard, users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers which produced machine code or assembler directly, the system's linker generally did not support C++ symbols, and mangling was still required.
The C++ language does not define a standard decoration scheme, so each compiler uses its own. Combined with the fact that C++ decoration can become fairly complex (storing information about classes, default arguments, variable ownership, operator overloading, etc), this means that object code produced by different compilers is not usually linkable.
Simple example
Consider the following two definitions of f() in a C++ program:
int f (void) { return 1; }
int f (int) { return 0; }
void g (void) { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart from the name. If they were naïvely translated into C with no changes, the result would be an error — C does not permit two functions with the same name. The compiler therefore will encode the type information in the symbol name, the result being something resembling:
int __f_v (void) { return 1; }
int __f_i (int) { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }
Notice that g() is mangled even though there is no conflict; name mangling applies to all symbols.
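As a small sketch of the point above: the two overloads of f() are separate functions, and the compiler chooses between them from the argument list (or from the target function-pointer type). The helper names call_both and check_overloads are hypothetical and not part of the quoted article.

```cpp
#include <cassert>

// The two overloads from the example: same name, different parameter lists.
int f() { return 1; }
int f(int) { return 0; }

// Hypothetical helper: calls both overloads and packs the two results
// into one number so the caller can see which function each call reached.
int call_both() {
    int i = f();    // resolves to f(void), returns 1
    int j = f(0);   // resolves to f(int), returns 0
    return i * 10 + j;
}

// Selecting an overload through a function-pointer type also works,
// which shows that the name f refers to two distinct functions.
void check_overloads() {
    int (*no_arg)() = f;
    int (*one_arg)(int) = f;
    assert(no_arg() == 1);
    assert(one_arg(7) == 0);
}
```

Because both functions must coexist in one object file, the compiler needs two different symbol names for them, which is what the mangling provides.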
Complex example
For a more complex example, consider a real-world name mangling implementation, the one used by GNU GCC 3.x, and how it mangles the following class. The mangled symbol is shown below the respective identifier name.
namespace wikipedia {
    class article {
    public:
        std::string format (void);
            /* = _ZN9wikipedia7article6formatEv */
        bool print_to (std::ostream&);
            /* = _ZN9wikipedia7article8print_toERSo */
        class wikilink {
        public:
            wikilink (std::string const& name);
                /* = _ZN9wikipedia7article8wikilinkC1ERKSs */
        };
    };
}
The name mangling scheme used here is relatively simple. All mangled symbols begin with _Z (note that an underscore followed by a capital is a reserved identifier in C and C++, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length,id> pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes
_ZN·9wikipedia·7article·6format·E
For functions, this is then followed by the type information for the parameters; as format() takes no parameters (void), this is simply v; hence:
_ZN·9wikipedia·7article·6format·E·v
For print_to, a standard type std::ostream (or more properly std::basic_ostream<char, std::char_traits<char> >) is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:
_ZN·9wikipedia·7article·8print_to·E·RSo
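The nested-name rule described above can be sketched as a toy encoder. This is only an illustration under stated assumptions: mangle_nested and check_article_examples are hypothetical names, the parameter types are passed in already encoded (for example "v" or "RSo"), and substitutions, templates, constructors and qualifiers are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: encodes a nested name as
//   _Z N <length><id> <length><id> ... E <parameter codes>
// following the description in the text. Not a full mangler.
std::string mangle_nested(const std::vector<std::string>& parts,
                          const std::string& param_codes) {
    std::string out = "_ZN";
    for (const std::string& id : parts) {
        out += std::to_string(id.size());  // length prefix of the identifier
        out += id;                         // the identifier itself
    }
    out += "E";          // closes the nested name
    out += param_codes;  // e.g. "v" for no parameters, "RSo" for std::ostream&
    return out;
}

// Reproduces the two symbols worked through in the text.
void check_article_examples() {
    assert(mangle_nested({"wikipedia", "article", "format"}, "v")
           == "_ZN9wikipedia7article6formatEv");
    assert(mangle_nested({"wikipedia", "article", "print_to"}, "RSo")
           == "_ZN9wikipedia7article8print_toERSo");
}
```

The length prefixes are what make the encoding unambiguous: a decoder reads the number, then consumes exactly that many characters as the identifier.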
How different compilers mangle the same functions
There isn't a standard scheme by which even trivial C++ identifiers are mangled, and consequently different compiler vendors (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different (and thus totally incompatible) ways. Consider how different C++ compilers mangle the same functions:
Compiler                      | void h(int)         | void h(int, char)    | void h(void)
GNU GCC 3.x                   | _Z1hi               | _Z1hic               | _Z1hv
GNU GCC 2.9x                  | h__Fi               | h__Fic               | h__Fv
Intel C++ 8.0 for Linux       | _Z1hi               | _Z1hic               | _Z1hv
Microsoft VC++ v6/v7          | ?h@@YAXH@Z          | ?h@@YAXHD@Z          | ?h@@YAXXZ
Borland C++ v3.1              | @h$qi               | @h$qizc              | @h$qv
OpenVMS C++ V6.5 (ARM mode)   | H__XI               | H__XIC               | H__XV
OpenVMS C++ V6.5 (ANSI mode)  | CXX$__7H__FI0ARG51T | CXX$__7H__FIC26CDH77 | CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64        | CXX$_Z1HI2DSQ26A    | CXX$_Z1HIC2NP3LI4    | CXX$_Z1HV0BCA19V
Digital Mars C++              | ?h@@YAXH@Z          | ?h@@YAXHD@Z          | ?h@@YAXXZ
SunPro CC                     | __1cBh6Fi_v_        | __1cBh6Fic_v_        | __1cBh6F_v_
HP aC++ A.05.55 IA-64         | _Z1hi               | _Z1hic               | _Z1hv
HP aC++ A.03.45 PA-RISC       | h__Fi               | h__Fic               | h__Fv
Tru64 C++ V6.5 (ARM mode)     | h__Xi               | h__Xic               | h__Xv
Tru64 C++ V6.5 (ANSI mode)    | __7h__Fi            | __7h__Fic            | __7h__Fv
Notes:
* The Compaq C++ compiler on OpenVMS VAX and Alpha (but not IA-64) and Tru64 has two name mangling schemes. The original, pre-standard scheme is known as the ARM model, and is based on the name mangling described in the C++ Annotated Reference Manual (ARM). With the advent of new features in standard C++, particularly templates, the ARM scheme became more and more unsuitable: it could not encode certain function types, or produced identical mangled names for different functions. It was therefore replaced by the newer "ANSI" model, which supported all ANSI template features, but was not backwards compatible.
* On IA-64, a standard ABI exists (see external links), which defines (among other things) a standard name-mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.x, in addition, has adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.
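To make the incompatibility in the table concrete, here is a minimal sketch that reproduces only the GNU GCC 3.x and GNU GCC 2.9x rows for the function h. The helper names mangle_gcc3, mangle_gcc2 and check_table_rows are hypothetical, and the parameter types are assumed to be supplied as the one-letter codes visible in the table (i for int, c for char, v for no parameters).

```cpp
#include <cassert>
#include <string>

// GNU GCC 3.x style, as seen in the table: _Z <length><name> <codes>
std::string mangle_gcc3(const std::string& name, const std::string& codes) {
    return "_Z" + std::to_string(name.size()) + name + codes;
}

// GNU GCC 2.9x style, as seen in the table: <name> __F <codes>
std::string mangle_gcc2(const std::string& name, const std::string& codes) {
    return name + "__F" + codes;
}

// Checks the helpers against the table entries for void h(int, char).
void check_table_rows() {
    assert(mangle_gcc3("h", "ic") == "_Z1hic");
    assert(mangle_gcc2("h", "ic") == "h__Fic");
}
```

A linker compares symbol names as plain strings, so an object file that refers to h__Fic can never be resolved against a library that exports _Z1hic, even though both describe the same function.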
Handling of C symbols when linking from C++
The job of the common C++ idiom:
#ifdef __cplusplus
extern "C" {
#endif
/* ... */
#ifdef __cplusplus
}
#endif
is to ensure that the symbols following are "unmangled" - that the compiler emits a binary file with their names undecorated, as a C compiler would do. As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.
For example, the standard string library header, <string.h>, usually contains something resembling:
#ifdef __cplusplus
extern "C" {
#endif
void *memset (void *, int, size_t);
char *strcat (char *, const char *);
int strcmp (const char *, const char *);
char *strcpy (char *, const char *);
#ifdef __cplusplus
}
#endif
Thus, code such as:
if (strcmp(argv[1], "-x") == 0)
    strcpy(a, argv[2]);
else
    memset(a, 0, sizeof(a));
uses the correct, unmangled strcmp, strcpy and memset. If the extern had not been used, the C++ compiler would produce code equivalent to:
if (__1cGstrcmp6Fpkc1_i_(argv[1], "-x") == 0)
    __1cGstrcpy6Fpcpkc_0_(a, argv[2]);
else
    __1cGmemset6FpviI_0_(a, 0, sizeof(a));
Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.
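The same idiom can be applied to one's own functions. The following is a sketch only: add_ints, add_ints_cpp, is_flag_x and check_c_linkage are hypothetical names invented for illustration, and the comment about strcmp assumes a typical implementation where the C library functions are declared with C linkage.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical C-compatible API, declared the way a header shared
// between C and C++ would declare it.
#ifdef __cplusplus
extern "C" {
#endif

int add_ints(int a, int b);  /* symbol is emitted as a C compiler would name it */

#ifdef __cplusplus
}
#endif

// The definition follows the declaration above, so it keeps C linkage.
int add_ints(int a, int b) { return a + b; }

// Ordinary C++ function: the compiler decorates its symbol name.
int add_ints_cpp(int a, int b) { return a + b; }

// On typical implementations strcmp has C linkage, so this call
// resolves against the plain strcmp symbol in the C runtime library.
bool is_flag_x(const char* arg) { return std::strcmp(arg, "-x") == 0; }

void check_c_linkage() {
    assert(add_ints(2, 3) == 5);
    assert(add_ints_cpp(2, 3) == 5);
    assert(is_flag_x("-x"));
}
```

Both add functions behave identically when called from C++; the difference is only in the symbol name written to the object file. One consequence is that a function with C linkage cannot be overloaded, because there is no type information in its name to tell the overloads apart.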
Standardised name mangling in C++
While it is a relatively common belief that standardised name mangling in the C++ language would lead to greater interoperability between implementations, this is not really the case. Name mangling is only one of several ABI issues in a C++ implementation, and other language details like exception handling, virtual table layout, structure padding, etc. would still render differing implementations incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits (e.g. length of symbols) dictate a particular mangling scheme. A standardised requirement for name mangling would also prevent an implementation where mangling was not required at all, for example a linker which understood the C++ language.
The C++ standard therefore does not attempt to standardise name mangling. On the contrary, the Annotated C++ Reference Manual (also known as ARM, ISBN 0-201-51459-1, section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI, such as exception handling and virtual table layout, are incompatible.
Real-world effects of C++ name mangling
As C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or different versions of the same compiler, in many cases) produce such binaries under different name decoration schemes, meaning that symbols are frequently unresolved if the compilers used to create the library and the program using it employed different schemes. For example, if a system with multiple C++ compilers installed (e.g. GNU GCC and the OS vendor's compiler) wished to install the Boost library, it would have to be compiled twice — once for the vendor compiler and once for GCC.
For this reason, name decoration is an important aspect of any C++-related ABI.
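When an unresolved symbol shows up at link time, decoding the decorated name is often the first step. The sketch below assumes a GCC or Clang toolchain, where the <cxxabi.h> header provides abi::__cxa_demangle for names in the GCC 3.x / IA-64 ABI scheme; it will not decode names from other schemes in the table. The wrapper names demangle and check_demangle are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>   // GCC/Clang-specific header, assumed to be available
#include <string>

// Hypothetical wrapper: returns the readable form of a mangled name,
// or the input unchanged if the name cannot be decoded.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;  // not a name this demangler understands
    }
    std::string result(readable);
    std::free(readable);  // the returned buffer is allocated with malloc
    return result;
}

// Decodes two of the GNU GCC 3.x entries from the table.
void check_demangle() {
    assert(demangle("_Z1hi") == "h(int)");
    assert(demangle("_Z1hic") == "h(int, char)");
}
```

The return type does not appear in the decoded output because this scheme does not encode the return type of ordinary, non-template functions.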