Posted on 2004-2-27 10:17:23
Hi Jeff,
Below is my log of porting fcitx to OSX... hope it helps.
==============================
XIM Chinese on X11 OSX done!!
==============================
Finally I got it to work. The root of the problem lies in libX11, which on OSX is compiled with -DX_LOCALE by default. This flag is useful on systems where libc does not support locales well.
If X_LOCALE is defined, Xlib maps the setlocale function to _Xsetlocale and encodes/decodes the text it transfers itself. On Linux, X_LOCALE is NOT compiled in by default; that is why these XIMs work well on Linux but are hard to get working on OSX.
The ultimate solution is to recompile XFree86 from source, after removing every -DX_LOCALE from the conf/cf/site.cf configuration file.
Rejoice... very happy. I made it.
==============================
Summary : XIM, X11, OSX
==============================
This is a simple summary of what I have learned and discovered about XIM and X11 on OSX.
XmbLookupString in the X11 library reads the input from the XIM and encodes it into the current locale (UTF-8, for example). It then returns the encoded string to the X11 application, such as GTK2. In that case, the XIM should pass the "raw" code to the X server; otherwise the input string gets encoded twice, which is what I have run into so far.
The X11 server is started by X11.app on Mac OS X. Because this is a double-clickable application, it does not read environment variables from shell configuration files such as /etc/profile and ~/.bashrc. To set an environment variable for a user session, modify the ~/.MacOSX/environment.plist file, which is in XML format.
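For illustration, here is a minimal sketch of producing such an XML property list with Python's standard plistlib module. The variable name and value (LANG=zh_CN.UTF-8) and the helper name build_environment_plist are assumptions chosen for the example, not settings taken from the log above.

```python
import plistlib


def build_environment_plist(variables):
    """Serialize a mapping of environment variables to XML plist bytes."""
    return plistlib.dumps(dict(variables), fmt=plistlib.FMT_XML)


# Hypothetical example value; use the locale you actually need.
xml_bytes = build_environment_plist({"LANG": "zh_CN.UTF-8"})
print(xml_bytes.decode("utf-8"))
```

The printed text is what would go into the environment.plist file; the sketch only builds the content and does not write it anywhere.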
Locale Related Environment Variables
CHARSET : current charset, used by glib2 and gtk+2
LANGUAGE : locale used by the GNU gettext library only
LANG : the default value for all LC_* variables
LC_ALL : overrides all other LC_* variables
LC_* : define the per-category locale used by the libc function setlocale
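The precedence among LC_ALL, LC_* and LANG can be sketched as a small lookup. This is only a model of the usual rule, not libc's actual code; the function name effective_locale is made up for the example, and the fallback to "C" when nothing is set is an assumption about typical libc behaviour.

```python
def effective_locale(category, env):
    """Return the locale that would apply to `category` (e.g. "LC_CTYPE").

    Precedence: LC_ALL first, then the category's own variable, then LANG.
    Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # assumed default when nothing is set


# Hypothetical environment for illustration.
env = {"LANG": "en_US.UTF-8", "LC_CTYPE": "zh_CN.UTF-8"}
print(effective_locale("LC_CTYPE", env))
print(effective_locale("LC_TIME", env))
```

Passing os.environ as env would apply the same lookup to the real environment of the current process.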
==============================
More about XIM on OSX
==============================
I have made a big step forward on XIM for OSX. It is really fun debugging new problems.
First of all, data in a computer is stored as bytes, and the meaning of those bytes is defined by an encoding. UTF-16 and UTF-8 are both methods for encoding a string of characters as a sequence of bytes. 4f 60 is the Unicode code point for the Chinese character "ni" (你). e4 bd a0 is its UTF-8 encoding, and encoding that again in UTF-8 gives c3a4 c2bd c2a0.
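These byte values can be checked with a few lines of Python. The sketch assumes that "encoding again" means treating each UTF-8 byte as a separate character (which is what Latin-1 decoding does) and encoding the result as UTF-8 once more.

```python
ni = "\u4f60"  # the character "ni"

utf16 = ni.encode("utf-16-be")  # big-endian UTF-16, no byte order mark
utf8 = ni.encode("utf-8")

# Second pass: every UTF-8 byte is taken as its own character, then re-encoded.
twice = utf8.decode("latin-1").encode("utf-8")

print(utf16.hex())  # 4f60
print(utf8.hex())   # e4bda0
print(twice.hex())  # c3a4c2bdc2a0
```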
According to some mailing list, if your software connects directly to the XIM, XmbLookupString() gives you the input string in the locale encoding. Therefore it is correct for my xterm to receive e4 bd a0 (not c3a4 c2bd c2a0) when I input the Chinese character "ni" from the IM. But right now I get c3a4 c2bd c2a0, which is encoded in UTF-8 twice. Moreover, the string is encoded on a per-byte basis: it does not treat e4bda0 as one character but as three, and encodes them separately... weird.
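A quick way to confirm that a received string is double encoded in this per-byte fashion is to try peeling off one UTF-8 layer and see whether valid UTF-8 remains. The helper below is a diagnostic sketch only (its name undo_double_utf8 is made up here); it does not fix the underlying X_LOCALE problem.

```python
def undo_double_utf8(data: bytes) -> bytes:
    """Strip one layer of per-byte UTF-8 double encoding, if there is one.

    Returns the input unchanged when it does not look double encoded.
    """
    try:
        # Each decoded character must fit in one byte to have come from a byte.
        inner = data.decode("utf-8").encode("latin-1")
        inner.decode("utf-8")  # what remains must itself be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return inner


received = bytes.fromhex("c3a4c2bdc2a0")
print(undo_double_utf8(received).hex())  # e4bda0
```

Correctly encoded input such as e4 bd a0 is returned as-is, because its decoded character does not fit into a single byte.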
Accordingly, in fcitx, we should expect 4f 60 in the function SendHZtoClient; then the client gets it right.