LinuxSir.cn

Porting fcitx to OS X: typed characters come out garbled

Posted 2004-2-27 08:16:08
I am trying to port fcitx to X11 on Mac OS X. So far I can start fcitx under X11 on OS X and enter Chinese characters.

However, the Chinese characters arrive in X11 applications as garbage.

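I checked fcitx's SendHZToClient function, and the encoding after conversion to UTF-8 is correct. Take the character "人": inside SendHZToClient it has already been converted correctly to the UTF-8 bytes 4A BA BA. However, the program I run in uxterm receives C3 A4 C2 BA C2 BA.

I am not sure where to look next, so any suggestions are welcome. Thanks!
Posted 2004-2-27 10:17:23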
Hi Jeff,

Below is my log of porting fcitx to OSX. I hope it helps.

==============================
XIM Chinese on X11 OSX done!!
==============================
I finally got it to work. The root of the problem lies in libX11, which on OSX is compiled with -DX_LOCALE by default. This flag is useful on systems where libc does not support locales well.

In libX11, if X_LOCALE is defined, setlocale is mapped to _Xsetlocale, and Xlib encodes/decodes the text it transfers. On Linux, X_LOCALE is NOT compiled in by default; that is why these XIM servers work well on Linux but are hard to get working on OSX.

The ultimate solution is to recompile XFree86 from source, removing every -DX_LOCALE from the conf/cf/site.cf configuration file.

Rejoice... very happy. I made it.

==============================
Summary : XIM, X11, OSX
==============================
This is a short summary of what I have learned and discovered about XIM and X11 on OSX.

XmbLookupString in the X11 library reads the input from XIM and encodes it into the current locale (UTF-8, for example). It then returns the encoded string to the X11 application, such as GTK2. In that case, XIM should pass the "raw" code to the X server; otherwise the input string gets encoded twice, which is what I have been seeing.

The X11 server is started by X11.app on Mac OS X. Because this is a double-clickable application, it does not read environment variables from shell configuration files such as /etc/profile and ~/.bashrc. To set an environment variable for a user session, modify the ~/.MacOSX/environment.plist file, which is in XML format.

Locale Related Environmental Variables

CHARSET : current charset, used by glib2 and gtk+2
LANGUAGE : locale used by the GNU gettext library only
LANG : the default value for all LC_* variables
LC_ALL : override other LC_* variables
LC_* : define the locale used by the libc function setlocale
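
The precedence in the list above can be sketched in a few lines of C. This is only an illustration of the usual POSIX lookup order for the LC_CTYPE category (LC_ALL, then LC_CTYPE, then LANG); the function names effective_ctype_locale and locale_name_is_utf8 are made up for this sketch, and the suffix check is a heuristic, not how libc or Xlib actually decides the codeset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the locale name that setlocale(LC_CTYPE, "") would normally take
 * from the environment: LC_ALL first, then LC_CTYPE, then LANG.
 * Empty values count as unset; the fallback is "C". */
const char *effective_ctype_locale(void)
{
    static const char *const names[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return "C";
}

/* Heuristic (an assumption of this sketch): treat a locale name as UTF-8
 * when the part after the first '.' is exactly "UTF-8". */
int locale_name_is_utf8(const char *name)
{
    assert(name != NULL);
    const char *dot = strchr(name, '.');
    return dot != NULL && strcmp(dot + 1, "UTF-8") == 0;
}
```

With LC_ALL unset and LC_CTYPE=zh_CN.UTF-8, effective_ctype_locale() returns "zh_CN.UTF-8" regardless of LANG; setting LC_ALL overrides both.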

==============================
More about XIM on OSX
==============================
I have made a big step forward on XIM for OSX. It is really fun debugging new problems.

First of all, data in a computer is stored in bytes, and the meaning of those bytes is defined by an encoding. UTF-16 and UTF-8 are methods for encoding a string of characters as a sequence of bytes. 4f 60 is the Unicode code point for the Chinese character "ni". e4 bd a0 is its UTF-8 encoding, and encoding that again in UTF-8 gives c3a4 c2bd c2a0.
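
The byte values above can be reproduced with a small C sketch. It is not taken from fcitx or Xlib; utf8_encode and utf8_encode_bytes_again are hypothetical helper names, and the encoder only handles code points up to U+FFFF (without checking for surrogates) to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point (limited to U+0000..U+FFFF in this sketch)
 * as UTF-8. Returns the number of bytes written to out (1 to 3). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL && cp <= 0xFFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Re-encode a byte string as if every byte were a code point of its own
 * (U+0000..U+00FF). out must have room for at least 2 * len bytes. */
size_t utf8_encode_bytes_again(const unsigned char *in, size_t len,
                               unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += utf8_encode(in[i], out + n);
    return n;
}
```

Calling utf8_encode(0x4F60, buf) fills buf with e4 bd a0, and passing those three bytes through utf8_encode_bytes_again produces c3 a4 c2 bd c2 a0, the doubly encoded form.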

According to some mailing list posts, if your software connects directly to XIM, XmbLookupString() gives you the input string in the locale encoding. Therefore my xterm should receive e4 bd a0 (not c3a4 c2bd c2a0) when I input the Chinese character "ni" from the IM. Instead I get c3a4 c2bd c2a0, which is encoded in UTF-8 twice. Moreover, the string is encoded on a per-byte basis: it does not treat e4bda0 as one character, but as three, and encodes them separately... weird.
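
To check whether a received string really is this kind of per-byte double encoding, one layer can be undone and the result compared with the expected bytes. The following is a debugging sketch only, not a proposed fix for Xlib or fcitx; undo_bytewise_utf8 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Undo one layer of per-byte UTF-8 encoding: every input sequence must be
 * the UTF-8 form of a code point in U+0000..U+00FF, which is mapped back to
 * a single byte. out must have room for len bytes.
 * Returns the number of bytes written, or -1 if the input contains anything
 * else (for example text that was encoded only once). */
long undo_bytewise_utf8(const unsigned char *in, size_t len,
                        unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[n++] = b;                       /* plain ASCII byte */
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence for U+0080..U+00FF */
            out[n++] = (unsigned char)(((b & 0x03) << 6) | (in[i + 1] & 0x3F));
            i++;
        } else {
            return -1;
        }
    }
    return (long)n;
}
```

Fed the bytes C3 A4 C2 BA C2 BA reported in the first post, it returns 3 and yields E4 BA BA; fed a correctly encoded e4 bd a0, it returns -1.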

Accordingly, in fcitx, SendHZtoClient should be given 4f 60; then the client gets the right result.
Original poster | Posted 2004-2-27 10:47:42
Thanks! Chen.

Yes, it's *e4*, not *4e*. I should have had my glasses fixed. :-(

So I will try passing sendHZToClient a UTF-16 BE string. I hope it works; then I won't need to rebuild X11.

I will update my progress as soon as possible.

Thanks. I do appreciate your help.

Jeff
Posted 2004-2-27 10:54:32
Building X11 again takes only one hour on my Tibook.
Posted 2004-2-27 11:17:39
This is completely beyond me...
Original poster | Posted 2004-2-27 12:41:06
Originally posted by Yuking
This is completely beyond me...


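I have found that the problem is not that simple: XmbTextListToTextProperty takes '\0'-terminated strings as its argument, so the input obviously cannot be UTF-16. I would like to switch to XwcTextListToProperty, but I am not sure which style to choose.

I do not quite understand this last part of sendHZToClient: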

XmbTextListToTextProperty (display, (char **) &ps, 1, XCompoundTextStyle, &tp);
((IMCommitStruct *) call_data)->flag |= XimLookupChars;
((IMCommitStruct *) call_data)->commit_string = (char *) tp.value;
IMCommitString (ims, (XPointer) call_data);

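&ps is the converted UTF-8 string. So what value is tp.value, which commit_string is set to, expected to hold?

Yuking, could you give me some pointers?
Original poster | Posted 2004-2-27 12:45:16
Oh, by the way, here is my plan: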

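I hope to eventually publish this as a Fink port (similar to a Debian package). If the package depended on a new, custom-built X11, it would be hard to deploy and would easily conflict with other projects.

So for now, the constraint on my investigation is that X11 must not be recompiled.
Original poster | Posted 2004-2-28 01:44:01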
Hi,

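Here are some additional findings about the locale problem on OS X. As puzzlebird pointed out, X11 on OS X is compiled with -DX_LOCALE, which gives X its own locale settings, separate from the system's.

Some material on building gtk+ notes that when gtk+ is compiled against an Xlib built with -DX_LOCALE, gtk+ must also be compiled with -DX_LOCALE. The reason, as puzzlebird pointed out, is that it needs to use _Xsetlocale rather than setlocale.

This is how it is defined in Xlocale.h: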

#ifdef X_LOCALE
#define setlocale _Xsetlocale
#endif

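So if we pass -DX_LOCALE, our application should be able to pick up the correct locale settings.

Even so, in theory, as long as I compile fcitx with -DX_LOCALE, fcitx should work too, just like gtk+. My program is set to use zh_CN.UTF-8 and I do return a string declared as UTF-8, so the system has no reason to convert it again.

But that is not what actually happens.

Later, following the xcin documentation, I wrote this program: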

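#include <stdio.h>
#include <langinfo.h>
#include <X11/Xlib.h>
#include <X11/Xlocale.h>  /* maps setlocale to _Xsetlocale when X_LOCALE is defined */

int main(void)
{
        /* Take LC_CTYPE from the environment and print what was selected. */
        printf("Set LC_CTYPE = %s\n", setlocale(LC_CTYPE,""));
        /* nl_langinfo is the libc call; it reports libc's idea of the codeset. */
        printf("CODESET = %s\n", nl_langinfo(CODESET));
        
        return 0;
}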

Then I compiled it with two different sets of options:

gcc -o checkLocale_no_X_LOCALE -L/usr/X11R6/lib -lX11 checkLocale.c

gcc -o checkLocale_X_LOCALE -DX_LOCALE -L/usr/X11R6/lib -lX11 checkLocale.c

Then I ran export LC_CTYPE=zh_CN.UTF-8

Running checkLocale_no_X_LOCALE gives:

Set LC_CTYPE = zh_CN.UTF-8
CODESET = UTF-8

That looks normal.

But running checkLocale_X_LOCALE gives:

Set LC_CTYPE = zh_CN.UTF-8
CODESET = US-ASCII

CODESET was not set to UTF-8!

It looks like the locale configuration of X11 on OS X is at fault.

I then checked /usr/X11R6/lib/X11/locale, and there is no zh_CN.UTF-8 directory in it.

Could this be where the problem lies?
Posted 2004-3-1 08:24:44
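Originally posted by jeff_yecn
I have found that the problem is not that simple: XmbTextListToTextProperty takes '\0'-terminated strings as its argument, so the input obviously cannot be UTF-16. I would like to switch to XwcTextListToProperty, but I am not sure which style to choose.

I do not quite understand this last part of sendHZToClient: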

XmbTextListToTextProperty (display, (char **) &ps, 1, XCompoundTextStyle, &tp);
((IMCommitStruct *) call_data)->flag |= XimLookupChars;
((IMCommitStruct *) call_data)->commit_string = (char *) tp.value;
IMCommitString (ims, (XPointer) call_data);

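&ps is the converted UTF-8 string. So what value is tp.value, which commit_string is set to, expected to hold?

Yuking, could you give me some pointers?

I cannot explain this one clearly either. Please check the XIM documentation for X. Sorry about that.
Posted 2004-3-1 17:04:03
It would be great if a version for OS X could be made.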