I encountered the first roadblock when porting. Apparently the wide character type (wchar_t) is 4 bytes instead of 2 like on Windows. Do these crazy Linux ppl really think there’s a language with 2.1 billion characters?
Written by MM. Posted at 10:45 am on October 18th, 2010
8 comments.
OFFTOPIC
MM, I’m waiting for your article(s) about meditation!
The problem is that you need more than 65,536 characters (the most that 2 bytes can index) to render all languages, especially the Asian ones. See Unicode planes.
I recently had the inverse problem when porting from Linux to Windows. The code was about cleanly handling word wrapping for CJK languages (where the separation of words/letters is different), and the special cases were about 4-byte characters, but the Windows wchar_t choked on them.
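For anyone hitting the same wall, here is a minimal sketch in C of why a 2-byte wchar_t struggles with those characters: in UTF-16, any code point above U+FFFF has to be split into two 16-bit units (a surrogate pair). The name utf16_encode is made up for illustration and is not from the code discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written.
   Returns 0 if cp is not a valid code point (a surrogate value or
   anything above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {
        /* Fits in a single 16-bit unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Outside the first plane: split the remaining 20 bits into a
       high surrogate and a low surrogate. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

With this, U+4E2D comes out as one unit, while U+20000 comes out as the pair 0xD840 0xDC00. Code that assumes one wchar_t per character breaks on the second case when wchar_t is 16 bits wide.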
Let the “#define PORT_WCHAR_T unsigned short int” abstraction commence!
programmer.laik: Meditation?:) Could you remind me the context?
Alink: So how many characters do these Asian languages have?
“The number of Chinese characters contained in the Kangxi dictionary is approximately 47,035, although a large number of these are rarely used variants accumulated throughout history.”
Yey for Wikipedia.
2.1 billion? Overkill, much?
So what happened to UTF-8?
http://mm.soldat.pl/inspirado/surviving-2008
“Meditation solves this problem, refreshes my mind and gives a thousand other benefits. Sleep isn’t so good for refreshing my mind.”
Pleeease, my masta;P
wchar_t is not portable:
“The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.”
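One way to act on that quote, as a rough sketch rather than a recommendation from it: keep code points in a fixed-width type and only ask wchar_t what it can hold. The names port_codepoint and wchar_holds_any_codepoint are hypothetical, and the sketch assumes a C11 compiler for static_assert.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical project-wide type: one Unicode code point, always 32 bits,
   regardless of how wide the compiler makes wchar_t. */
typedef uint32_t port_codepoint;

static_assert(sizeof(port_codepoint) * CHAR_BIT == 32,
              "port_codepoint must be exactly 32 bits");

/* Returns 1 if this compiler's wchar_t can hold every Unicode code point
   (up to U+10FFFF) in a single unit, and 0 otherwise. */
int wchar_holds_any_codepoint(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

On a platform with a 16-bit wchar_t the check returns 0, and with a 32-bit wchar_t it returns 1, so the porting code can branch on it instead of assuming a width.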