libintl-perl

 Problem with untranslated 8bit msgids 
 Date  08/21/05 12:49:28 GMT
 From  Guido Flohr
 Subject  Problem with untranslated 8bit msgids
Hi,

I have the following program:

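	#include <errno.h>
	#include <libintl.h>	/* gettext() */
	#include <locale.h>
	#include <stdio.h>	/* perror() */

	int
	main (int argc, char* argv[])
	{
		/* Select the locale from the environment (LANG, LC_*). */
		setlocale (LC_ALL, "");
		errno = ENOENT;
		/* No translation exists, so gettext() returns the msgid. */
		perror (gettext ("Datei öffnen"));
		return 0;
	}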

The program source, notably the string argument to gettext(), is encoded in
UTF-8, and I assume here that gettext() does not find a translation for the
string "Datei öffnen" (German for "open file").

As long as I run the program in a utf-8 locale on a utf-8 terminal there are
no problems.

In an ISO-8859-1 locale, however, the output is garbled, since the msgid is
returned unmodified, not converted to the character set of my locale:

	$ LANG=fr_FR; export LANG
	$ locale charmap
	ISO-8859-1
	$ ./l10ntest
	Datei öffnen: Aucun fichier ou répertoire de ce type

The German o with diaeresis is still encoded in UTF-8, whereas the French e
with acute accent is correctly converted to ISO-8859-1.
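
The mismatch is easiest to see at the byte level.  The following is only a
small sketch; the helper name hex_bytes is invented for illustration.  It
dumps a string as hex so the two encodings can be compared: "ö" is the two
bytes C3 B6 in UTF-8 but the single byte F6 in ISO-8859-1, which is why an
ISO-8859-1 terminal shows the unconverted msgid as "Ã¶".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into out as
   space-separated uppercase hex, e.g. "C3 B6".
   Returns 0 on success, -1 if out is too small.  */
int
hex_bytes (const char *s, char *out, size_t outsize)
{
  size_t len = strlen (s);
  /* Two hex digits per byte, a separator between bytes, and a
     terminating NUL: exactly 3 * len characters for len > 0.  */
  size_t needed = len > 0 ? 3 * len : 1;
  size_t i;

  if (outsize < needed)
    return -1;
  out[0] = '\0';
  for (i = 0; i < len; i++)
    sprintf (out + 3 * i, "%02X%s",
             (unsigned int) (unsigned char) s[i],
             i + 1 < len ? " " : "");
  return 0;
}
```

Calling hex_bytes ("\xc3\xb6", buf, sizeof buf) fills buf with the UTF-8
bytes of "ö"; passing "\xf6" shows the ISO-8859-1 form.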

I produced this example with GNU libc 2.4.3, but I think standalone
gettext-runtime will show the same behavior: Untranslated strings are passed
through unmodified in the original character set from the source code,
whereas translated strings are converted to the character set of the selected
locale.

A possible fix depends on our ability to determine the msgid character set.
Evaluating po headers (possibly fed with character set information from
xgettext --from-code) is not an option: as the example shows, there may be
no mo file at all that can be consulted.  In other cases, multiple mo files
with possibly conflicting header information could be loaded.

On the other hand, the above example is perfectly legal usage, and using
non-English non-ASCII msgids is no longer deprecated.  I can only see two
possible solutions:

1) Only msgids encoded in UTF-8 are supported.

2) A new function bind_textdomain_input_codeset is introduced, allowing the
programmer to specify the character set of the msgids in the program.  If the
function is not called, no default is assumed, and therefore no output
conversion is done on msgids.
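
To illustrate what option 2 would amount to, here is a rough sketch of the
same effect done in application code.  It is not an existing API:
bind_textdomain_input_codeset is only proposed in this message, and the names
convert_msgid and gettext_from_codeset are invented here.  The sketch assumes
that gettext() returns the msgid pointer itself when it finds no translation
(as the GNU implementation does) and that iconv() knows both character sets.

```c
#include <iconv.h>
#include <langinfo.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert msgid from input_codeset to
   output_codeset into buf.  Returns buf on success and the
   unmodified msgid if the conversion is not possible.  */
const char *
convert_msgid (const char *msgid, const char *input_codeset,
               const char *output_codeset, char *buf, size_t bufsize)
{
  iconv_t cd;
  char *in = (char *) msgid;    /* iconv() does not modify the input */
  char *out = buf;
  size_t inleft = strlen (msgid);
  size_t outleft;

  if (bufsize == 0)
    return msgid;
  outleft = bufsize - 1;        /* keep room for the terminating NUL */

  cd = iconv_open (output_codeset, input_codeset);
  if (cd == (iconv_t) -1)
    return msgid;

  /* Convert the text, then flush any pending shift sequence.  */
  if (iconv (cd, &in, &inleft, &out, &outleft) == (size_t) -1
      || iconv (cd, NULL, NULL, &out, &outleft) == (size_t) -1)
    {
      iconv_close (cd);
      return msgid;
    }
  iconv_close (cd);
  *out = '\0';
  return buf;
}

/* Hypothetical wrapper: like gettext(), but an untranslated msgid is
   converted from input_codeset to the character set of the locale.  */
const char *
gettext_from_codeset (const char *msgid, const char *input_codeset,
                      char *buf, size_t bufsize)
{
  const char *result = gettext (msgid);

  /* A translation was found; gettext() has already converted it.  */
  if (result != msgid)
    return result;
  return convert_msgid (msgid, input_codeset, nl_langinfo (CODESET),
                        buf, bufsize);
}
```

In the example program above, the call would then read
perror (gettext_from_codeset ("Datei öffnen", "UTF-8", buf, sizeof buf))
with a caller-supplied buffer buf.  If the locale's character set cannot
represent the msgid, the sketch falls back to the unmodified string.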

Option 1 has backwards compatibility issues, so I prefer option 2.

Regards,
Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/

 08/21/05 12:49:28 GMT  Guido Flohr
 08/22/05 15:22:12 GMT  +--Bruno Haible
