libintl-perl

 Problem with non-ASCII message id's 
Date: 08/16/05 12:58:20 GMT
From: Guido Flohr
Subject: Problem with non-ASCII message id's
Hi Jörn,

The short answer to your problem: change everything from ISO-8859-1 to UTF-8
and it will work.  The long answer follows.  It is actually not specific
to libintl-perl but applies to gettext in general.

Jörn Reder wrote:
> Hi,
> I have a project here with German messages in the program code that I'd like
> to translate using Locale::TextDomain 1.14 (currently without XS
> optimization). Perl version is 5.8.5. The program file encoding is
> ISO-8859-1, so I set xgettext --from-code=ISO-8859-1 in the Makefile.

The C version, which uses GNU libintl, shows the same behavior.

> The generated .pot and .po files have the messages converted to UTF-8, no
> problem so far. The "Content-Type" header is set to UTF-8 accordingly,
> everything looks consistent.

The problem is that the character set information is only used for the
optional output conversion of the translations.  It does not influence the
lookup of the translations, i.e. the msgid strings are not first converted
to the character set of the mo file.  There is also no way to declare the
input character set in the API.  How could libintl (Perl or C) know which
character set is used?  It simply does a binary, byte-by-byte comparison.
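To make the "binary comparison" concrete, here is a minimal Python sketch. It is a simplified, hypothetical model and not libintl-perl's actual code. The catalog is a plain dict keyed by the UTF-8 bytes of each msgid, as in a .mo file built from the UTF-8 .po file below. The lookup returns the msgid unchanged when there is no exact byte match. The names `CATALOG` and `lookup` are made up for illustration.

```python
# Simplified, hypothetical model of a gettext message catalog: msgids are
# stored as raw bytes in the catalog's charset (UTF-8 here) and looked up
# by exact byte comparison. Real .mo files use sorted tables and a hash
# table, but the key comparison is still binary.
CATALOG: dict[bytes, bytes] = {
    "Datei öffnen".encode("utf-8"): b"Open file",
    b"Datei oeffnen": b"Open file",
}


def lookup(msgid: bytes) -> bytes:
    """Return the translation, or the msgid itself when there is no entry
    (gettext's usual fallback)."""
    return CATALOG.get(msgid, msgid)


# The same msgid as it reaches the lookup from a source file saved in
# ISO-8859-1, from one saved in UTF-8, and in its pure-ASCII spelling.
latin1_id = "Datei öffnen".encode("iso-8859-1")  # b'Datei \xf6ffnen'
utf8_id = "Datei öffnen".encode("utf-8")         # b'Datei \xc3\xb6ffnen'
ascii_id = b"Datei oeffnen"                      # same bytes in both charsets

print(lookup(latin1_id))  # no match: the untranslated bytes come back
print(lookup(utf8_id))    # b'Open file'
print(lookup(ascii_id))   # b'Open file'

# "Change everything to UTF-8": re-encoding the ISO-8859-1 bytes (what
# iconv would do to the whole source file) yields the catalog key.
fixed_id = latin1_id.decode("iso-8859-1").encode("utf-8")
print(lookup(fixed_id))   # b'Open file'
```

The ASCII-only msgid matches no matter which of the two charsets the source file uses. The ö variant only matches when source and catalog agree byte for byte.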

> This is my en.po file:
>
> msgid ""
> msgstr ""
> "Project-Id-Version: l10ntest 1.0\n"
> "Report-Msgid-Bugs-To: Joern Reder <joern AT zyn.de>\n"
> "POT-Creation-Date: 2005-08-16 10:04+0200\n"
> "PO-Revision-Date: 2005-08-16 10:02+CET\n"
> "Last-Translator: Joern Reder <>\n"
> "Language-Team: Joern Reder <>\n"
> "MIME-Version: 1.0\n"
> "Content-Type: text/plain; charset=UTF-8\n"
> "Content-Transfer-Encoding: 8bit\n"
>
> #: ../test.pl:8
> msgid "Datei öffnen"
> msgstr "Open file"
>
> #: ../test.pl:9
> msgid "Datei oeffnen"
> msgstr "Open file"
>
> But none of the message ids containing ISO-Latin 8-bit characters get
> translated. Obviously something with the message catalog lookup doesn't
> work here as expected. My test program looks like this (really simple and
> straightforward ;)

It looks up the string encoded in ISO-8859-1, not in UTF-8.

BTW: The above po file really is in UTF-8 but is (incorrectly) displayed as
ISO-8859-1.
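Why a UTF-8 file looks garbled when it is shown as ISO-8859-1: in UTF-8, "ö" takes two bytes (0xC3 0xB6), and an ISO-8859-1 viewer renders each byte as a separate character. Here is a minimal Python sketch of that misreading. It is purely illustrative and uses only the standard codecs.

```python
# The msgid as it is stored in the UTF-8 .po file.
raw = "Datei öffnen".encode("utf-8")

# A viewer that assumes ISO-8859-1 maps every byte to one character,
# so the two-byte sequence C3 B6 is shown as "Ã¶".
shown = raw.decode("iso-8859-1")
print(shown)              # Datei Ã¶ffnen

# Decoding with the charset from the Content-Type header restores the text.
print(raw.decode("utf-8"))  # Datei öffnen
```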

> use strict;
> use lib 'lib';
> use Locale::TextDomain ("test");
> print __"Datei öffnen","\n";
> print __"Datei oeffnen","\n";
>
> The first message gets translated, but the second one does not.

It's the other way round, I assume: the first one is not translated, the
second one is.

> What am I doing wrong?

You mix character sets, and you cannot/should not do that.  The C version
behaves the same, and therefore I cannot change that without breaking
compatibility.

Personally, I recommend against using any character set other than US-ASCII
in the sources, and therefore against using any language other than English
for the msgids.
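If you follow that advice, a trivial check can catch stray 8-bit characters before xgettext ever sees them. Below is a minimal Python sketch. The helper `non_ascii_lines` is hypothetical and is not part of libintl-perl or GNU gettext. The sample input reproduces only the quoted lines of the test program, saved as ISO-8859-1.

```python
def non_ascii_lines(source: bytes) -> list[int]:
    """Return the 1-based numbers of lines containing any byte >= 0x80."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if not line.isascii()]


# The quoted test program, as it would sit on disk in ISO-8859-1.
test_pl = (
    "use strict;\n"
    "use lib 'lib';\n"
    'use Locale::TextDomain ("test");\n'
    'print __"Datei öffnen","\\n";\n'
    'print __"Datei oeffnen","\\n";\n'
).encode("iso-8859-1")

print(non_ascii_lines(test_pl))  # [4]
```

The check works on raw bytes on purpose, so it flags the problem whichever 8-bit charset the file happens to be in.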

In Perl, the Unicode support starting with Perl 5.8 is a complete mess, and I
would _never_ use any non-ASCII characters in the sources.  The way the Perl
interpreter processes your source code is unpredictable for sane human beings,
because it will sometimes convert even source code from the assumed input
character set to the guessed output character set.  The outcome of this chain
of guesses is so uncertain that I would not take any chances, and would stick
with US-ASCII.

Thanks for your interest and for DVD::Rip. :-)

Guido
--
Imperia AG, Development
Leyboldstr. 10 - D-50354 Hürth - http://www.imperia.net/
 
08/16/05 09:09:12 GMT  Jörn Reder
08/16/05 12:58:20 GMT  +--Guido Flohr
08/16/05 14:01:13 GMT    |--Jörn Reder
08/16/05 15:22:33 GMT    |  +--Guido Flohr
08/17/05 11:00:11 GMT    |    |--Jörn Reder
08/18/05 07:30:54 GMT    |    +--Jörn Reder
08/18/05 08:20:43 GMT    |      +--Jörn Reder
08/18/05 09:05:35 GMT    |        +--Guido Flohr
08/17/05 08:29:41 GMT    +--Bruno Haible
