libintl-perl

 Problem with non-ASCII message id's 
 Date  08/16/05 14:01:13 GMT
 From  Jörn Reder
 Subject  Problem with non-ASCII message id's
Guido Flohr wrote:

> The problem is that the information about the character set is only used
> for the possible output conversion of the translations.  It does not
> influence the lookup of the translations, i.e. the msgid strings are
> not first converted to the character set of the mo file.  There is no
> way to set the input character set in the API.  How could libintl (Perl
> or C) know which character set is used?  It does a binary comparison.

Hmm, libintl, or rather Perl, should know which character set is used, at
least with Perl 5.8 or later. Or let's say: it doesn't know exactly which
character set, but it knows whether it's utf-8 or not (from the internal
utf8 flag), which is sufficient for our problem.
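
As a minimal sketch of what the internal flag reports, the following uses
only utf8::is_utf8() and the Encode module; the sample string is made up
for illustration. Note that the flag describes how Perl stores the string
internally, so two strings that compare equal can still differ in it.

```perl
use strict;
use warnings;
use Encode ();

# A character string written with a Latin-1 range escape: the flag stays off.
my $latin1 = "J\x{f6}rn";

# The same text decoded from UTF-8 octets: the flag is on.
my $decoded = Encode::decode("UTF-8", "J\xc3\xb6rn");

my $flag_latin1  = utf8::is_utf8($latin1)  ? 1 : 0;
my $flag_decoded = utf8::is_utf8($decoded) ? 1 : 0;

# Perl compares characters, not internal storage, so the strings are equal.
my $same_text = ($latin1 eq $decoded) ? 1 : 0;

print "latin1 flag=$flag_latin1 decoded flag=$flag_decoded equal=$same_text\n";
```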

Suppose we could tell libintl that the target charset is utf-8 (this
can probably just be assumed; at least I have never used it with another
target charset, and we all know utf-8 is really reasonable here). With
that assumption, the __() function just needs to call
Encode::encode("utf8",$_[0]) to get the same binary representation as
used in the message catalog, or am I missing something?

Sounds so simple I just tried it myself ;) I added the following hack to
my test program:

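  use Encode ();

  # Wrapper: encode the msgid to utf-8 octets before the catalog lookup.
  sub __ ($) {
    my $id = Encode::encode("utf8",$_[0]);
    Encode::_utf8_off($id);   # make sure the result is treated as raw bytes
    return Locale::TextDomain::__($id);
  }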

Switching off the internal utf8 flag was necessary to make the variable
"binary" again. Otherwise Perl's internal magic later recodes the
variable back to latin1, presumably somewhere in the file I/O layer.
Anyway, with this hack my tiny test program works.
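
To illustrate the byte-level effect of the encoding step, here is a small
sketch that needs only the Encode module. The msgid and the "catalog"
bytes are made-up sample data, not taken from a real mo file.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical msgid with one non-ASCII character (o-umlaut, U+00F6).
my $msgid = "J\x{f6}rn";

# Encode the character string to utf-8 octets, as the wrapper above does.
my $octets = Encode::encode("utf8", $msgid);

# The bytes a utf-8 encoded catalog would hold for this msgid.
my $catalog_bytes = "J\xc3\xb6rn";

my $char_len  = length $msgid;    # counted in characters
my $octet_len = length $octets;   # counted in bytes
my $matches   = ($octets eq $catalog_bytes) ? 1 : 0;
my $flag_off  = utf8::is_utf8($octets) ? 0 : 1;

print "chars=$char_len octets=$octet_len match=$matches flag_off=$flag_off\n";
```

The encoded string is one byte longer than the character string because
the o-umlaut takes two bytes in utf-8, and it compares equal to the raw
catalog bytes, which is what a binary lookup needs.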

Now the question is whether this would be worth adding to
libintl... For projects not dealing with 8-bit message ids it's a lot of
overhead for nothing. I think this could be solved by adding another
parameter to Locale::TextDomain->import() which controls exporting the
utf8 mangling variants of the functions on demand. This should cause no
noticeable overhead at all. What do you think? I would make a
corresponding patch if you would accept it ;)
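
A rough sketch of how such an opt-in export could look. Everything here is
hypothetical: the package My::Gettext, the utf8_msgids option and the
in-memory catalog are stand-ins for illustration and are not part of
libintl-perl or of the real Locale::TextDomain->import() interface.

```perl
use strict;
use warnings;

package My::Gettext;    # hypothetical stand-in, not part of libintl-perl

use Encode ();

# Stand-in for a utf-8 message catalog: keys are raw utf-8 octets.
my %catalog = ( "Gr\xc3\xbc\xc3\x9fe" => "Greetings" );

# Plain variant: binary lookup, returns the msgid itself on a miss.
sub _lookup {
    my ($msgid) = @_;
    return exists $catalog{$msgid} ? $catalog{$msgid} : $msgid;
}

# utf8 variant: encode the msgid to utf-8 octets before the lookup.
sub _lookup_utf8 {
    my ($msgid) = @_;
    return _lookup(Encode::encode("utf8", $msgid));
}

# import() installs __ into the calling package. The hypothetical
# utf8_msgids option selects the encoding variant, so callers that
# do not ask for it never pay for the extra encode() call.
sub import {
    my ($class, %opt) = @_;
    my $caller = caller;
    my $impl = $opt{utf8_msgids} ? \&_lookup_utf8 : \&_lookup;
    no strict 'refs';
    *{"${caller}::__"} = $impl;
}

package main;

my ($with, $without);
{
    package With::Flag;
    BEGIN { My::Gettext->import(utf8_msgids => 1) }
    $with = __("Gr\x{fc}\x{df}e");      # msgid given as a character string
}
{
    package Without::Flag;
    BEGIN { My::Gettext->import() }
    $without = __("Gr\x{fc}\x{df}e");   # same msgid, plain binary lookup
}

print "with option: ", ($with eq "Greetings" ? "translated" : "untranslated"), "\n";
print "without option: ", ($without eq "Greetings" ? "translated" : "untranslated"), "\n";
```

The choice between the two variants is made once, at import time, so the
per-call cost for packages that do not request the option stays the same.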

> BTW: The above po file is really in UTF-8 but (incorrectly) displayed in
> iso-8859-1.

Yep, my mail was indeed latin-1 encoded ;)

>> The first message gets translated, but the second not.
>
> Vice versa, I assume.  The first one is not translated, the next one is.

Oops, yes. ;)

> Thanks for your interest and for DVD::Rip. :-)

You're welcome ;) Thanks for the quick answer. And dvd::rip uses
Locale::TextDomain without any trouble, since the original language was
English there ;)

Regards,

Joern

--
LINUX - Linux Is Not gnU linuX

 08/16/05 09:09:12 GMT  Jörn Reder
 08/16/05 12:58:20 GMT  +--Guido Flohr
 08/16/05 14:01:13 GMT    |--Jörn Reder
 08/16/05 15:22:33 GMT    |  +--Guido Flohr
 08/17/05 11:00:11 GMT    |    |--Jörn Reder
 08/18/05 07:30:54 GMT    |    +--Jörn Reder
 08/18/05 08:20:43 GMT    |      +--Jörn Reder
 08/18/05 09:05:35 GMT    |        +--Guido Flohr
 08/17/05 08:29:41 GMT    +--Bruno Haible
