libintl-perl


 Problem with non-ASCII message id's 
Date: 08/17/05 11:00:11 GMT
From: Jörn Reder
Subject: Problem with non-ASCII message id's
Guido Flohr wrote:

> The internal utf-8 flag is based on a guess that Perl takes and the Perl
> unicode list is full of examples where this guess is wrong.

Hmm, I don't read the unicode list, but I have dealt with a lot of utf-8
Perl projects and don't think the utf-8 flag is just guessing. Its
setting is strictly defined. If you "use utf8" in your sources, all
string literals containing non-ASCII characters get the flag set. If you
read from a filehandle with the :utf8 layer assigned, the flag is set.
And so on. I don't know of any situation where Perl just guesses about
the utf8-ness of a variable.
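A minimal sketch of the rules described above, assuming Perl 5.8.1 or later (for utf8::is_utf8()) and assuming the script file itself is saved as UTF-8. It inspects the flag on a non-ASCII literal under "use utf8", on a line read through a :utf8 layer (an in-memory filehandle stands in for a real file), and on a string written with \x escapes, which is never decoded.

```perl
use strict;
use warnings;
use utf8;   # tells Perl this source file is UTF-8 encoded

# 1. Under "use utf8", a literal containing non-ASCII characters is flagged.
my $literal = "Jörn";

# 2. Reading through a :utf8 layer flags what is read. An in-memory
#    filehandle over the UTF-8 octets of "Jörn" stands in for a real file.
my $octets = "J\xC3\xB6rn";   # \x escapes are not decoded: 5 characters
open my $fh, '<:utf8', \$octets or die "in-memory open failed: $!";
my $line = <$fh>;
close $fh;

# 3. Report flag and length for each case; the escaped octet string
#    never had the flag set.
for my $case ([literal => $literal], [read => $line], [octets => $octets]) {
    my ($name, $str) = @$case;
    printf "%-8s flag=%d length=%d\n",
        $name, (utf8::is_utf8($str) ? 1 : 0), length $str;
}
```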

>> If we could make libintl know that the target charset is utf-8
>> (probably this can just be assumed; at least I never used it with
>> another target charset, and we all know utf-8 is really reasonable
>> here), then the __() function just needs to call
>> Encode::encode("utf8",$_[0]) to get the same binary representation as
>> used in the message catalog, or am I missing something?
>
> The conversion can fail.  And the assumption whether the input is in
> utf-8 will often be wrong.  The worst about it is that these failures
> will often be caused by user locale settings.

The Encode::encode("utf8",$_[0]) call does not assume anything about the
input (which is "a string of Perl's internal form", quoting the docs) but
converts anything to utf-8. If the input has the utf-8 flag set, its
internal bytes already are utf-8 and are returned unchanged. Otherwise
it will be converted to utf-8, and it's very unlikely that this
conversion fails, since utf-8 can represent practically any character,
in particular everything in iso-8859-1, which is what I'm dealing with.
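A small check of the claim above, using only the core Encode module: the character "ä", once as a plain ISO-8859-1 string and once upgraded to Perl's internal UTF-8 form with utf8::upgrade(), encodes to the identical two octets. Note that, at least in current Perl versions, the string returned by Encode::encode() is an octet string with the utf-8 flag off.

```perl
use strict;
use warnings;
use Encode ();

# "ä" as an ISO-8859-1 source gives it: one character, no utf-8 flag.
my $latin1 = "\xE4";

# The same character, but stored internally as UTF-8 (flag on).
my $flagged = "\xE4";
utf8::upgrade($flagged);

# Both calls yield the UTF-8 octets of U+00E4, independent of the flag.
my $from_latin1  = Encode::encode('utf8', $latin1);
my $from_flagged = Encode::encode('utf8', $flagged);

printf "%s / %s\n", unpack('H*', $from_latin1), unpack('H*', $from_flagged);
# prints "c3a4 / c3a4"
```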

> You see? You had to switch on and off obscure flags.  And you pay that
> with a compatibility nightmare (do you know the locale settings of your
> users?) plus a performance penalty.

Hmm, you don't like the utf-8 flag, right? ;) The advantage of encode()
is that it is aware of the input character set so we don't need to care
about this. The disadvantage for our situation is that it sets the utf-8
flag and we need to know some internals to do the right thing with it.
OK, struggling with the utf-8 flag is seldom elegant, but what's
happening here is quite clear.

> BTW, you could write the above in a more compatible fashion using
> Locale::Recode and  Locale::Messages::turn_utf_8_off().

Good to know; otherwise I would have to insist on Perl >= 5.8.

> Honestly, I would not encourage anybody to use non-ascii message ids
> with Perl.

I know, but unfortunately that's not an option in the project I'm
dealing with here...

> If you still want to do it, the easiest fix IMHO is to not mix character
> sets.  Why introduce a solution for a runtime problem that can be easily
> solved when writing or distributing the software?   Change both po files
> and sources to iso-8859-1 and everything will work.

What about the Chinese translator who can't deal with iso-8859-1? The
utf-8 format for the .po files is the right thing here in my opinion.

> Change it to utf-8 and it will work as well.

The .po files are in utf-8 already (converted by xgettext), and this is
good.

> Your solution will waste cpu cycles for users every time they run the
> software.  IMHO it's always favorable to waste the cpu cycles of the
> development machine only once.
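Paying the conversion cost once, on the development or build machine, could look like the following sketch. The function name latin1_to_utf8_file is hypothetical, and it assumes the sources are ISO-8859-1; it uses only core PerlIO encoding layers. Presumably, as long as the converted sources do not "use utf8", their literals are then the same UTF-8 octets as the msgids in the catalog, so no runtime conversion is needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical build step: re-encode one ISO-8859-1 source file as UTF-8.
sub latin1_to_utf8_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<:encoding(iso-8859-1)', $in_path  or die "open $in_path: $!";
    open my $out, '>:encoding(UTF-8)',      $out_path or die "open $out_path: $!";
    print {$out} $_ while <$in>;   # decode Latin-1 on read, encode UTF-8 on write
    close $out or die "close $out_path: $!";
    close $in;
    return;
}

# Demo: a Latin-1 source line containing "Grüße" becomes UTF-8 octets.
my ($tmp_in, $in_path) = tempfile(UNLINK => 1);
binmode $tmp_in;
print {$tmp_in} "print __(\"Gr\xFC\xDFe\");\n";
close $tmp_in;

my (undef, $out_path) = tempfile(UNLINK => 1);
latin1_to_utf8_file($in_path, $out_path);

# Read the result back as raw bytes to inspect the encoding.
open my $chk, '<:raw', $out_path or die "open $out_path: $!";
my $converted = do { local $/; <$chk> };
close $chk;
print unpack('H*', $converted), "\n";
```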

Yep, that's a point, indeed a big disadvantage. I'll do some benchmarks
to get a better feeling for how big the performance loss really is. If
it's too much, I'll indeed need to think about converting the sources to
utf-8, at least for the build process, as you mentioned, although I fear
that introduces some problems of its own... ;) Or I could urge the
project team to write English messages ;)
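For a first impression of that cost, the core Benchmark module is enough. The sketch below is hypothetical: a plain hash stands in for the message catalog, so it only isolates the extra Encode::encode() call per lookup. A real gettext lookup costs more than a hash lookup, so this likely overstates the relative overhead.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Encode ();

# Hypothetical stand-in for a message catalog keyed by UTF-8 octets.
my %catalog = ("Gr\xC3\xBC\xC3\x9Fe" => "Greetings");

my $msgid_latin1 = "Gr\xFC\xDFe";           # literal from an ISO-8859-1 source
my $msgid_utf8   = "Gr\xC3\xBC\xC3\x9Fe";   # literal from a UTF-8 source

# Lookup without conversion: the msgid already is UTF-8 octets.
sub lookup_plain {
    my ($key) = @_;
    return exists $catalog{$key} ? $catalog{$key} : $key;
}

# Lookup with conversion at every call, as the runtime wrapper would do.
sub lookup_encoded {
    my ($msgid) = @_;
    return lookup_plain(Encode::encode('UTF-8', $msgid));
}

# Both paths must find the same translation before timing them.
die "mismatch" unless lookup_encoded($msgid_latin1) eq lookup_plain($msgid_utf8);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    plain   => sub { lookup_plain($msgid_utf8) },
    encoded => sub { lookup_encoded($msgid_latin1) },
});
```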

> If somebody really, really accepts all that, then writing the tiny
> wrapper around the libintl functions should not be infeasible.

I think the best thing would be a derived class overriding the import()
method to export the utf-8 converting functions into the caller's
namespace. This wouldn't affect normal operation of Locale::TextDomain
at all. The manpage could be a copy of this thread ;)
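A rough sketch of the import() mechanism follows. The package name My::TextDomainUTF8 and the %catalog hash are hypothetical stand-ins: this does not actually subclass Locale::TextDomain, and a real version would delegate the lookup to it instead of a hash. The point is only that import() installs a __() function in the caller's namespace which encodes the msgid to UTF-8 octets before the lookup, whether or not the utf-8 flag is set.

```perl
use strict;
use warnings;

package My::TextDomainUTF8;   # hypothetical name, not part of libintl-perl
use Encode ();

# Hypothetical stand-in for the compiled catalog: msgids are UTF-8 octets,
# as they are in a .mo file generated from utf-8 .po files.
my %catalog = ("Gr\xC3\xBC\xC3\x9Fe" => "Greetings");

sub _lookup {
    my ($msgid_octets) = @_;
    return exists $catalog{$msgid_octets} ? $catalog{$msgid_octets} : $msgid_octets;
}

# Encode the msgid to UTF-8 octets first, so the key is the same whether or
# not the caller's string carries Perl's internal utf-8 flag.
sub __ {
    my ($msgid) = @_;
    return _lookup(Encode::encode('UTF-8', $msgid));
}

# Export __ into whichever package imports this one.
sub import {
    my $caller = caller;
    no strict 'refs';
    *{"${caller}::__"} = \&__;
    return;
}

package main;

# Stands in for "use My::TextDomainUTF8;", since the package lives in this file.
BEGIN { My::TextDomainUTF8->import }

my $msg = __("Gr\xFC\xDFe");   # a Latin-1 msgid, as from an iso-8859-1 source
print "$msg\n";
```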

> Oops, it's dvd::rip, not DVD::Rip. ;-)

Yep, I silently noticed your misspelling, and a satellite with a 17-ton
concrete block was already on its way to you... ;)

Regards,

Joern


Thread overview:
 08/16/05 09:09:12 GMT  Jörn Reder
 08/16/05 12:58:20 GMT  +--Guido Flohr
 08/16/05 14:01:13 GMT    |--Jörn Reder
 08/16/05 15:22:33 GMT    |  +--Guido Flohr
 08/17/05 11:00:11 GMT    |    |--Jörn Reder
 08/18/05 07:30:54 GMT    |    +--Jörn Reder
 08/18/05 08:20:43 GMT    |      +--Jörn Reder
 08/18/05 09:05:35 GMT    |        +--Guido Flohr
 08/17/05 08:29:41 GMT    +--Bruno Haible
