[PHP-dev 47] Fw: htmlentities and charset awareness: --> str_convert_encoding

Yasuo Ohgaki php-dev@php.gr.jp
Sat, 25 Aug 2001 04:32:49 +0900



Forwarded from php-dev@lists.php.net.

----- Original Message -----
From: "Wez Furlong" <wez@thebrainroom.com>
Newsgroups: php.dev
To: <php-dev@lists.php.net>
Sent: Friday, August 24, 2001 7:59 PM
Subject: htmlentities and charset awareness: --> str_convert_encoding


> Hi,
>
> I'm putting together a new PHP function called str_convert_encoding
> which will convert a string from one encoding to another using the
> features of the system.  It works like this:
>
> string str_convert_encoding(string srcstring, string fromenc, string toenc)
>
> // No change; return source
> if (fromenc == toenc)
>     return srcstring;
>
> // Otherwise, in this order:
> //   Promote us-ascii and iso-8859-1 to cp1252
> //   Try to convert using mbstring if present
> //   Try to convert using recode if present
> //   Try to convert using iconv if present
> //   Return the original string if it cannot be converted
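
The fallback chain above could be sketched in user land roughly as follows. This is only an illustration under stated assumptions, not the proposed C implementation: the name str_convert_encoding_sketch() is hypothetical, promotion is applied to the source charset only (the proposal does not say whether the target is promoted too), and each backend is used only if its function exists.

```php
<?php
// Hypothetical user-land sketch of the proposed fallback chain.
function str_convert_encoding_sketch(string $src, string $from, string $to): string
{
    // No change; return source.
    if (strcasecmp($from, $to) === 0) {
        return $src;
    }

    // Assumption: only the source label is promoted to cp1252.
    if (in_array(strtolower($from), ['us-ascii', 'ascii', 'iso-8859-1'], true)) {
        $from = 'Windows-1252';
    }

    // 1. mbstring, if present.
    if (function_exists('mb_convert_encoding')) {
        try {
            $out = @mb_convert_encoding($src, $to, $from);
            if (is_string($out)) {
                return $out;
            }
        } catch (\ValueError $e) {
            // Charset unknown to mbstring; fall through to the next backend.
        }
    }

    // 2. recode, if present (request syntax is "from..to").
    if (function_exists('recode_string')) {
        $out = @recode_string($from . '..' . $to, $src);
        if (is_string($out)) {
            return $out;
        }
    }

    // 3. iconv, if present.
    if (function_exists('iconv')) {
        $out = @iconv($from, $to, $src);
        if (is_string($out)) {
            return $out;
        }
    }

    // Could not convert; hand back the original string.
    return $src;
}
```

A caller cannot tell from the return value alone whether a conversion happened, which is why a separate success code is useful at the C level.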
>
> Promoting the charset is safe because iso-8859-1 is us-ascii with
> extensions and cp1252 is iso-8859-1 with extensions (it assigns
> printable characters to the 0x80-0x9F range, which iso-8859-1 reserves
> for control codes).  What I have found is that quite often ascii is
> given as the charset when the data is really iso-8859-1 or cp1252.
> The spirit of this function is to pass back something useful and not
> be too strict/pedantic about it.
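
The superset relationship behind the promotion can be checked byte by byte. The sketch below assumes the iconv extension is available; decode_as() is a hypothetical helper, not part of the proposal. It shows that ASCII and Latin-1 letters decode identically under cp1252, while the 0x80-0x9F range is where cp1252 differs by supplying printable characters.

```php
<?php
// Hypothetical helper: decode raw bytes from $charset into UTF-8.
// Assumes the iconv extension is loaded.
function decode_as(string $bytes, string $charset): string
{
    return iconv($charset, 'UTF-8', $bytes);
}

// Pure ASCII reads the same under either label.
var_dump(decode_as("plain", 'US-ASCII') === decode_as("plain", 'Windows-1252'));

// 0xE9 is e-acute in both iso-8859-1 and cp1252.
var_dump(decode_as("\xE9", 'ISO-8859-1') === decode_as("\xE9", 'Windows-1252'));

// 0x93 is a C1 control code in iso-8859-1 but a left double
// quotation mark (U+201C) in cp1252.
var_dump(decode_as("\x93", 'ISO-8859-1') === decode_as("\x93", 'Windows-1252'));
```

The first two comparisons hold and the third does not, so promotion only changes the result for data that actually uses bytes 0x80-0x9F.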
>
> For the implementation, I've added php_str_convert_encoding that does
> the actual work and also returns a success code so that C code knows
> if the encoding was changed.
>
> I plan to use this function with the htmlentities mods that I have
> done recently so that when htmlentities comes across an unknown
> charset it can convert the string to utf-8, preserve any wide chars
> that might be present, encode the entities and then convert back to
> the original encoding again.
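
The round trip described above might look like this in user land. It is a sketch only: htmlentities_via_utf8() is a hypothetical name, iconv stands in for the proposed conversion function, and returning null on failure is an assumption made here so that unescaped text is never handed back.

```php
<?php
// Hypothetical helper; assumes the iconv extension is loaded.
// Returns null if the text cannot be converted either way.
function htmlentities_via_utf8(string $text, string $charset): ?string
{
    // Convert the input to UTF-8 so multi-byte characters survive.
    $utf8 = @iconv($charset, 'UTF-8', $text);
    if ($utf8 === false) {
        return null;
    }

    // Encode the entities while the text is in UTF-8.
    $encoded = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

    // Convert back to the original encoding.
    $back = @iconv('UTF-8', $charset, $encoded);
    return $back === false ? null : $back;
}
```

For example, htmlentities_via_utf8("<b>\xC4", 'KOI8-R') escapes the markup but leaves the Cyrillic byte 0xC4 as it was, since that letter has no named HTML 4.01 entity.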
>
> Does anyone have any comments or suggestions about this?
>
> I'm fairly certain that this function would be well received and
> would save loads of people from having to do the equivalent in user
> land.
>
> --Wez.
>