[PHP-dev 1383] Fw: [PHP-DEV] mbstring: missing support for hex numeric entities &xHHHH;
Seiji Masugata
s.masugata @ digicom.dnp.co.jp
Thu Aug 9 17:47:48 JST 2007
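Hello, this is Masugata.
One more thing while I'm at it.
This is from quite a while ago, but the following mail was sent to the upstream (php.net) mailing list.
I kept thinking I should post about it, but ended up leaving it untouched. Sorry.
Incidentally, there were no replies to this mail on that list.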
Forwarded by Seiji Masugata
--------------------- Original Message Start -----------------------
From: Umberto Salsi <salsi @ icosaedro.it>
To: <internals @ lists.php.net>
Date: Wed, 23 May 2007 17:40:57 CEST
Subject: [PHP-DEV] mbstring: missing support for hex numeric entities &xHHHH;
----
mbstring does not support hexadecimal numeric entities in HTML code. For example:
echo urlencode( mb_convert_encoding("&#x0415;", "UTF-8", "HTML-ENTITIES") );
displays %F2%AF%B8%9F rather than the expected %D0%95.
This bug was detected by Nick Wedd <nick @ maproom.co.uk> and reported in the
newsgroup comp.lang.php, Message-ID: <EU9zOoNGJAVGFAaa @ maproom.demon.co.uk>.
I found the bug in the file ext/mbstring/libmbfl/filters/mbfilter_htmlent.c
and added these features:
- decode hex entities &#xHHHH;
- detect invalid digits
- detect missing digits
- detect values out of the range 0-0xffff
Invalid values are returned verbatim.
Apparently the right place for this patch would be
http://cvs.sourceforge.jp/cgi-bin/viewcvs.cgi/php-i18n/
but the project is no longer hosted there.
The patch for ext/mbstring/libmbfl/filters/mbfilter_htmlent.c follows:
173a174,217
> static int mbfl_decode_numeric_entity(char *s, int s_len)
> /*
> 	s = numeric entity "ddd" or "xhhhh"
> 	return: numeric value or -1 if not inside [0,0xffff] or invalid digits
> */
> {
> 	int ent, pos, c, d;
>
> 	ent = 0;
>
> 	if (*s == 'x' || *s == 'X') {
> 		/* hexadecimal base */
> 		if ( s_len < 2 )
> 			return -1; /* no digits found */
> 		for (pos=1; pos<s_len; pos++) {
> 			c = s[pos];
> 			if (isdigit(c))
> 				d = c - '0';
> 			else if (isxdigit(c))
> 				d = tolower(c) - 'a' + 10;
> 			else
> 				return -1; /* invalid hex digit */
> 			ent = (ent << 4) + d;
> 			if (ent > 0xffff)
> 				return -1; /* too big */
> 		}
>
> 	} else {
> 		/* decimal base */
> 		if ( s_len < 1 )
> 			return -1; /* no digits found */
> 		for (pos=0; pos<s_len; pos++) {
> 			c = s[pos];
> 			if (! isdigit(c) )
> 				return -1; /* invalid dec char */
> 			ent = ent*10 + (c - '0');
> 			if (ent > 0xffff)
> 				return -1; /* too big */
> 		}
> 	}
>
> 	return ent;
> }
>
192,193c236,246
< for (pos=2; pos<filter->status; pos++) {
< 	ent = ent*10 + (buffer[pos] - '0');
---
> ent = mbfl_decode_numeric_entity(&buffer[2], filter->status - 2);
> if( ent >= 0 ){
> 	CK((*filter->output_function)(ent, filter->data));
> 	filter->status = 0;
> 	/*php_error_docref("ref.mbstring" TSRMLS_CC, E_NOTICE, "mbstring decoded '%s'=%d", buffer, ent);*/
> } else {
> 	/* failure */
> 	buffer[filter->status++] = ';';
> 	buffer[filter->status] = 0;
> 	/* php_error_docref("ref.mbstring" TSRMLS_CC, E_WARNING, "mbstring cannot decode '%s'", buffer); */
> 	mbfl_filt_conv_html_dec_flush(filter);
195,197d247
< CK((*filter->output_function)(ent, filter->data));
< filter->status = 0;
< /*php_error_docref("ref.mbstring" TSRMLS_CC, E_NOTICE, "mbstring decoded '%s'=%d", buffer, ent);*/
Best regards,
___
/_|_\ Umberto Salsi
\/_\/ www.icosaedro.it
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
--------------------- Original Message End -------------------------
--
Seiji Masugata <s.masugata @ digicom.dnp.co.jp>