html_entity_decode

(PHP 4 >= 4.3.0, PHP 5)

html_entity_decode --  Convert all HTML entities to their applicable characters

Description

string html_entity_decode ( string string [, int quote_style [, string charset]] )

html_entity_decode() is the opposite of htmlentities(): it converts all HTML entities in string to their applicable characters.

The optional second quote_style parameter lets you define what will be done with 'single' and "double" quotes. It takes on one of three constants with the default being ENT_COMPAT:

Table 1. Available quote_style constants

Constant Name   Description
ENT_COMPAT      Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES      Will convert both double and single quotes.
ENT_NOQUOTES    Will leave both double and single quotes unconverted.
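
As a rough illustration of the three constants, the following sketch decodes the same string with each of them. The sample string and the variable names are made up for this example, and it assumes a PHP version that accepts all three arguments and decodes numeric entities.

```php
<?php
// Illustrative input: entity-encoded double quotes and single quotes.
$encoded = '&quot;double&quot; and &#039;single&#039;';

// ENT_COMPAT: double quotes are decoded, single quotes are left alone.
$compat = html_entity_decode($encoded, ENT_COMPAT, 'ISO-8859-1');

// ENT_QUOTES: both kinds of quotes are decoded.
$quotes = html_entity_decode($encoded, ENT_QUOTES, 'ISO-8859-1');

// ENT_NOQUOTES: neither kind of quote is decoded.
$noquotes = html_entity_decode($encoded, ENT_NOQUOTES, 'ISO-8859-1');

echo $compat, "\n", $quotes, "\n", $noquotes, "\n";
```

Passing the constant explicitly, rather than relying on the default, keeps the behaviour visible at the call site.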

The ISO-8859-1 character set is used as the default for the optional third charset parameter, which defines the character set used in the conversion.

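The following character sets are supported in PHP 4.3.0 and later.

Table 2. Supported charsets

Charset      Aliases                        Description
ISO-8859-1   ISO8859-1                      Western European, Latin-1.
ISO-8859-15  ISO8859-15                     Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8                                       ASCII-compatible multi-byte 8-bit Unicode.
cp866        ibm866, 866                    DOS-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1251       Windows-1251, win-1251, 1251   Windows-specific Cyrillic charset. Supported as of PHP 4.3.2.
cp1252       Windows-1252, 1252             Windows-specific charset for Western European.
KOI8-R       koi8-ru, koi8r                 Russian. Supported as of PHP 4.3.2.
BIG5         950                            Traditional Chinese, mainly used in Taiwan.
GB2312       936                            Simplified Chinese, national standard character set.
BIG5-HKSCS                                  Traditional Chinese, Big5 with Hong Kong extensions.
Shift_JIS    SJIS, 932                      Japanese.
EUC-JP       EUCJP                          Japanese.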

Note: Any other character set is not recognized, and ISO-8859-1 is used in its place.

Note: This function doesn't support multi-byte character sets in PHP < 5.
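
On PHP 5 and later, where multi-byte character sets are supported, the charset argument can name UTF-8 directly. The sketch below is only an illustration; the sample entities and variable names are chosen for this example.

```php
<?php
// &euro; cannot be represented in ISO-8859-1, but it can in UTF-8.
$utf8 = html_entity_decode('&euro;', ENT_QUOTES, 'UTF-8');

// &eacute; fits into a single Latin-1 byte.
$latin1 = html_entity_decode('&eacute;', ENT_QUOTES, 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // bytes of the UTF-8 encoded Euro sign
echo bin2hex($latin1), "\n"; // the single Latin-1 byte for e-acute
```

Printing the bytes with bin2hex() avoids depending on the terminal's own encoding when checking the result.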

Example 1. Decoding HTML entities

<?php
$orig = "I'll \"walk\" the <b>dog</b> now";

$a = htmlentities($orig);

$b = html_entity_decode($a);

echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

echo $b; // I'll "walk" the <b>dog</b> now


// For users prior to PHP 4.3.0 you may do this:
function unhtmlentities($string)
{
    // replace numeric entities
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'chr(\\1)', $string);
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

$c = unhtmlentities($a);

echo $c; // I'll "walk" the <b>dog</b> now

?>

Note: You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string. That's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but character code 160 (0xa0) in the default ISO 8859-1 character set.
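
A minimal sketch of the trim() behaviour, assuming the ISO-8859-1 charset so that the non-breaking space is the single byte 0xA0. The variable names are illustrative only.

```php
<?php
// Decode with an explicit single-byte charset so &nbsp; becomes the byte 0xA0.
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');

// Plain trim() leaves the 0xA0 byte in place.
$plain = trim($decoded);

// Adding "\xA0" to trim()'s character list strips it as well.
$stripped = trim($decoded, " \t\n\r\0\x0B\xA0");

var_dump(strlen($plain));    // int(1)
var_dump(strlen($stripped)); // int(0)
```

This byte-wise approach is only safe for single-byte charsets. In UTF-8 the non-breaking space is two bytes (0xC2 0xA0), and trimming individual bytes could damage neighbouring characters.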

See also htmlentities(), htmlspecialchars(), get_html_translation_table(), and urldecode().


User Contributed Notes
jojo
04-Nov-2006 12:27
This function decodes strings that were encoded by JavaScript's escape() function.
It is useful when the page uses multi-byte characters.

JavaScript: escape('aaああaa') ..... 'aa%u3042%u3042aa'
PHP: jsEscape_decode('aa%u3042%u3042aa') ..... 'aaああaa'

<?php
function jsEscape_decode($jsEscaped, $outCharCode = 'SJIS') {
   $arrMojis = explode("%u", $jsEscaped);
   for ($i = 1; $i < count($arrMojis); $i++) {
      $c = substr($arrMojis[$i], 0, 4);
      $cc = mb_convert_encoding(pack('H*', $c), $outCharCode, 'UTF-16');
      $arrMojis[$i] = substr_replace($arrMojis[$i], $cc, 0, 4);
   }
   return implode('', $arrMojis);
}
?>
romekt at CUTTHISgmail dot com
02-Sep-2006 05:15
Here's a simple workaround for the UTF-8 support problem:

$var=iconv("UTF-8","ISO-8859-1",$var);
$var=html_entity_decode($var, ENT_QUOTES, 'ISO-8859-1');
$var=iconv("ISO-8859-1","UTF-8",$var);
derernst at gmx dot ch
01-Aug-2006 06:09
Combining the suggestions by buraks78 at gmail dot com, gaui at gaui dot is, daniel at brightbyte dot de, and the version in PEAR_PHP_Compat, I arrived at the following, which should work in a UTF-8 environment with PHP versions before or after 4.3:

<?php
function decode_entities($text, $quote_style = ENT_COMPAT) {
   if (function_exists('html_entity_decode')) {
      $text = html_entity_decode($text, $quote_style, 'ISO-8859-1'); // NOTE: UTF-8 does not work!
   }
   else {
      $trans_tbl = get_html_translation_table(HTML_ENTITIES, $quote_style);
      $trans_tbl = array_flip($trans_tbl);
      $text = strtr($text, $trans_tbl);
   }
   $text = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $text);
   $text = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $text);
   return $text;
}
?>

Note that I omitted the line
$trans_table['&#39;'] = "'";
as it would override the quote_style setting and thus lead to unexpected results for quote_styles ENT_NOQUOTES and ENT_COMPAT.
grvg (at) free (dot) fr
30-Jul-2006 12:44
Here are the ultimate functions to convert HTML entities to UTF-8.
The main function is htmlentities2utf8.
The others are helper functions.

function chr_utf8($code)
   {
       if ($code < 0) return false;
       elseif ($code < 128) return chr($code);
       elseif ($code < 160) // Remap the Windows-1252 characters that are illegal here
       {
           if ($code==128) $code=8364;
           elseif ($code==129) $code=160; // not affected
           elseif ($code==130) $code=8218;
           elseif ($code==131) $code=402;
           elseif ($code==132) $code=8222;
           elseif ($code==133) $code=8230;
           elseif ($code==134) $code=8224;
           elseif ($code==135) $code=8225;
           elseif ($code==136) $code=710;
           elseif ($code==137) $code=8240;
           elseif ($code==138) $code=352;
           elseif ($code==139) $code=8249;
           elseif ($code==140) $code=338;
           elseif ($code==141) $code=160; // not affected
           elseif ($code==142) $code=381;
           elseif ($code==143) $code=160; // not affected
           elseif ($code==144) $code=160; // not affected
           elseif ($code==145) $code=8216;
           elseif ($code==146) $code=8217;
           elseif ($code==147) $code=8220;
           elseif ($code==148) $code=8221;
           elseif ($code==149) $code=8226;
           elseif ($code==150) $code=8211;
           elseif ($code==151) $code=8212;
           elseif ($code==152) $code=732;
           elseif ($code==153) $code=8482;
           elseif ($code==154) $code=353;
           elseif ($code==155) $code=8250;
           elseif ($code==156) $code=339;
           elseif ($code==157) $code=160; // not affected
           elseif ($code==158) $code=382;
           elseif ($code==159) $code=376;
       }
       if ($code < 2048) return chr(192 | ($code >> 6)) . chr(128 | ($code & 63));
       elseif ($code < 65536) return chr(224 | ($code >> 12)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
       else return chr(240 | ($code >> 18)) . chr(128 | (($code >> 12) & 63)) . chr(128 | (($code >> 6) & 63)) . chr(128 | ($code & 63));
   }

   // Callback for preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $str);
   function html_entity_replace($matches)
   {
       if ($matches[2])
       {
           return chr_utf8(hexdec($matches[3]));
       } elseif ($matches[1])
       {
           return chr_utf8($matches[3]);
       }
       switch ($matches[3])
       {
           case "nbsp": return chr_utf8(160);
           case "iexcl": return chr_utf8(161);
           case "cent": return chr_utf8(162);
           case "pound": return chr_utf8(163);
           case "curren": return chr_utf8(164);
           case "yen": return chr_utf8(165);
           //... etc with all named HTML entities
       }
       return false;
   }
  
   function htmlentities2utf8 ($string) // because of the html_entity_decode() bug with UTF-8
   {
       $string = preg_replace_callback('~&(#(x?))?([^;]+);~', 'html_entity_replace', $string);
       return $string;
   }
nycolhas at hotmail dot com
06-Apr-2006 02:24
This function might be useful for people who want to convert a string containing HTML entities to upper case.

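<?php
function htmlstrtoupper($string) {
   // Decode the entities, upper-case the text, then encode it again.
   // The argument is passed by value; call-time pass-by-reference is not needed here.
   return htmlentities(strtoupper(html_entity_decode($string)));
}
?>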
buraks78 at gmail dot com
08-Feb-2006 07:19
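The "unhtmlentities" function defined above fails to decode single quotes properly: &#039; is evaluated as chr(039), and 039 is read as an invalid octal literal. The issue can be solved by putting double quotes around the backreference, that is, by replacing chr(\\1) with chr("\\1"):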

function unhtmlentities($string)
{
   // replace numeric entities
   $string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
   $string = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $string);
   // replace literal entities
   $trans_tbl = get_html_translation_table(HTML_ENTITIES);
   $trans_tbl = array_flip($trans_tbl);
   return strtr($string, $trans_tbl);
}
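
The /e modifier used above was removed in PHP 7.0, so on current versions the same idea has to be expressed with preg_replace_callback(). The following is a minimal sketch under a few assumptions: the function name unhtmlentities_cb is hypothetical, closures require PHP 5.3 or later, and only code points below 256 are decoded because the output is meant to be ISO-8859-1.

```php
<?php
// Hypothetical name; not part of PHP or of the notes on this page.
function unhtmlentities_cb($string)
{
    // Decode hexadecimal numeric entities that fit in one Latin-1 byte.
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        $code = hexdec($m[1]);
        return $code < 256 ? chr($code) : $m[0];
    }, $string);

    // Decode decimal numeric entities; the (int) cast avoids the octal-literal trap.
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        $code = (int) $m[1];
        return $code < 256 ? chr($code) : $m[0];
    }, $string);

    // Decode named entities using the flipped Latin-1 translation table.
    $table = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1'));
    return strtr($string, $table);
}

echo unhtmlentities_cb('&#039;&#x41;&lt;b&gt;'), "\n";
```

Entities that cannot be represented in a single byte, such as &#8364;, are left untouched rather than being truncated by chr().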
hurricane at cyberworldz dot org
23-Dec-2005 12:33
I shortened the function replace_num_entity a bit to make it more understandable and clean. Maybe now someone will see the problem it possibly has... (as mentioned below)

<?php
function replace_num_entity($ord) {
   $ord = $ord[1];
   if (preg_match('/^x([0-9a-f]+)$/i', $ord, $match)) $ord = hexdec($match[1]);
       else $ord = intval($ord);
   $no_bytes = 0;
   $byte = array();
   if ($ord < 128) return chr($ord);
   if ($ord < 2048) $no_bytes = 2;
       else if ($ord < 65536) $no_bytes = 3;
       else if ($ord < 1114112) $no_bytes = 4;
       else return;
   switch($no_bytes) {
       case 2: $prefix = array(31, 192); break;
       case 3: $prefix = array(15, 224); break;
       case 4: $prefix = array(7, 240);
   }
   for ($i=0; $i < $no_bytes; ++$i)
       $byte[$no_bytes-$i-1] = (($ord & (63 * pow(2,6*$i))) / pow(2,6*$i)) & 63 | 128;
   $byte[0] = ($byte[0] & $prefix[0]) | $prefix[1];
   $ret = '';
   for ($i=0; $i < $no_bytes; ++$i) $ret .= chr($byte[$i]);
   return $ret;
}
?>
loufoque
09-Oct-2005 04:15
If you want to decode NCRs to UTF-8, use this function instead of chr().

function utf8_chr($code)
{
   if($code<128) return chr($code);
   else if($code<2048) return chr(($code>>6)+192).chr(($code&63)+128);
   else if($code<65536) return chr(($code>>12)+224).chr((($code>>6)&63)+128).chr(($code&63)+128);
   // Parentheses are required here: + binds tighter than >> and &.
   else if($code<2097152) return chr(($code>>18)+240).chr((($code>>12)&63)+128)
                                 .chr((($code>>6)&63)+128).chr(($code&63)+128);
}
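
To show how such a helper might be applied to a whole string, here is a minimal sketch that replaces decimal and hexadecimal numeric character references with UTF-8 bytes. Both function names are hypothetical, closures require PHP 5.3 or later, and leaving surrogates and out-of-range values untouched is a choice made for this example.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function codepoint_to_utf8($code)
{
    if ($code < 0x80) {
        return chr($code);
    }
    if ($code < 0x800) {
        return chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

// Hypothetical wrapper: replace decimal and hex NCRs, leave anything else as is.
function decode_ncr_utf8($text)
{
    return preg_replace_callback('~&#(x[0-9a-f]+|[0-9]+);~i', function ($m) {
        $ref  = $m[1];
        $code = ($ref[0] === 'x' || $ref[0] === 'X') ? hexdec(substr($ref, 1)) : (int) $ref;
        // Leave surrogates and values beyond the Unicode range untouched.
        if ($code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
            return $m[0];
        }
        return codepoint_to_utf8((int) $code);
    }, $text);
}

echo bin2hex(decode_ncr_utf8('&#269;&#x5d0;')), "\n"; // c48dd790
```

Using bitwise OR with explicit parentheses sidesteps the operator-precedence pitfalls that arithmetic mixed with shifts can introduce.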
emilianomartinezluque at yahoo dot com
26-Sep-2005 08:22
I've been using the great replace_num_entity function posted below, but there seem to be some problems with the 128 to 160 character range. For example, try:

<?php header("Content-type: text/html; charset=utf-8"); ?>
<html><body>
<?php
for($x=128; $x<161; $x++) {
     echo('&#' . $x . '; -- ' . preg_replace_callback('/&#([0-9a-fx]+);/mi', 'replace_num_entity', '&#' . $x . ';') . '<br />');
}
?>
</body></html>

I really don't know the reason for this (since according to the UTF-8 specs the function should have worked), but I wrote a modified version of the function to address it. Hope it helps.

function replace_num_entity($ord)
   {
       $ord = $ord[1];
       if (preg_match('/^x([0-9a-f]+)$/i', $ord, $match))
       {
           $ord = hexdec($match[1]);
       }
       else
       {
           $ord = intval($ord);
       }
    
       $no_bytes = 0;
       $byte = array();

       if($ord == 128) {
           return chr(226).chr(130).chr(172);
       } elseif($ord == 129) {
           return chr(239).chr(191).chr(189);
       } elseif($ord == 130) {
           return chr(226).chr(128).chr(154);
       } elseif($ord == 131) {
           return chr(198).chr(146);
       } elseif($ord == 132) {
           return chr(226).chr(128).chr(158);
       } elseif($ord == 133) {
           return chr(226).chr(128).chr(166);
       } elseif($ord == 134) {
           return chr(226).chr(128).chr(160);
       } elseif($ord == 135) {
           return chr(226).chr(128).chr(161);
       } elseif($ord == 136) {
           return chr(203).chr(134);
       } elseif($ord == 137) {
           return chr(226).chr(128).chr(176);
       } elseif($ord == 138) {
           return chr(197).chr(160);
       } elseif($ord == 139) {
           return chr(226).chr(128).chr(185);
       } elseif($ord == 140) {
           return chr(197).chr(146);
       } elseif($ord == 141) {
           return chr(239).chr(191).chr(189);
       } elseif($ord == 142) {
           return chr(197).chr(189);
       } elseif($ord == 143) {
           return chr(239).chr(191).chr(189);
       } elseif($ord == 144) {
           return chr(239).chr(191).chr(189);
       } elseif($ord == 145) {
           return chr(226).chr(128).chr(152);
       } elseif($ord == 146) {
           return chr(226).chr(128).chr(153);
       } elseif($ord == 147) {
           return chr(226).chr(128).chr(156);
       } elseif($ord == 148) {
           return chr(226).chr(128).chr(157);
       } elseif($ord == 149) {
           return chr(226).chr(128).chr(162);
       } elseif($ord == 150) {
           return chr(226).chr(128).chr(147);
       } elseif($ord == 151) {
           return chr(226).chr(128).chr(148);
       } elseif($ord == 152) {
           return chr(203).chr(156);
       } elseif($ord == 153) {
           return chr(226).chr(132).chr(162);
       } elseif($ord == 154) {
           return chr(197).chr(161);
       } elseif($ord == 155) {
           return chr(226).chr(128).chr(186);
       } elseif($ord == 156) {
           return chr(197).chr(147);
       } elseif($ord == 157) {
           return chr(239).chr(191).chr(189);
       } elseif($ord == 158) {
           return chr(197).chr(190);
       } elseif($ord == 159) {
           return chr(197).chr(184);
       } elseif($ord == 160) {
           return chr(194).chr(160);
       }

       if ($ord < 128)
       {
           return chr($ord);
       }
       elseif ($ord < 2048)
       {
           $no_bytes = 2;
       }
       elseif ($ord < 65536)
       {
           $no_bytes = 3;
       }
       elseif ($ord < 1114112)
       {
           $no_bytes = 4;
       }
       else
       {
           return;
       }

       switch($no_bytes)
       {
           case 2:
           {
               $prefix = array(31, 192);
               break;
           }
           case 3:
           {
               $prefix = array(15, 224);
               break;
           }
           case 4:
           {
               $prefix = array(7, 240);
           }
       }

       for ($i = 0; $i < $no_bytes; $i++)
       {
           $byte[$no_bytes - $i - 1] = (($ord & (63 * pow(2, 6 * $i))) / pow(2, 6 * $i)) & 63 | 128;
       }

       $byte[0] = ($byte[0] & $prefix[0]) | $prefix[1];

       $ret = '';
       for ($i = 0; $i < $no_bytes; $i++)
       {
           $ret .= chr($byte[$i]);
       }

       return $ret;
   }
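
The 128 to 159 range behaves differently because those numbers are C1 control characters in Unicode, while browsers commonly treat such numeric references as Windows-1252 characters. A lookup table is a more compact way to express the remapping than an elseif chain. In this sketch the function name is hypothetical, and the code point values are the ones listed in the chr_utf8() note on this page.

```php
<?php
// Hypothetical helper: map a Windows-1252 style reference (128-159)
// to the Unicode code point it is usually meant to represent.
function cp1252_to_codepoint($code)
{
    static $map = array(
        128 => 8364, 130 => 8218, 131 => 402,  132 => 8222, 133 => 8230,
        134 => 8224, 135 => 8225, 136 => 710,  137 => 8240, 138 => 352,
        139 => 8249, 140 => 338,  142 => 381,  145 => 8216, 146 => 8217,
        147 => 8220, 148 => 8221, 149 => 8226, 150 => 8211, 151 => 8212,
        152 => 732,  153 => 8482, 154 => 353,  155 => 8250, 156 => 339,
        158 => 382,  159 => 376,
    );
    // Unassigned positions (129, 141, 143, 144, 157) and all other values pass through.
    return isset($map[$code]) ? $map[$code] : $code;
}

echo cp1252_to_codepoint(128), "\n"; // 8364, the Euro sign
echo cp1252_to_codepoint(200), "\n"; // 200, unchanged
```

The remapped code point can then be handed to whichever UTF-8 encoding routine is in use, so the table stays separate from the byte-level logic.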
florianborn (at) yahoo (dot) de
20-Jul-2005 06:43
Note that

<?php
echo urlencode(html_entity_decode("&nbsp;"));
?>

will output "%A0" instead of "+".
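
The result depends on the charset used for decoding. The sketch below passes the charset explicitly and shows one possible way to obtain "+" when that is what is wanted; the variable names are illustrative.

```php
<?php
// Latin-1: &nbsp; decodes to the single byte 0xA0.
$latin1 = urlencode(html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1'));

// UTF-8: the same entity decodes to the two bytes 0xC2 0xA0.
$utf8 = urlencode(html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8'));

// To get "+", replace the non-breaking space with a regular space first.
$space = str_replace("\xA0", ' ', html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1'));
$plus  = urlencode($space);

echo $latin1, ' ', $utf8, ' ', $plus, "\n"; // %A0 %C2%A0 +
```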
gaui at gaui dot is
05-Jul-2005 08:15
if( !function_exists( 'html_entity_decode' ) )
{
   function html_entity_decode( $given_html, $quote_style = ENT_QUOTES ) {
       $trans_table = array_flip(get_html_translation_table( HTML_SPECIALCHARS, $quote_style ));
       $trans_table['&#39;'] = "'";
       return ( strtr( $given_html, $trans_table ) );
       }
}
marius (at) hot (dot) ee
08-Apr-2005 09:40
To convert html entities into unicode characters, use the following:

       $trans_tbl = get_html_translation_table(HTML_ENTITIES);
       foreach($trans_tbl as $k => $v)
       {
           $ttr[$v] = utf8_encode($k);
       }
  
       $text = strtr($text, $ttr);
php dot net at c dash ovidiu dot tk
18-Mar-2005 04:37
Quick & dirty code that translates numeric entities to UTF-8.

<?php

   function replace_num_entity($ord)
   {
       $ord = $ord[1];
       if (preg_match('/^x([0-9a-f]+)$/i', $ord, $match))
       {
           $ord = hexdec($match[1]);
       }
       else
       {
           $ord = intval($ord);
       }

       $no_bytes = 0;
       $byte = array();

       if ($ord < 128)
       {
           return chr($ord);
       }
       elseif ($ord < 2048)
       {
           $no_bytes = 2;
       }
       elseif ($ord < 65536)
       {
           $no_bytes = 3;
       }
       elseif ($ord < 1114112)
       {
           $no_bytes = 4;
       }
       else
       {
           return;
       }

       switch($no_bytes)
       {
           case 2:
           {
               $prefix = array(31, 192);
               break;
           }
           case 3:
           {
               $prefix = array(15, 224);
               break;
           }
           case 4:
           {
               $prefix = array(7, 240);
           }
       }

       for ($i = 0; $i < $no_bytes; $i++)
       {
           $byte[$no_bytes - $i - 1] = (($ord & (63 * pow(2, 6 * $i))) / pow(2, 6 * $i)) & 63 | 128;
       }

       $byte[0] = ($byte[0] & $prefix[0]) | $prefix[1];

       $ret = '';
       for ($i = 0; $i < $no_bytes; $i++)
       {
           $ret .= chr($byte[$i]);
       }

       return $ret;
   }

   $test = 'This is a &#269;&#x5d0; test&#39;';

   echo $test . "<br />\n";
   echo preg_replace_callback('/&#([0-9a-fx]+);/mi', 'replace_num_entity', $test);

?>
Silvan
29-Jan-2005 11:33
Passing NULL or FALSE as a string will generate a '500 Internal Server Error' (or break the script when inside a function).

So always test your string first before passing it to html_entity_decode().
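
One way to follow that advice is a small guard around the call. This is only a sketch: the wrapper name safe_entity_decode is hypothetical, and returning an empty string for non-string input is an assumption about what the caller wants.

```php
<?php
// Hypothetical wrapper name; returns an empty string for non-string input.
function safe_entity_decode($value, $quote_style = ENT_COMPAT, $charset = 'ISO-8859-1')
{
    if (!is_string($value)) {
        return '';
    }
    return html_entity_decode($value, $quote_style, $charset);
}

var_dump(safe_entity_decode(null));       // string(0) ""
var_dump(safe_entity_decode(false));      // string(0) ""
var_dump(safe_entity_decode('a &lt; b')); // string(5) "a < b"
```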
daniel at brightbyte dot de
14-Nov-2004 10:12
This function seems to have two limitations (at least in PHP 4.3.8):

a) it does not work with multibyte character codings, such as UTF-8
b) it does not decode numeric entity references

a) can be solved by using iconv to convert to ISO-8859-1, then decoding the entities, then converting back to UTF-8. But that's quite ugly and destroys all characters not present in Latin-1.

b) can be solved rather nicely using the following code:

<?php
function decode_entities($text) {
   $text = html_entity_decode($text, ENT_QUOTES, "ISO-8859-1"); #NOTE: UTF-8 does not work!
   $text = preg_replace('/&#(\d+);/me', "chr(\\1)", $text); #decimal notation
   $text = preg_replace('/&#x([a-f0-9]+);/mei', "chr(0x\\1)", $text);  #hex notation
   return $text;
}
?>

HTH
aidan at php dot net
14-Sep-2004 03:57
This functionality is now implemented in the PEAR package PHP_Compat.

More information about using this function without upgrading your version of PHP can be found at the link below:

http://pear.php.net/package/PHP_Compat