mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5)

mb_decode_numericentity -- Decode HTML numeric string reference to character

Description

string mb_decode_numericentity ( string str, array convmap [, string encoding] )

Converts the numeric character references (such as &#233;) in string str that fall within the code blocks specified by convmap into the corresponding characters. It returns the converted string.

convmap is an array that specifies the code ranges to convert.

encoding is the character encoding. If it is omitted, the internal character encoding is used.
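A minimal sketch of a typical call, assuming UTF-8 text and a convmap with a single row that covers the whole Unicode range (0x0 to 0x10FFFF) with offset 0. The variable names are illustrative only.

```php
<?php
// One convmap row: start code, end code, offset, mask.
// Covering 0x0 - 0x10FFFF lets every decimal reference be decoded.
$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);

$encoded = 'caf&#233; costs &#8364;5';
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');

echo $decoded, "\n"; // café costs €5
```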

Example 1. convmap example

$convmap = array (
   int start_code1, int end_code1, int offset1, int mask1,
   int start_code2, int end_code2, int offset2, int mask2,
   ........
   int start_codeN, int end_codeN, int offsetN, int maskN );
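// start_codeN and end_codeN are Unicode values giving the range of block N.
// mb_encode_numericentity(): a character in this range has offsetN added,
// is bitwise-ANDed with maskN, and is then written as a numeric reference.
// mb_decode_numericentity(): offsetN is subtracted from the referenced value,
// which is converted only if the result lies between start_codeN and end_codeN.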
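As a sketch of how the ranges restrict decoding (assuming UTF-8 input), the convmap below lists two blocks, Latin-1 Supplement (0x80 - 0xFF) and Greek and Coptic (0x370 - 0x3FF), each with offset 0. References that fall outside both blocks, such as &#65; for "A", are left untouched.

```php
<?php
// Two convmap rows, four integers each: start, end, offset, mask.
$convmap = array(
    0x80,  0xFF,  0, 0xFFFF,  // Latin-1 Supplement
    0x370, 0x3FF, 0, 0xFFFF,  // Greek and Coptic
);

$input  = '&#65; &#233; &#945; &#8364;';
$output = mb_decode_numericentity($input, $convmap, 'UTF-8');

// &#65; (U+0041) and &#8364; (U+20AC) fall outside both blocks, so they stay as references
echo $output, "\n";
```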

See also mb_encode_numericentity().


User Contributed Notes
donovan at conduit it
20-Apr-2006 12:05
Note that, at this time, mb_decode_numericentity() seems to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging.

If you need to convert hex entities, first convert them all to decimal entities with a combination of preg_replace() and hexdec().
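A sketch of that workaround, assuming UTF-8 text. It uses preg_replace_callback() instead of preg_replace() with the /e modifier (which was removed in PHP 7), and hexdec() to turn each &#xHH; reference into its decimal form before decoding. The helper name hex_entities_to_decimal is hypothetical. Newer PHP versions may accept hex references directly; converting first does no harm either way.

```php
<?php
// Hypothetical helper: rewrite every &#xHH...; reference as &#DDD;
function hex_entities_to_decimal($text)
{
    return preg_replace_callback(
        '/&#[xX]([0-9a-fA-F]{1,6});/',
        function ($m) {
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
$text    = 'smart quote: &#x2019;, e acute: &#xE9;';
$decimal = hex_entities_to_decimal($text);   // 'smart quote: &#8217;, e acute: &#233;'
$decoded = mb_decode_numericentity($decimal, $convmap, 'UTF-8');

echo $decoded, "\n";
```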
dirk at camindo de
31-Jan-2005 01:51
When you use utf8_decode(), every character outside the ISO-8859-1 range is lost (it is replaced by '?'). You can avoid this by calling mb_encode_numericentity() first:

  // convert $text from UTF-8 to ISO-8859-1
  $convmap = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
  $text = mb_encode_numericentity($text, $convmap, "UTF-8");
  $text = utf8_decode($text);

The mb_encode_numericentity() call turns every character above 0xFF into a numeric entity; utf8_decode() then converts the rest (0x80 - 0xFF) to ISO-8859-1.
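The same idea as a self-contained sketch, including the round trip back to UTF-8. It assumes the mbstring extension and swaps utf8_decode() for mb_convert_encoding(), since utf8_decode() is deprecated as of PHP 8.2. The convmap starts at 0x100, so characters that ISO-8859-1 can represent (up to 0xFF) are converted directly instead of being escaped.

```php
<?php
// Characters above U+00FF have no ISO-8859-1 byte, so turn them into &#NNN; first.
$convmap = array(0x100, 0x10FFFF, 0, 0x1FFFFF);

$utf8   = "caf\u{e9} \u{20ac}5";   // "café €5" in UTF-8
$latin1 = mb_convert_encoding(
    mb_encode_numericentity($utf8, $convmap, 'UTF-8'),
    'ISO-8859-1',
    'UTF-8'
);
// $latin1 is "caf\xE9 &#8364;5": é is a single Latin-1 byte, the euro sign survives as a reference

// Reverse direction: back to UTF-8, then decode the references again.
$back = mb_decode_numericentity(
    mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1'),
    $convmap,
    'UTF-8'
);
```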
Andrew Simpson
11-Dec-2004 09:29
Many web browsers tend to upload high-order characters as numeric HTML entities.

Here is some simple code to convert these entities within a block of text into proper UTF-8 characters:

<?php

// callback helper: decode a single decimal entity such as "&#8217;" to UTF-8
function utf8_entity_decode($entity)
{
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}

// decode decimal HTML entities added by the web browser
$body = preg_replace_callback('/&#\d{2,7};/u', function ($m) {
    return utf8_entity_decode($m[0]);
}, $body);

// decode hex HTML entities added by the web browser
// (converted to decimal first, since older mb_decode_numericentity() only handles decimal)
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', function ($m) {
    return utf8_entity_decode('&#' . hexdec($m[1]) . ';');
}, $body);
?>
php at cNhOiSpPpAlMe dot org
31-Mar-2004 04:55
Here are functions to convert hankaku (half-width) characters to zenkaku (full-width) characters, and vice versa, in Japanese text.

<?php

// Supported characters:
//    (space)
//    !"#$%&'()*+,-./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}~
// (Katakana isn't supported.)

// Convert half-width (hankaku) ASCII to full-width (zenkaku) characters
function f_han2zen($string, $encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x20, 0x20, 0x3000 - 0x20, 0xffff,   // Space -> U+3000
        0x21, 0x7e, 0xff01 - 0x21, 0xffff);  // ! .. ~ -> U+FF01 .. U+FF5E
    // Encode the shifted code points as numeric entities...
    $temp = mb_encode_numericentity($string, $convmap, $encoding);
    // ...then decode every entity back into a real character
    $convmap = array(0, 0xffff, 0, 0xffff);
    return mb_decode_numericentity($temp, $convmap, $encoding);
}

// Convert full-width (zenkaku) characters back to half-width (hankaku) ASCII
function f_zen2han($string, $encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x3000, 0x3000, -(0x3000 - 0x20), 0xffff,   // U+3000 -> Space
        0xff01, 0xff5e, -(0xff01 - 0x21), 0xffff);  // U+FF01 .. U+FF5E -> ! .. ~
    $temp = mb_encode_numericentity($string, $convmap, $encoding);
    $convmap = array(0, 0xffff, 0, 0xffff);
    return mb_decode_numericentity($temp, $convmap, $encoding);
}

// Sample usage:
f_han2zen("test", "shift_jis");
f_han2zen("test", "utf-8");

?>
dev at glossword info
19-Nov-2003 11:43
Just two great functions for daily use:

<?php
/* Converts numeric HTML entities (e.g. &#233;) into characters */
function my_numeric2character($t)
{
   $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
   return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts every character into a numeric HTML entity */
function my_character2numeric($t)
{
   $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
   return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;'); // ’ ἀ â
print my_character2numeric('  ');
?>
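Note that these helpers only handle numeric references. A short sketch, assuming UTF-8 input: a named entity such as &eacute; passes through mb_decode_numericentity() unchanged, while html_entity_decode() resolves both forms.

```php
<?php
$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
$mixed   = '&eacute; &#233;';

// Only the numeric reference is decoded; the named entity is left as-is.
$numericOnly = mb_decode_numericentity($mixed, $convmap, 'UTF-8');

// html_entity_decode() understands named and numeric references alike.
$both = html_entity_decode($mixed, ENT_QUOTES, 'UTF-8');

echo $numericOnly, "\n", $both, "\n";
```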