mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5)

mb_convert_encoding -- Convert character encoding

Description

string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )

mb_convert_encoding() converts character encoding of string str from from_encoding to to_encoding.

str : String to be converted.

from_encoding is the name of the character encoding before conversion. It can be an array or a string containing a comma-separated list of encodings. If it is not specified, the internal encoding is used.

Example 1. mb_convert_encoding() example

<?php
/* Convert internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");

/* Convert EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");

/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
?>
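
The description says from_encoding may also be an array, which the example above does not show. Here is a minimal sketch of that form. The sample string and the candidate list are illustrative assumptions, and which candidate is picked depends on the detection rules of the PHP version in use.

```php
<?php
// Illustrative input: "café au lait" stored as ISO-8859-1 bytes (0xE9 = é).
$latin1 = "caf\xE9 au lait";

// Candidate source encodings passed as an array instead of a comma-separated string.
$candidates = array('UTF-8', 'ISO-8859-1');

$utf8 = mb_convert_encoding($latin1, 'UTF-8', $candidates);

// bin2hex() makes the resulting bytes visible regardless of terminal encoding.
echo bin2hex($utf8), "\n";
```

When the source encoding is known for certain, passing a single name avoids detection altogether.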

See also mb_detect_order().


User Contributed Notes
phpdoc at jeudi dot de
05-Sep-2006 09:46
I'd like to share some code to convert latin diacritics to their
traditional 7bit representation, like, for example,

- à,ç,é,î,... to a,c,e,i,...
- ß to ss
- ä,Ä,... to ae,Ae,...
- ë,... to e,...

(mb_convert "7bit" would simply delete any offending characters).

I might have missed your country's typographic
conventions; correct me if so.
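<?php
/**
 * @param string $text     line of encoded text
 * @param string $from_enc encoding type of $text, e.g. UTF-8, ISO-8859-1
 *
 * @return string 7bit representation
 */
function to7bit($text, $from_enc) {
    // Turn every non-ASCII character into a named HTML entity first.
    $text = mb_convert_encoding($text, 'HTML-ENTITIES', $from_enc);

    // The patterns are applied in order: sharp s, ligatures, umlauts,
    // then any remaining entity is reduced to its first letter.
    $text = preg_replace(
        array('/&szlig;/', '/&(..)lig;/', '/&([aouAOU])uml;/', '/&(.)[^;]*;/'),
        array('ss', '$1', '${1}e', '$1'),
        $text
    );

    return $text;
}
?>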

Enjoy :-)
Johannes
mac.com@nemo
08-Jul-2006 10:38
For those wanting to convert from $set to MacRoman, use iconv():

<?php
$string = iconv('UTF-8', 'macintosh', $string);
?>

('macintosh' is the IANA name for the MacRoman character set.)
Tom Class
11-Nov-2005 11:35
Why did you use the PHP HTML encode functions? mbstring has its own encoding, which is (as far as I have tested it) much more useful:

HTML-ENTITIES

Example:

$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
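
On PHP 8.2 and later, the 'HTML-ENTITIES' pseudo-encoding is deprecated. One possible alternative, assuming numeric entities are acceptable in place of named ones, is mb_encode_numericentity(). The sample string and the conversion map below are illustrative choices, not part of the note above.

```php
<?php
$text = "caf\xC3\xA9"; // "café" in UTF-8 (sample input)

// Conversion map: first code point, last code point, offset, mask.
// This one covers every code point above ASCII.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

$encoded = mb_encode_numericentity($text, $convmap, 'UTF-8');
echo $encoded, "\n"; // caf&#233;
```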
Stephan van der Feest
09-Sep-2005 07:47
To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:

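function htmltoflash($htmlstr)
{
  // Decode entities explicitly as ISO-8859-1 (the default when this note was
  // written), so the conversion to UTF-8 below is not applied twice.
  return str_replace("&lt;br /&gt;","\n",
   str_replace("<","&lt;",
     str_replace(">","&gt;",
       mb_convert_encoding(html_entity_decode($htmlstr, ENT_COMPAT, "ISO-8859-1"),
       "UTF-8","ISO-8859-1"))));
}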
Stephan van der Feest
09-Sep-2005 06:50
Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.

Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:

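function utf8html($utf8str)
{
  // Name the charset explicitly: newer PHP versions assume UTF-8 input by
  // default and would reject the ISO-8859-1 bytes produced here.
  return htmlentities(mb_convert_encoding($utf8str,"ISO-8859-1","UTF-8"), ENT_COMPAT, "ISO-8859-1");
}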
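
A variant of the same idea, sketched under the assumption that htmlentities() is given the input charset directly, skips the intermediate ISO-8859-1 step. The function name utf8_to_entities() and the sample string are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: encode UTF-8 text as HTML entities in one step.
function utf8_to_entities(string $utf8str): string
{
    // The third argument tells htmlentities() that the input is UTF-8.
    return htmlentities($utf8str, ENT_QUOTES, 'UTF-8');
}

$out = utf8_to_entities("caf\xC3\xA9 <b>");
echo $out, "\n"; // caf&eacute; &lt;b&gt;
```

This also keeps characters that have no ISO-8859-1 equivalent from being lost before they are encoded.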
jamespilcher1 - hotmail
02-Feb-2004 11:55
Be careful when converting from ISO-8859-1 to UTF-8.

Even if you explicitly specify the character encoding of a page as ISO-8859-1 (via headers and strict XML declarations), Windows 2000 will ignore that and interpret it as whatever character set it has natively installed.

For example, I wrote char #128 into a page with character encoding ISO-8859-1, and it displayed in Internet Explorer (and Mozilla) as a euro symbol.

It should have displayed a box, denoting that char #128 has no printable character in ISO-8859-1. The problem was that it was displaying in "Windows: Western Europe" (my native character set).

This led to confusion when I tried to convert this euro to UTF-8 via mb_convert_encoding().

IE displays UTF-8 correctly, and because PHP correctly converted #128 into a box in UTF-8, IE would show a box.

So all I saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.
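
The effect described above can be reproduced with a short sketch. It assumes the byte in question is 0x80 and compares two source encodings: in ISO-8859-1 that byte maps to U+0080 (a control character, hence the box), while in Windows-1252 it is the euro sign. The variable names are illustrative.

```php
<?php
$byte = "\x80"; // char #128

// Treating the byte as ISO-8859-1 yields the control character U+0080.
$fromLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Treating the same byte as Windows-1252 yields the euro sign U+20AC.
$fromCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($fromLatin1), "\n"; // c280
echo bin2hex($fromCp1252), "\n"; // e282ac
```

If the data really came from a Windows code page, naming that code page as from_encoding is what preserves the euro sign.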
lanka at eurocom dot od dot ua
08-Feb-2003 12:03
Another example of recoding without the multibyte (mbstring) extension enabled.
(Russian koi->win. If the input is already in win encoding, the function recode() returns the string unchanged.)

<?php
// Return values of detect_encoding():
//   0 - win
//   1 - koi
function detect_encoding($str) {
    $win = 0;
    $koi = 0;

    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c > 224 && $c < 255) $win++;
        if ($c > 192 && $c < 223) $koi++;
    }

    return ($win < $koi) ? 1 : 0;
}

// recodes koi to win
function koi_to_win($string) {
    // koi -> win byte map, indexed by (byte value - 128)
    $kw = array(
        128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
        144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
        160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
        176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
        254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238,
        239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250,
        222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206,
        207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218);
    // win -> koi byte map (the reverse direction; not used by this function)
    $wk = array(
        128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
        144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
        160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
        176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
        225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240,
        242, 243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241,
        193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208,
        210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209);

    // A for loop (rather than do/while) also handles the empty string safely.
    $end = strlen($string);
    for ($pos = 0; $pos < $end; $pos++) {
        $c = ord($string[$pos]);
        if ($c > 128) {
            $string[$pos] = chr($kw[$c - 128]);
        }
    }

    return $string;
}

function recode($str) {
    $enc = detect_encoding($str);
    if ($enc == 1) {
        $str = koi_to_win($str);
    }

    return $str;
}
?>
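
For comparison, where the mbstring extension is available, the same recoding can be sketched with mb_convert_encoding() itself. This assumes that "koi" means KOI8-R and "win" means Windows-1251, which the note above does not state explicitly. The two sample bytes are illustrative.

```php
<?php
// Sample input: bytes 0xC0 0xC1 are the letters "юа" in KOI8-R.
$koi = "\xC0\xC1";

// Let mbstring do the table lookup that koi_to_win() performs by hand.
$win = mb_convert_encoding($koi, 'Windows-1251', 'KOI8-R');

echo bin2hex($win), "\n"; // fee0
```

The result agrees with the hand-written table, which maps byte 192 to 254 and byte 193 to 224.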