iconv

(PHP 4 >= 4.0.5, PHP 5)

iconv -- Convert string to requested character encoding

Description

string iconv ( string in_charset, string out_charset, string str )

Performs a character set conversion on the string str from in_charset to out_charset. Returns the converted string or FALSE on failure.

If you append the string //TRANSLIT to out_charset, transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similar-looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.

Example 1. iconv() example:

<?php
echo iconv("ISO-8859-1", "UTF-8", "This is a test.");
?>
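
The //TRANSLIT and //IGNORE suffixes described above can be compared side by side with a small sketch. The helper name show_conversions() is hypothetical (it is not part of the iconv extension), and what the suffixed calls return depends on the iconv implementation PHP was built against, so no output is shown here.

```php
<?php
// Hypothetical helper for illustration only; not part of the iconv extension.
// Runs the same conversion with no suffix, with //TRANSLIT and with //IGNORE.
function show_conversions($in_charset, $out_charset, $str)
{
    $results = array();
    foreach (array('', '//TRANSLIT', '//IGNORE') as $suffix) {
        // @ hides the notice iconv() raises when a character cannot be converted
        $results[$suffix] = @iconv($in_charset, $out_charset . $suffix, $str);
    }
    return $results;
}

// "\xE2\x82\xAC" is the euro sign in UTF-8; ISO-8859-1 has no such character.
var_dump(show_conversions("UTF-8", "ISO-8859-1", "Price: \xE2\x82\xAC100"));
```

Comparing the three entries on your own system shows which behaviour the local iconv library actually provides.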


User Contributed Notes
sire at acc dot umu dot se
14-Dec-2005 05:17
If you get the error message "Notice: iconv(): Detected an illegal character in input string in file.php on line x", and your text or database is likely to contain text copied from Microsoft Word documents, the culprit is very likely the evil 0x96 "long dash" character (the en dash in Windows-1252). By default, MS Word converts all double hyphens into this illegal character. The solution is either to convert 0x96 (dash) into the regular 0x2d (hyphen/minus), or to append the //TRANSLIT or //IGNORE suffix (see above).
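
Both remedies can be sketched in a few lines. This assumes the pasted text really is Windows-1252, which the note above does not state. The variable names are made up for illustration, and "CP1252" is the charset name as commonly spelled by GNU iconv implementations.

```php
<?php
// Assumption: the Word-derived text is Windows-1252, where byte 0x96 is the en dash.
$word_text = "pages 10\x9620";

// Option 1: replace the dash byte with a plain hyphen/minus (0x2d) before converting.
$plain = str_replace("\x96", "-", $word_text);

// Option 2: name the real source charset so iconv() can map 0x96 to U+2013.
$utf8 = iconv("CP1252", "UTF-8", $word_text);

echo $plain, "\n";          // pages 10-20
echo bin2hex($utf8), "\n";  // the dash becomes the three bytes e2 80 93
```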
nilcolor at gmail dot coom
24-Nov-2005 07:29
I don't know whether this is a feature or not, but it works for me (PHP 5.0.4):

iconv('', 'UTF-8', $str)

I tested it converting from windows-1251 (stored in the DB) to UTF-8 (which I use for web pages).
BTW, I convert each array I fetch from the DB with array_walk_recursive...
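
A minimal sketch of the array_walk_recursive approach mentioned above. The callback name convert_item_to_utf8() and the sample row are hypothetical. The source charset is named explicitly here rather than passed as an empty string, which is a choice made for this sketch and not something the note prescribes.

```php
<?php
// Hypothetical callback: array_walk_recursive() passes the value (by reference),
// the key, and the extra argument given as its third parameter.
function convert_item_to_utf8(&$value, $key, $from_charset)
{
    if (is_string($value)) {
        $converted = iconv($from_charset, 'UTF-8', $value);
        if ($converted !== false) {
            $value = $converted; // keep the original value if conversion fails
        }
    }
}

// Stand-in for a row fetched from the database, encoded in windows-1251.
$row = array(
    'id'   => 7,
    'name' => "\xCF\xF0\xE8\xE2\xE5\xF2",
    'tags' => array("\xE4\xE0"),
);
array_walk_recursive($row, 'convert_item_to_utf8', 'windows-1251');
```

Non-string values such as the integer id are left untouched, and nested arrays are handled by array_walk_recursive itself.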
anyean at gmail dot com
30-May-2005 06:23
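<?php
// script from http://zizi.kxup.com/
// javascript unescape: decodes %uXXXX sequences and HTML numeric entities to UTF-8
function unescape($str) {
    $str = rawurldecode($str);
    // Tokenize into %uXXXX sequences, &#xXXXX; and &#NNN; entities, and single literal characters
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        if (substr($v, 0, 2) == "%u") {
            // pack("H4", ...) yields two bytes, high byte first, hence UCS-2BE
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) == "&#x") {
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) == "&#") {
            // pack("n", ...) is an unsigned 16-bit value in big-endian order
            $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
?>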
zhawari at hotmail dot com
01-Feb-2005 07:27
Here is how to convert UTF-8 byte values to UCS-2 values, both written in hex:

<?php
// Handles only 1-byte (ASCII) and 2-byte UTF-8 sequences.
function utf8toucs2($str)
{
    $ucs2 = "";
    for ($i = 0; $i < strlen($str); $i += 2) {
        $byte1 = hexdec(substr($str, $i, 2));
        if ($byte1 < 128) {
            // ASCII: prefix a zero high byte
            $results = "00" . substr($str, $i, 2);
        } else {
            // Two-byte sequence 110xxxxx 10xxxxxx
            $byte2 = hexdec(substr($str, $i + 2, 2));
            $results = sprintf("%04x", ($byte1 - 192) * 64 + ($byte2 - 128));
            $i += 2; // skip the continuation byte
        }
        $ucs2 .= $results;
    }
    return $ucs2;
}

echo strtoupper(utf8toucs2("D985D8B1D8AD")) . "\n";
echo strtoupper(utf8toucs2("456725")) . "\n";
?>

Input:
D985D8B1D8AD
Output:
06450631062D

Input:
456725
Output:
004500670025
PHANTOm <phantom at nix dot co dot il>
28-Jan-2005 04:49
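Convert windows-1255 to UTF-8 with the following code (it maps only the Hebrew letters 0xE0-0xFA):
<?php
$heb = 'put hebrew text here';
// Each letter byte 0xE0-0xFA becomes the two UTF-8 bytes 0xD7 and (byte - 80)
$utf = preg_replace("/([\xE0-\xFA])/e", "chr(215).chr(ord('\\1')-80)", $heb);
?>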
zhawari at hotmail dot com
19-Jan-2005 07:02
Here is how to convert UCS-2 values to UTF-8 byte values, both written in hex:

<?php
// Handles code points up to U+07FF (1- and 2-byte UTF-8 sequences) only.
function ucs2toutf8($str)
{
    $utf8 = "";
    for ($i = 0; $i < strlen($str); $i += 4) {
        $code = hexdec(substr($str, $i, 4));
        if ($code < 128) {
            // ASCII: keep only the low byte
            $utf8 .= substr($str, $i + 2, 2);
        } else {
            // 110xxxxx 10xxxxxx
            $utf8 .= dechex(192 + ($code >> 6)) . dechex(128 + ($code & 63));
        }
    }
    return $utf8;
}

echo strtoupper(ucs2toutf8("06450631062D"));
?>

Input:
06450631062D
Output:
D985D8B1D8AD

regards,
Ziyad
bh
05-Jan-2005 04:10
Great class to convert between charsets: http://mikolajj.republika.pl/
SiMM
11-Dec-2004 03:15
<?php // this is only an example
function CP1251toUTF8($string) {
    $out = '';
    for ($i = 0; $i < strlen($string); ++$i) {
        $ch = ord($string[$i]);
        if ($ch < 0x80) {
            $out .= chr($ch);
        } elseif ($ch >= 0xC0) {
            if ($ch < 0xF0) {
                $out .= "\xD0" . chr(0x90 + $ch - 0xC0); // А-Я, а-п (A-YA, a-p)
            } else {
                $out .= "\xD1" . chr(0x80 + $ch - 0xF0); // р-я (r-ya)
            }
        } else {
            switch ($ch) {
                case 0xA8: $out .= "\xD0\x81"; break; // YO
                case 0xB8: $out .= "\xD1\x91"; break; // yo
                // ukrainian
                case 0xA1: $out .= "\xD0\x8E"; break; // Ў (U)
                case 0xA2: $out .= "\xD1\x9E"; break; // ў (u)
                case 0xAA: $out .= "\xD0\x84"; break; // Є (e)
                case 0xAF: $out .= "\xD0\x87"; break; // Ї (I..)
                case 0xB2: $out .= "\xD0\x86"; break; // I (I)
                case 0xB3: $out .= "\xD1\x96"; break; // i (i)
                case 0xBA: $out .= "\xD1\x94"; break; // є (e)
                case 0xBF: $out .= "\xD1\x97"; break; // ї (i..)
                // chuvashian
                case 0x8C: $out .= "\xD3\x90"; break; // Ӑ (A)
                case 0x8D: $out .= "\xD3\x96"; break; // Ӗ (E)
                case 0x8E: $out .= "\xD2\xAA"; break; // Ҫ (SCH)
                case 0x8F: $out .= "\xD3\xB2"; break; // Ӳ (U)
                case 0x9C: $out .= "\xD3\x91"; break; // ӑ (a)
                case 0x9D: $out .= "\xD3\x97"; break; // ӗ (e)
                case 0x9E: $out .= "\xD2\xAB"; break; // ҫ (sch)
                case 0x9F: $out .= "\xD3\xB3"; break; // ӳ (u)
            }
        }
    }
    return $out;
}
?>
aissam at yahoo dot com
30-Nov-2004 12:20
For those who have trouble displaying UCS-2 data in a browser, here's a simple function that converts UCS-2 to HTML Unicode entities:

<?php
function ucs2html($str)
{
    $str = trim($str); // if you are reading from file
    $len = strlen($str);
    $html = '';
    // Two bytes per character, low byte first (little-endian)
    for ($i = 0; $i < $len; $i += 2) {
        $html .= '&#' . ((ord($str[$i + 1]) << 8) | ord($str[$i])) . ';';
    }
    return $html;
}
?>
nikolai-dot-zujev-at-gmail-dot-com
18-Nov-2004 05:14
Here is an example of how to convert a windows-1251 (Windows) or cp1251 (Linux/Unix) encoded string for use in a UTF-8 page. Note that it emits HTML numeric entities rather than raw UTF-8 bytes.

<?php
function cp1251_utf8( $sInput )
{
    $sOutput = "";

    for ( $i = 0; $i < strlen( $sInput ); $i++ )
    {
        $iAscii = ord( $sInput[$i] );

        if ( $iAscii >= 192 && $iAscii <= 255 )
            // 0xC0-0xFF map linearly onto U+0410-U+044F
            $sOutput .= "&#".( 1040 + ( $iAscii - 192 ) ).";";
        else if ( $iAscii == 168 )
            $sOutput .= "&#".( 1025 ).";"; // U+0401
        else if ( $iAscii == 184 )
            $sOutput .= "&#".( 1105 ).";"; // U+0451
        else
            $sOutput .= $sInput[$i];
    }

    return $sOutput;
}
?>
vitek at 4rome dot ru
16-Nov-2004 03:53
On some systems there may be no function named iconv(). The reason is that a constant named `iconv` is defined with the value `libiconv`, so the string PHP_FUNCTION(iconv) expands to PHP_FUNCTION(libiconv), and you have to call libiconv() instead of iconv().
I have seen this on FreeBSD, but I am sure that was a rather special build.
If you do not want to depend on this behaviour, add the following to your script:
<?php
if (!function_exists('iconv') && function_exists('libiconv')) {
    function iconv($input_encoding, $output_encoding, $string) {
        return libiconv($input_encoding, $output_encoding, $string);
    }
}
?>
Thanks to tony2001 at phpclub.net for explaining this behaviour.
ng4rrjanbiah at rediffmail dot com
22-Jun-2004 11:10
Here is code to convert ISO 8859-1 to UTF-8 and vice versa without using iconv.

<?php
//Logic from http://twiki.org/cgi-bin/view/Codev/InternationalisationUTF8
$str_iso8859_1 = 'foo in ISO 8859-1';
//ISO 8859-1 to UTF-8
$str_utf8 = preg_replace("/([\x80-\xFF])/e",
                         "chr(0xC0|ord('\\1')>>6).chr(0x80|ord('\\1')&0x3F)",
                         $str_iso8859_1);
//UTF-8 to ISO 8859-1
$str_iso8859_1 = preg_replace("/([\xC2\xC3])([\x80-\xBF])/e",
                              "chr(ord('\\1')<<6&0xC0|ord('\\2')&0x3F)",
                              $str_utf8);
?>

HTH,
R. Rajesh Jeba Anbiah
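
The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0. On such versions the same bit arithmetic can be sketched with preg_replace_callback(). The two callback names are hypothetical, and the sample string is only an illustration.

```php
<?php
// Hypothetical callback names; the arithmetic mirrors the /e expressions in the note above.
function latin1_byte_to_utf8($m)
{
    $byte = ord($m[1]);
    // 110000xx 10xxxxxx: top two bits go in the lead byte, low six in the trail byte
    return chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
}

function utf8_pair_to_latin1($m)
{
    // Recombine the low two bits of the lead byte with the six bits of the trail byte
    return chr(((ord($m[1]) << 6) & 0xC0) | (ord($m[2]) & 0x3F));
}

$str_iso8859_1 = "caf\xE9"; // "café" in ISO 8859-1

// ISO 8859-1 to UTF-8
$str_utf8 = preg_replace_callback("/([\x80-\xFF])/", 'latin1_byte_to_utf8', $str_iso8859_1);

// UTF-8 to ISO 8859-1
$back = preg_replace_callback("/([\xC2\xC3])([\x80-\xBF])/", 'utf8_pair_to_latin1', $str_utf8);

echo bin2hex($str_utf8), "\n"; // 636166c3a9
echo bin2hex($back), "\n";     // 636166e9
```

Named functions are used instead of closures so that the sketch also runs on PHP versions older than 5.3.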
Igu4n4 at example dot com
06-Jul-2003 12:03
Maybe I was a fool to write the charset name as ISO8859-1 instead of ISO-8859-1 (note the - after ISO), but it worked in PHP 4.3. When I ported the system back to 4.2.2, iconv returned an empty string without any error message. So beware: in PHP 4.2.2, always use the ISO-88... form of the charset name.