strip_tags

User Contributed Notes

anonymous
01-Nov-2006 08:52


A different approach to cleaning up HTML would be to first escape all unsafe characters:

& to &amp;

< to &lt;

> to &gt;

then unescape matching pairs of tags (e.g. "&lt;b&gt;hello&lt;/b&gt;" => "<b>hello</b>"), but only those that are identified as safe.



This reversed approach should be safer: a tag that is not identified correctly simply stays in its escaped state.



So if a user enters invalid html, or tags that are unsupported or unwanted, they are shown in plain text, and not stripped away. This is good, because the characters "<" and ">" might have been used in a different way (e.g. to make a text arrow: "a <=> b"). 

This is the case in most forums (apart from the fact that they use "[tag]"-tags instead of "<tag>"-tags)
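
The escape-first idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: escape_then_allow() and its $allowed list are hypothetical names, not part of PHP, and only plain tag pairs without attributes are restored.

```php
<?php
// Hypothetical helper (an assumption, not a PHP built-in): escape everything,
// then turn back only attribute-free pairs of tags listed in $allowed.
function escape_then_allow($text, array $allowed = array('b', 'i', 'u'))
{
    // Step 1: escape &, <, > and quotes, so nothing is live HTML any more.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Step 2: unescape matching open/close pairs of the allowed tags only.
    foreach ($allowed as $tag) {
        $t = preg_quote($tag, '@');
        $pattern = '@&lt;' . $t . '&gt;(.*?)&lt;/' . $t . '&gt;@is';
        $safe = preg_replace($pattern, '<' . $tag . '>$1</' . $tag . '>', $safe);
    }
    return $safe;
}

echo escape_then_allow('<b>hello</b> a <=> b <script>x</script>');
```

Anything the pattern does not recognize, including the text arrow and the script tag, stays escaped and is shown as plain text.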

pierresyraud at hotmail dot com
06-Oct-2006 03:43


The inverse function: it strips all text and keeps only the HTML tags.

function strip_text($a){
    $i=-1;$n='';$ok=1;
    // $a[$i] replaces the old $a{$i} offset syntax, which was removed in PHP 8
    while(isset($a[++$i])){
        if($ok&&$a[$i]!='<'){continue;}
        elseif($a[$i]=='>'){$ok=1;$n.='>';continue;}
        elseif($a[$i]=='<'){$ok=0;}
        if(!$ok){$n.=$a[$i];}
    }
    return $n;
}

magdolen at elepha dot info
01-Oct-2006 10:24


I edited the strip_selected_tags function that salavert created so that it also strips single (self-closing, XHTML-only) tags.



Here it is, including metric's modification:



function strip_selected_tags($text, $tags = array()) {

    $args = func_get_args();

    // metric edit

    $text = preg_replace("/\r\n|\n|\r/","",array_shift($args));

    $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;

    

    foreach ($tags as $tag){

        if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){

            $text = str_replace($found[0],$found[1],$text);

        }

        // hrax edit

        if(preg_match_all('/<'.$tag.'.*\/>/iU', $text, $found)){

            $text = str_replace($found[0], "", $text);

        }

    }

    

    return $text;

}

jausions at php dot net
19-Sep-2006 02:57


To sanitize any user input, you should also consider PEAR's HTML_Safe package.



http://pear.php.net/package/HTML_Safe

bfmaster_duran at yahoo dot com dot br
14-Sep-2006 09:32


I wrote this function, based on other examples here, to remove some style properties from tags with regular expressions:

<?php
function removeAttributes($htmlText)
{
       $stripAttrib = "'\\s(class)=\"(.*?)\"'i"; // remove class attributes from html tags
       $htmlText = preg_replace($stripAttrib, '', $htmlText);
       $stripAttrib = "/(font\-size|color|font\-family|line\-height):\\s".
              "(\\d+(\\x2E\\d+\\w+|\\W)|\\w+)(;|)(\\s|)/i";
       // remove font-size, color, font-family and line-height from style attributes
       $htmlText = preg_replace($stripAttrib, '', $htmlText);
       $htmlText = str_replace(" style=\"\"", '', $htmlText); // remove style attributes left empty (style="") by the preg_replace above
       return $htmlText;
}

function removeEvilTags($source)
{
   // preg_replace_callback() replaces the /e modifier, which was removed in PHP 7;
   // the callback receives the tag unescaped, so stripslashes() is not needed
   return preg_replace_callback('/<(.*?)>/i', function ($m) {
       return '<' . removeAttributes($m[1]) . '>';
   }, $source);
}
?>



Usage:

<?php



$text = '<p style="line-height: 150%; font-weight: bold" class="MsoNormal"><span style="font-size: 10.5pt; line-height: 150%; font-family: Verdana">Com o compromisso de pioneirismo e aprimoramento, caracter&iacute;sticas da Oftalmocl&iacute;nica, novos equipamentos foram adquiridos para exames e diagn&oacute;sticos ainda mais precisos:</span></p>'; //This text is in brazillian portuguese ;D



echo htmlentities(removeEvilTags($text))."\r\n";



//This is return: <p style="font-weight: bold"><span>Com o compromisso de pioneirismo e aprimoramento, caracter&iacute;sticas da Oftalmocl&iacute;nica, novos equipamentos foram adquiridos para exames e diagn&oacute;sticos ainda mais precisos:</span></p>



?>



If you find an error, please report it to me by email.

metric at 152 dot org
11-Aug-2006 02:46


I tried the strip_selected_tags function that salavert created. It works really well on single-line text, but if the text contains hard returns it cannot find the closing tag.



I altered the line that shifts the text into a variable so that it removes line breaks, whatever the OS:

$text = preg_replace("/\r\n|\n|\r/","",array_shift($args));
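
As an alternative sketch (an assumption on my part, not what the function above does), the "s" pattern modifier lets "." match line breaks, so the text does not have to be flattened first. The sample string below is made up for illustration.

```php
<?php
// Made-up sample: a <b> element whose content spans two lines.
$text = "<b>line one\nline two</b> rest";

// Without "s", "." stops at the line break and nothing matches.
$withoutS = preg_replace('/<b[^>]*>(.*)<\/b>/iU', '$1', $text);

// With "s", "." also matches "\n", so the pair is found and unwrapped.
$withS = preg_replace('/<b[^>]*>(.*)<\/b>/isU', '$1', $text);

var_dump($withoutS === $text);   // the string is left untouched
var_dump($withS);
```

The trade-off is that the line breaks inside the text are preserved instead of removed.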

admin at automapit dot com
10-Aug-2006 01:01


<?php
function html2txt($document){
    // The patterns run in order, so whole blocks must go before the generic tag pattern.
    // The U modifier is dropped: combined with *? it would make the match greedy.
    $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                    '@<style[^>]*?>.*?</style>@si',    // Strip style blocks, contents included
                    '@<![\s\S]*?--[ \t\n\r]*>@',       // Strip multi-line comments
                    '@<[\/\!]*?[^<>]*?>@si'            // Strip out the remaining HTML tags
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>



This function turns HTML into text: it strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.



It's a frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!

elgios at gmail dot com
06-Aug-2006 03:33


I think the new function works, but note that it removes only HTML tags, not PHP tags!



<?php

function theRealStripTags2($string)

{



   $tam=strlen($string);

   // tam have number of cars the string



   $newstring="";

   // newstring will be returned



   $tag=0;

   /* if tag = 0 => copy car from string to newstring

       if tag > 0 => don't copy. Found one or more  '<' and need

       to search '>'. If we found 3 '<' need to find all the 3 '>'

   */



   /* I am C programmer. walk in a string is natural for me and more efficient

   */

   for ($i=0; $i < $tam; $i++){

       // If I found one '<', $tag++ and continue whithout copy

       if ($string[$i] == '<'){

           $tag++;

           continue;

       }



       // if I found '>', decrease $tag and continue 

       if ($string[$i] == '>'){

           if ($tag){

               $tag--;

           }

       /* $tag never be negative. If string is "<b>test</b>>"

           (error, of course) $tag will stop in 0

       */

           continue;

       }



       // if $tag is 0, can copy 

       if ($tag == 0){

           $newstring .= $string[$i]; // simple copy, only one char

       }

   }

    return $newstring;

}



echo theRealStripTags2("<tag>test</tag>");

// return  "test"



?>

elgios at gmail dot com
04-Aug-2006 11:24


I think that new function works. 



function theRealStripTags2($string)

{



    $tam=strlen($string);

    // tam have number of cars the string



    $newstring="";

    // newstring will be returned



    $tag=0;

    /* tag = 0 => copy car from string to newstring

       tag > 0 => don't copy. Find one or mor tag '<' and

          need to find '>'. If we find 3 '<' need to find

          all 3 '>'

    */



    /* I am C programm. seek in a string is natural for me

        and more efficient



        Problem: copy a string to another string is more

        efficient but use more memory!!!

    */

    for ($i=0; $i < $tam; $i++){



        /* If I find one '<', $tag++ and continue whithout copy*/

        if ($string[$i] == '<'){

            $tag++;

            continue;

        }



        /* if I find '>', decrease $tag and continue */

        if ($string[$i] == '>'){

            if ($tag){

                $tag--;

            }

        /* $tag never be negative. If string is "<b>test</b>>" (error, of course)

            $tag stop in 0

        */

            continue;

        }



        /* if $tag is 0, can copy */

        if ($tag == 0){

            $newstring .= $string[$i]; // simple copy, only one char

        }

    }

        return $newstring;

}

Sébastien
24-May-2006 11:22


hum, it seems that your function "theRealStripTags" won't have the right behavior in some cases, for example:



<?php

theRealStripTags("<!-- I want to put a <div>tag</div> -->");

theRealStripTags("<!-- Or a carrot > -->");

theRealStripTags("<![CDATA[what about this! It's to protect from HTML characters like <tag>, > and so on in XML, no?]]> -->");

?>

xyexz at yahoo dot com
09-May-2006 11:41


I have found that this function will sometimes remove only the first angle bracket from a tag and leave the rest of the tag in the string, which obviously isn't what I'm looking for.



EX: 

<?php



//Returns "tag>test/tag>"

echo strip_tags("<tag>test</tag>");



?>



I'm trying to strip_tags on a string I'm importing from xml so perhaps it has something to do with that but if you've run into this same issue I've written a function to fix it once and for all!



<?php



function theRealStripTags($string)
{
    //while there is an opening angle bracket left
    while (($currentBeg = strpos($string, '<')) !== false)
    {
        //find the closing angle bracket that follows it
        $currentEnd = strpos($string, '>', $currentBeg);

        //an unclosed '<' is left alone (this also prevents an endless loop)
        if ($currentEnd === false) break;

        //cut the tag from the string: keep what is before '<' and after '>'
        $string = substr($string, 0, $currentBeg) . substr($string, $currentEnd + 1);
    }

    return $string;
}



//Returns "test"

echo theRealStripTags('<tag>test</tag>');



?>

soapergem at gmail dot com
29-Apr-2006 12:21


In my prior comment I made a mistake that needs correcting. Please change the forward slashes that begin and terminate my regular expression to a different character, like the at-sign (@), for instance. Here's what it should read:



$regex  = '@</?\w+((\s+\w+(\s*=\s*';

$regex .= '(?:".*?"|\'.*?\'|[^\'">\s]+))?)+';

$regex .= '\s*|\s*)/?>@i';



(There were forward-slashes embedded in the regular expression itself, so using them to begin and terminate the expression would have caused a parse error.)
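
A minimal usage sketch of the corrected expression follows. The sample input is made up for illustration; the point is that a ">" inside a quoted attribute value does not end the tag, and a bare "<" in running text is left alone.

```php
<?php
// The expression from the note, delimited with @ instead of /.
$regex  = '@</?\w+((\s+\w+(\s*=\s*';
$regex .= '(?:".*?"|\'.*?\'|[^\'">\s]+))?)+';
$regex .= '\s*|\s*)/?>@i';

// Made-up input: a quoted ">" in an attribute, plus a bare "<" in the text.
$html  = '<a href="x>y">link</a> 1 < 2';
$plain = preg_replace($regex, '', $html);

var_dump($plain);
```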

JeremysFilms.com
08-Apr-2006 04:57


A simple little function for blocking tags by replacing the '<' and '>' characters with their HTML entities.  Good for simple posting systems that you don't want to have a chance of stripping non-HTML tags, or just want everything to show literally without any security issues:



<?php



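function block_tags($string){
    // the entities need their trailing semicolon; str_replace() is enough,
    // since '<' and '>' have no upper or lower case
    $replaced_string = str_replace('<','&lt;',$string);
    $replaced_string = str_replace('>','&gt;',$replaced_string);
    return $replaced_string;
}

echo block_tags('<b>HEY</b>'); //Returns &lt;b&gt;HEY&lt;/b&gt;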



?>

cesar at nixar dot org
08-Mar-2006 03:44


Here is a recursive function for strip_tags, like the one shown on the stripslashes manual page.



<?php

function strip_tags_deep($value)

{

  return is_array($value) ?

    array_map('strip_tags_deep', $value) :

    strip_tags($value);

}



// Example

$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));

$array = strip_tags_deep($array);



// Output

print_r($array);

?>

debug at jay dot net
24-Feb-2006 06:24


If you wish to steal quotes:

$quote=explode( "\n",

str_replace(array('document.writeln(\'','\')',';'),'',

strip_tags(

file_get_contents('http://www.quotationspage.com/data/1mqotd.js')

)

)

);

use $quote[2] & $quote[3]

It gives you a quote a day

balluche AROBASE free.fr
18-Feb-2006 06:16


//balluche:22/01/04:Remove even bad tags

function strip_bad_tags($html)

{

    $s = preg_replace ("@</?[^>]*>*@", "", $html);

    return $s;

}

salavert at~ akelos
13-Feb-2006 06:21


<?php

       /**

    * Works like PHP function strip_tags, but it only removes selected tags.

    * Example:

    *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert

    */



    function strip_selected_tags($text, $tags = array())

    {

        $args = func_get_args();

        $text = array_shift($args);

        $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;

        foreach ($tags as $tag){

            if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){

                $text = str_replace($found[0],$found[1],$text);

          }

        }



        return $text;

    }



?>



Hope you find it useful,



Jose Salavert

webmaster at tmproductionz dot com
02-Feb-2006 11:28


<?php



function remove_tag ( $tag , $data ) {

    // stripos() replaces eregi(), which was removed in PHP 7
    while ( ( $it = stripos ( $data , "<" . $tag ) ) !== false ) {

        $close = stripos ( $data , "</" . $tag . ">" , $it ) ;

        // without a closing tag, drop everything from the opening tag onwards
        $it2   = ( $close === false ) ? strlen ( $data ) : $close + strlen ( $tag ) + 3 ;

        $data  = substr ( $data , 0 , $it ) . substr ( $data , $it2 ) ;

    }

    return $data ;

}



?>



This code removes every occurrence of the specified tag, together with its contents, from the given haystack, and nothing else.

lucahomer at hotmail dot com
30-Jan-2006 09:42


I think the regular expression posted in note #51383 (function.strip-tags.php#51383) is not correct.



<?php

$disalowedtags = array("font");



foreach ($_GET as $varname) 

foreach ($disalowedtags as $tag) 



----------------------------------------------------------

if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname)) <---

----------------------------------------------------------



die("stop that");



?> 



This check also rejects links like this:

<a href=font.php>test</a>

because the word "font" appears between "<" and ">".



I changed the regular expression to this (ported to preg_match(), since eregi() was removed in PHP 7):

-----------------------------------------------------

if (preg_match("@(<|</)".$tag."*\"?[^>]*>@i", $varname))

-----------------------------------------------------



bye 



Luca
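
The difference between the two expressions can be checked with a short sketch. The patterns below are PCRE ports of the two ereg expressions (the port is my assumption, since eregi() no longer exists in PHP 7 and later), and the sample strings are made up.

```php
<?php
$tag = 'font';

// Port of the original expression: the tag name may appear anywhere inside "<...>".
$loose  = '@<[^>]*' . $tag . '*"?[^>]*>@i';

// Port of the changed expression: the tag name must directly follow "<" or "</".
$strict = '@(<|</)' . $tag . '*"?[^>]*>@i';

$link = '<a href=font.php>test</a>';
$font = '<font size=2>test</font>';

var_dump(preg_match($loose, $link));   // the link is wrongly reported
var_dump(preg_match($strict, $link));  // the link passes
var_dump(preg_match($strict, $font));  // the font tag is still caught
```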

Nyks
12-Oct-2005 04:39


Note for BRYN at drumdatabse dot com (http://www.php.net/manual/fr/function.strip-tags.php#52085) :



I've changed your script to support more possibilities.

- The first WHILE loop repeats the second one, so that HTML tags cut in two by substr() (and therefore not recognized by strip_tags()) are stripped on a later pass.

- There is no more bug with substr($textstring,0,1024): previously, when the WHILE loop ran for the second, third, fourth... time and $textstring was shorter than 1024 characters, it returned an error.



<?php
function strip_tags_in_big_string($textstring){
while($textstring != strip_tags($textstring))
    {
    // reset on every pass, otherwise the output of earlier passes is appended again
    $safetext = '';
    while (strlen($textstring) != 0)
         {
         // substr() copes with strings shorter than 1024 characters on its own
         $safetext .= strip_tags(substr($textstring,0,1024));
         $textstring = (string) substr($textstring,1024);
         }
    $textstring = $safetext;
    }
return $textstring;
}
?>

info at christopher-kunz dot de
29-Aug-2005 09:34


Please note that the function supplied by daneel at neezine dot net is not a good way of avoiding XSS attacks. A string like 

<font size=">>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font> 

will be sanitized to 

<font>>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font>

which is a pretty good XSS.



If you are in need of XSS cleaning, you might want to consider the Pixel-Apes XSS cleaner: http://pixel-apes.com/safehtml

daneel at neezine dot net
23-Aug-2005 08:08


Removes all attributes from a tag except the ones specified. This is a correction of the handy routine from joris878 (which did not seem to work as posted), plus an example.

When will PHP support this natively?



<?php

function stripeentag($msg,$tag,$attr) { 

  $lengthfirst = 0; 

  while (strstr(substr($msg,$lengthfirst),"<$tag ")!="") 

  { 

   $imgstart = $lengthfirst + strpos(substr($msg,$lengthfirst), "<$tag "); 

   $partafterwith = substr($msg,$imgstart); 

   $img = substr($partafterwith,0,strpos($partafterwith,">")+1); 

   $img = str_replace(" =","=",$img); 

   $out = "<$tag";  



 for($i=0; $i <= (count($attr) - 1 );$i++) 

 { 

    $long_val = strpos($img," ",strpos($img,$attr[$i]."=")) - (strpos($img,$attr[$i]."=") + strlen($attr[$i]) + 1) ;

    $val = substr($img, strpos($img,$attr[$i]."=") + strlen($attr[$i]) + 1,$long_val);

     if(strlen($val)>0) $attr[$i] = " ".$attr[$i]."=".$val; 

     else $attr[$i] = ""; 

     $out .= $attr[$i]; 

 } 



   $out .= ">"; 

   $partafter = substr($partafterwith,strpos($partafterwith,">")+1); 

   $msg = substr($msg,0,$imgstart).$out.$partafter; 

   $lengthfirst = $imgstart+3; 

  } 

  return $msg; 

} 



$message = "<font size=\"10\" face=\"tahoma\" color=\"#DD0000\" >salut</font>" ;



//we want to keep only the "color" attribute

$message = stripeentag($message,"font",array("color"));



echo $message ;

?>

11-Aug-2005 03:08


<?php

/**removes specified tags from the text, where each tag requires a

     *closing tag; if the latter

     *is not found then everything after the opening tag will be removed

     *typical usage:

     *some html text, array('script','body','html') - all lower case

     *(prefix the function with "public static" when using it as a class method)*/

    function removeTags($text,$tags_array){

        $length = strlen($text);

        $pos =0;

        $tags_array = array_flip($tags_array);

        while ($pos < $length && ($pos = strpos($text,'<',$pos)) !== false){

            $dlm_pos = strpos($text,' ',$pos);

            $dlm2_pos = strpos($text,'>',$pos);

            if ($dlm_pos > $dlm2_pos)$dlm_pos=$dlm2_pos;

            $which_tag = strtolower(substr($text,$pos+1,$dlm_pos-($pos+1)));

            $tag_length = strlen($which_tag);

            if (!isset($tags_array[$which_tag])){

                //if no tag matches found

                ++$pos;

                continue;

            }

            //find the end

            $sec_tag = '</'.$which_tag.'>';

            $sec_pos = stripos($text,$sec_tag,$pos+$tag_length);

            //remove everything after if end of the tag not found

            if ($sec_pos === false) $sec_pos = $length-strlen($sec_tag);

            $rmv_length = $sec_pos-$pos+strlen($sec_tag);

            $text = substr_replace($text,'',$pos,$rmv_length);

            //update length

            $length = $length - $rmv_length;

            $pos++;

        }

        return $text;

    }

?>

erwin at spammij dot nl
08-Jul-2005 11:13


If you want to disable HTML, you can easily replace all instances of < and > with &lt; and &gt;, which stops any HTML code from working.

php at scowen dot com
08-Jun-2005 03:50


I have had a similar problem to kangaroo232002 at yahoo dot co dot uk when stripping tags from html containing javascript. The javascript can obviously contain '>' and '<' as comparison operators which are seen by strip_tags() as html tags - leading to undesired results.



To christianbecke at web dot de - this can be third-party html, so although perhaps not always 'correct', that's how it is!

anonymous
28-May-2005 03:45


Someone can use attributes like CSS in the tags.

For example, if you strip all tags except <b>, a user can still write <b style="color: red; font-size: 45pt">Hello</b>, which might be undesired.



Maybe BB Code would be something.
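
A short sketch of the point above: strip_tags() keeps an allowed tag together with all of its attributes. The second step is only one possible follow-up (my assumption, not a complete sanitizer): it drops the attributes with a simple pattern, which would break if an attribute value contained ">".

```php
<?php
$input = '<b style="color: red; font-size: 45pt">Hello</b><i>world</i>';

// <b> is allowed, so it survives with its style attribute intact.
$output = strip_tags($input, '<b>');

// Assumed follow-up: reduce every opening tag that has attributes to its bare name.
$bare = preg_replace('@<(\w+)\s[^>]*>@', '<$1>', $output);

var_dump($output);
var_dump($bare);
```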

bazzy
23-Apr-2005 08:09


I think bryn and john780 are missing the point - eric at direnetworks wasn't suggesting there is an overall string limit of 1024 characters but rather that actual tags over 1024 characters long (eg, in his case it sounds like a really long encrypted <a href> tag) will fail to be stripped.



The functions to slowly pass strings through strip_tags 1024 characters at a time aren't necessary and are actually counter productive (since if a tag spans the break point, ie it is opened before the 1024 characters and closed after the 1024 characters then only the opening tag is removed which leaves a mess of text up to the closing tag).



Only mentioning this as I spent ages working out a better way to deal with this character spanning before I actually went back and read eric's post and realised the subsequent posts were misleading - hopefully it'll save others the same headaches :)
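
The break-point problem can be reproduced with a small sketch. It uses an 11-character chunk size instead of 1024 and a made-up string, purely so that the tag visibly spans the break.

```php
<?php
$html  = 'abc <a href="x">link</a>';

// Stripping the whole string at once gives the expected text.
$whole = strip_tags($html);

// Stripping chunk by chunk: the <a ...> tag is opened in one chunk and
// closed in the next, so its remainder is left behind as text.
$chunked = '';
foreach (str_split($html, 11) as $chunk) {
    $chunked .= strip_tags($chunk);
}

var_dump($whole);
var_dump($chunked);
```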

bryn -at- drumdatabase dot net
21-Apr-2005 05:38


Further to john780's idea for a solution to the 1024 character limit of strip_tags - it's a good one, but I think the ltrim function isn't the one for the job? I wrote this simple function to get around the limit (I'm a newbie, so there may be some problem / better way of doing it!):



<?php
function strip_tags_in_big_string($textstring){
    $safetext = ''; // initialize before appending to it
    while (strlen($textstring) != 0)
        {
        $temptext = strip_tags(substr($textstring,0,1024));
        $safetext .= $temptext;
        $textstring = substr_replace($textstring,'',0,1024);
        }
    return $safetext;
}
?>



Hope someone finds it useful.

cz188658 at tiscali dot cz
08-Apr-2005 04:21


If you want to keep XHTML tags like <br /> (self-closing tags), you must include the tag as <br> in the allowable_tags parameter.

Jiri
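
A minimal sketch of this, with a made-up input string: the allowed tag is written as <br>, and the self-closing <br /> form in the text is kept while other tags are removed.

```php
<?php
$html = 'line one<br />line two<hr />end';

// Only <br> is listed as allowed; <hr /> is stripped.
$kept = strip_tags($html, '<br>');

var_dump($kept);
```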

php at arzynik dot com
29-Mar-2005 08:04


instead of removing tags that you dont want, sometimes you might want to just stop them from doing anything.



<?php

$disalowedtags = array("script",

                        "object",

                        "iframe",

                        "image",

                        "applet",

                        "meta",

                        "form",

                        "onmouseover",

                        "onmouseout");



// preg_match() replaces eregi(), which was removed in PHP 7

foreach (array($_GET, $_POST) as $input) 

foreach ($input as $varname) 

foreach ($disalowedtags as $tag) 

if (preg_match("@<[^>]*".$tag."*\"?[^>]*>@i", $varname)) 

die("stop that");



?>

christianbecke at web dot de
16-Feb-2005 10:34


to kangaroo232002 at yahoo dot co dot uk:



As far as I understand, what you report is not a bug in strip_tags(), but a bug in your HTML.

You should use alt='Go &gt;' instead of alt='Go >'.



I suppose your HTML displays all right in browsers, but that does not mean it's correct. It just shows that browsers are more lenient than strip_tags() about characters that are not properly escaped as entities.

kangaroo232002 at yahoo dot co dot uk
03-Feb-2005 09:23


After wondering why the following was indexed in my trawler despite stripping all text in tags (and punctuation) " valign left align middle border 0 src go gif name search1 onclick search", please take a quick look at what produced it: <DIV style="position: absolute; TOP:22%; LEFT:68%;"><input type="image" alt="Go >" valign="left" align="middle" border=0 src="go.gif" name="search1" onClick="search()"></div>...



looking at this closely, it is possible to see that despite the 'Go >' statement being enclosed in speech marks (with the right facing chevron), strip_tags() still assumes that it is the end of the input statement, and treats everything after as text. Not sure if this has been fixed in later versions; im using v4.3.3...



good hunting.

jon780 -at- gmail.com
03-Feb-2005 01:18


To eric at direnetworks dot com regarding the 1024 character limit:



You could simply ltrim() the first 1024 characters, run them through strip_tags(), add them to a new string, and remove them from the first.



Perform this in a loop which continued until the original string was of 0 length.

dumb at coder dot com
17-Jan-2005 08:22


/*

15Jan05



Within <textarea>, Browsers auto render & display certain "HTML Entities" and "HTML Entity Codes" as characters: 

&lt; shows as <    --    &amp; shows as &    --    etc.



Browsers also auto change any "HTML Entity Codes" entered in a <textarea> into the resultant display characters BEFORE UPLOADING.  There's no way to change this, making it difficult to edit html in a <textarea>



"HTML Entity Codes" (ie, use of &#60 to represent "<", &#38 to represent "&" &#160 to represent "&nbsp;") can be used instead.  Therefore, we need to "HTML-Entitize" the data for display, which changes the raw/displayed characters into their HTML Entity Code equivalents before being shown in a <textarea>.



how would I get a textarea to contain "&lt;" as a literal string of characters and not have it display a "<"

&amp;lt; is indeed the correct way of doing that. And if you wanted to display that, you'd need to use &amp;amp;lt;'. That's just how HTML entities work.



htmlspecialchars() is a subset of htmlentities()

the reverse (ie, changing html entity codes into displayed characters, is done w/ html_entity_decode()



google on ns_quotehtml and see http://aolserver.com/docs/tcl/ns_quotehtml.html

see also http://www.htmlhelp.com/reference/html40/entities/

*/
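
A small sketch of the point about &amp;lt; made in the note above: running the stored text through htmlspecialchars() before placing it in a <textarea> makes the browser show it literally, and decoding restores the original. The variable names and the sample text are made up for illustration.

```php
<?php
// Made-up stored text containing both an entity and a raw "<".
$stored = 'Use &lt; for a literal <';

// Escape once for display inside the form field.
$escaped = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
$field   = '<textarea>' . $escaped . '</textarea>';

// Decoding gives back exactly what was stored.
$roundTrip = htmlspecialchars_decode($escaped, ENT_QUOTES);

var_dump($escaped);
```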

eric at direnetworks dot com
21-Dec-2004 10:36


the strip_tags() function in both php 4.3.8 and 5.0.2 (probably many more, but these are the only 2 versions I tested with) have a max tag length of 1024.  If you're trying to process a tag over this limit, strip_tags will not return that line (as if it were an illegal tag).   I noticed this problem while trying to parse a paypal encrypted link button (<input type="hidden" name="encrypted" value="encryptedtext">, with <input> as an allowed tag), which is 2702 characters long.  I can't really think of any workaround for this other than parsing each tag to figure out the length, then only sending it to strip_tags() if its under 1024, but at that point, I might as well be stripping the tags myself.

ashley at norris dot org dot au
01-Nov-2004 11:11


leathargy at hotmail dot com wrote:



"it seems we're all overlooking a few things:

1) if we replace "</ta</tableble>" by removing </table, we're not better off..."



I beat this by using ($input contains the data):



<?php

while($input != strip_tags($input)) {

            $input = strip_tags($input);

        }

?>



This iteratively strips tags until all tags have gone :)

@dada
29-Sep-2004 08:41


if you  only want to have the text within the tags, you can use this function:



function showtextintags($text)



{



$text = preg_replace("/(\<script)(.*?)(script>)/si", "dada", "$text");

$text = strip_tags($text);

$text = str_replace("<!--", "&lt;!--", $text);

$text = preg_replace("/(\<)(.*?)(--\>)/mi", "".nl2br("\\2")."", $text);



return $text;



}



it will show all the text without tags and (!!!) without javascripts

Anonymous User
23-Aug-2004 12:24


Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.



For example, 



"<strong>This is a bit of text</strong><p />Followed by this bit"



are visually separate paragraphs, but if simply stripped of tags will result in



"This is a bit of textFollowed by this bit"



which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.



The workaround is to force whitespace prior to stripping, using something like this:



      $text = getTheText();

      $text = preg_replace('/</',' <',$text);

      $text = preg_replace('/>/','> ',$text);

      $desc = html_entity_decode(strip_tags($text));

      $desc = preg_replace('/[\n\r\t]/',' ',$desc);

      $desc = preg_replace('/ {2,}/',' ',$desc);
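
Applied to the sample string from this note, the workaround can be sketched like this. It is a variation rather than the exact code above: str_replace() does the padding, and a single \s+ pattern plus trim() does the clean-up.

```php
<?php
$text = '<strong>This is a bit of text</strong><p />Followed by this bit';

// Pad every tag with spaces so the words cannot run together.
$spaced = str_replace(array('<', '>'), array(' <', '> '), $text);

// Strip the tags, decode entities, then collapse all runs of whitespace.
$desc = html_entity_decode(strip_tags($spaced));
$desc = trim(preg_replace('/\s+/', ' ', $desc));

var_dump($desc);
```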

Isaac Schlueter php at isaacschlueter dot com
17-Aug-2004 10:32


steven --at-- acko --dot-- net pointed out that you can't make strip_tags allow comments.  With this function, you can.  Just pass <!--> as one of the allowed tags.  Easy as pie: just pull them out, strip, and then put them back.



<?php

function strip_tags_c($string, $allowed_tags = '')

{    

    $allow_comments = ( strpos($allowed_tags, '<!-->') !== false );

    if( $allow_comments ) 

    {

        $string = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $string);

        $allowed_tags = str_replace('<!-->', '', $allowed_tags);

    }

    $string = strip_tags( $string, $allowed_tags );

    if( $allow_comments ) $string = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $string);

    return $string;

}

?>

Isaac Schlueter php at isaacschlueter dot com
16-Aug-2004 02:16


I am creating a rendering plugin for a CMS system (http://b2evolution.net) that wraps certain bits of text in acronym tags.  The problem is that if you have something like this:

<a href="http://www.php.net" title="PHP is cool!">PHP</a>



then the plugin will mangle it into:



<a href="http://www.<acronym title="PHP: Hypertext Processor">php</acronym>.net" title="<acronym title="PHP: Hypertext Processor">PHP</acronym> is cool!>PHP</a>



This function will strip out tags that occur within other tags.  Not super-useful in tons of situations, but it was an interesting puzzle.  I had started out using preg_replace, but it got ridiculously complicated when there were line breaks and multiple instances in the same tag.



The CMS does its XHTML validation before the content gets to the plugin, so we can be pretty sure that the content is well-formed, except for the tags inside of other tags.



<?php

if( !function_exists( 'antiTagInTag' ) )

{

    // $content is the string to be anti-tagintagged, and $format sets the format of the internals.

    function antiTagInTag( $content = '', $format = 'htmlhead' )

    {

        if( !function_exists( 'format_to_output' ) ) 

        {    // Use the external function if it exists, or fall back on just strip_tags.

            function format_to_output($content, $format)

            {

                return strip_tags($content);

            }

        }

        $contentwalker = 0;

        $length = strlen( $content );

        $tagend = -1;

        for( $tagstart = strpos( $content, '<', $tagend + 1 ) ; $tagstart !== false && $tagstart < strlen( $content ); $tagstart = strpos( $content, '<', $tagend ) )

        {

            // got the start of a tag.  Now find the proper end!

            $walker = $tagstart + 1;

            $open = 1;

            while( $open != 0 && $walker < strlen( $content ) )

            {

                $nextopen = strpos( $content, '<', $walker );

                $nextclose = strpos( $content, '>', $walker );

                if( $nextclose === false )

                {    // ERROR! Open waka without close waka!

                    // echo '<code>Error in antiTagInTag - malformed tag!</code> ';

                    return $content;

                }

                if( $nextopen === false || $nextopen > $nextclose )

                { // No more opens, but there was a close; or, a close happens before the next open.

                    // walker goes to the close+1, and open decrements

                    $open --;

                    $walker = $nextclose + 1;

                }

                elseif( $nextopen < $nextclose )

                { // an open before the next close

                    $open ++;

                    $walker = $nextopen + 1;

                }

            }

            $tagend = $walker;

            if( $tagend > strlen( $content ) ) 

                $tagend = strlen( $content );

            else

            {

                $tagend --;

                $tagstart ++;

            }

            $tag = substr( $content, $tagstart, $tagend - $tagstart );

            $tags[] = '<' . $tag . '>';

            // Clean the inside of the tag (the fallback simply strips any nested tags).

            $newtag = format_to_output( $tag, $format );

            $newtags[] = '<' . $newtag . '>';

        }

        

        $content = str_replace($tags, $newtags, $content);

        return $content;

    }

}

?>

Tony Freeman
20-Nov-2003 06:45


This is a slightly altered version of tREXX's code.  The difference is that this one simply removes the unwanted attributes (rather than flagging them as forbidden).



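function removeEvilAttributes($tagSource)

{

        // Removes style="..." and class="..." (double-quoted values only).

        $stripAttrib = '/ (style|class)="(.*?)"/i';

        return preg_replace($stripAttrib, '', $tagSource);

}



function removeEvilTags($source)

{

    $allowedTags='<a><br><b><h1><h2><h3><h4><i>' .

             '<img><li><ol><p><strong><table>' .

             '<tr><td><th><u><ul>';

    $source = strip_tags($source, $allowedTags);

    // The /e modifier was removed in PHP 7, so use preg_replace_callback() instead.

    // The callback gets the match unescaped, which makes stripslashes() unnecessary.

    // The s flag lets a tag span several lines, as in the example below.

    return preg_replace_callback('/<(.*?)>/is', function ($m) {

        return '<' . removeEvilAttributes($m[1]) . '>';

    }, $source);

}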



$text = '<p style="Normal">Saluton el <a href="#?"

 class="xsarial">Esperanto-lando</a><img src="my.jpg"

 alt="Saluton" width=100 height=100></p>';



$text = removeEvilTags($text);



var_dump($text);

leathargy at hotmail dot com
27-Oct-2003 02:15


it seems we're all overlooking a few things:

1) if we clean "</ta</tableble>" by removing "</table", we're no better off, because what is left over is "</table>" again. try a char-by-char comparison and replace the stuff with *s instead: the example then becomes "</ta******ble>", which is not problematic. with a char-by-char approach you can also skip whitespace and kill stuff like "< table>"... just make sure <&bkspTable> doesn't work...

2) no browser treats { as < [as far as i know].

3) because of statement 2, we can do:

$remove=array("<?","<","?>",">");

$change=array("{[pre]}","{[","{[/pre]}","]}");

$repairSeek = array("{[pre]}", "{[/pre]}","{[b]}","{[/b]}","{[br]}");

// and so forth...



$repairChange = array("<pre>","</pre>","<b>","</b>","<br>");

// and so forth...



$maltags=array("{[","]}");

$nontags=array("{","}");

$unclean=...;//get variable from somewhere...

$unclean=str_replace($remove,$change,$unclean);

$unclean=str_replace($repairSeek, $repairChange, $unclean);

$clean=str_replace($maltags, $nontags, $unclean);



////end example....

4) we can further improve the above by using explode(for our ease):

function purifyText($unclean, $fixme)

{

$remove=explode("\n",$fixme['remove']);

//... and so forth for each of the above arrays...

// or you could just pass the arrays..., or a giant string

//put above here...

return $clean;

}//done
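
The steps in 3) can be put together as one runnable sketch. The function name purify_text() and the small whitelist (pre, b, br) are illustrative assumptions, not part of the note above:

```php
<?php
// Sketch of the brace-substitution idea; purify_text() is a hypothetical name.
function purify_text(string $unclean): string
{
    // Step 1: neutralise every angle bracket. PHP open/close tags become pre markers.
    $remove = array('<?', '<', '?>', '>');
    $change = array('{[pre]}', '{[', '{[/pre]}', ']}');
    $text = str_replace($remove, $change, $unclean);

    // Step 2: restore only the exact tags on the whitelist.
    $repairSeek   = array('{[pre]}', '{[/pre]}', '{[b]}', '{[/b]}', '{[br]}');
    $repairChange = array('<pre>', '</pre>', '<b>', '</b>', '<br>');
    $text = str_replace($repairSeek, $repairChange, $text);

    // Step 3: whatever is still marked was not whitelisted; show it as plain braces.
    return str_replace(array('{[', ']}'), array('{', '}'), $text);
}

// prints: <b>bold</b> {script}alert(1){/script} a {=} b
echo purify_text('<b>bold</b> <script>alert(1)</script> a <=> b');
```

Only exact matches such as "{[b]}" are restored, so a tag with attributes (for example <b onclick=...>) stays in its harmless brace form.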

tREXX [www.trexx.ch]
15-Oct-2003 09:15


Here's a quite fast solution to remove unwanted tags AND also unwanted attributes within the allowed tags:



<?php

/**

 * Allow these tags

 */

$allowedTags = '<h1><b><i><a><ul><li><pre><hr><blockquote><img>';



/**

 * Disallow these attributes/prefix within a tag

 */

$stripAttrib = 'javascript:|onclick|ondblclick|onmousedown|onmouseup|onmouseover|'.

               'onmousemove|onmouseout|onkeypress|onkeydown|onkeyup';



/**

 * @return string

 * @param string

 * @desc Strip forbidden tags and delegate tag-source check to removeEvilAttributes()

 */

function removeEvilTags($source)

{

    global $allowedTags;

    $source = strip_tags($source, $allowedTags);

    // preg_replace_callback() replaces the /e modifier, which was removed in PHP 7.

    return preg_replace_callback('/<(.*?)>/i', function ($m) {

        return '<' . removeEvilAttributes($m[1]) . '>';

    }, $source);

}



/**

 * @return string

 * @param string

 * @desc Strip forbidden attributes from a tag

 */

function removeEvilAttributes($tagSource)

{

    global $stripAttrib;

    // No stripslashes() needed: the callback receives the match unescaped.

    return preg_replace("/$stripAttrib/i", 'forbidden', $tagSource);

}



// Will output: <a href="forbiddenalert(1);" target="_blank" forbidden = "alert(1)">test</a>

echo removeEvilTags('<a href="javascript:alert(1);" target="_blank" onMouseOver = "alert(1)">test</a>');

?>

dougal at gunters dot org
11-Sep-2003 04:03


strip_tags() appears to become nauseated at the sight of a <!DOCTYPE> declaration (at least in PHP 4.3.1). You might want to do something like:



$html = str_replace('<!DOCTYPE','<DOCTYPE',$html);



before processing with strip_tags().

joris878 at hotmail dot com
04-Jun-2003 08:58


[   Editor's Note: This functionality will be natively supported in a future release of PHP.  Most likely 5.0   ]





This routine removes all attributes from a given tag except the attributes specified in the array $attr. It calls a helper function filter() that is not included in this note.





function stripeentag($msg,$tag,$attr) {


  $lengthfirst = 0;


  while (strstr(substr($msg,$lengthfirst),"<$tag ")!="")


  {


    $imgstart = $lengthfirst + strpos(substr($msg,$lengthfirst), "<$tag ");


    $partafterwith = substr($msg,$imgstart);


    $img = substr($partafterwith,0,strpos($partafterwith,">")+1);


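    $img = str_replace(" =","=",$img);


    $out = "<$tag";


    for($i=1;$i<=count($attr);$i++)


    {


      // filter() is a helper that is not shown in this note; $attr is indexed from 1

      $val = filter($img,$attr[$i]."="," ");


      // append to $out without overwriting $attr, so the list survives for the next tag

      if(strlen($val)>0) $out .= " ".$attr[$i]."=".$val;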


    }


    $out .= ">";


    $partafter = substr($partafterwith,strpos($partafterwith,">")+1);


    $msg = substr($msg,0,$imgstart).$out.$partafter;


    $lengthfirst = $imgstart+3;


  }


  return $msg;


}
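
Since the routine above depends on a filter() helper that is not shown, here is a separate minimal sketch of the same idea built on preg_replace_callback(). The name keep_attributes() is hypothetical, and the pattern assumes that no attribute value contains a ">" character:

```php
<?php
// Hypothetical helper: on every <$tag ...> element, keep only the attributes named in $keep.
function keep_attributes(string $html, string $tag, array $keep): string
{
    $keep = array_map('strtolower', $keep);
    // \b stops "p" from also matching "<pre ...>".
    $tagPattern = '/<' . preg_quote($tag, '/') . '\b([^>]*)>/i';

    return preg_replace_callback($tagPattern, function ($m) use ($tag, $keep) {
        // Collect name="value", name='value' and name=value pairs inside the tag.
        preg_match_all(
            '/([a-zA-Z_:][-\w:.]*)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>]+)/',
            $m[1],
            $attrs,
            PREG_SET_ORDER
        );
        $out = '<' . $tag;
        foreach ($attrs as $a) {
            if (in_array(strtolower($a[1]), $keep, true)) {
                $out .= ' ' . $a[1] . '=' . $a[2];
            }
        }
        return $out . '>';
    }, $html);
}

// prints: <img src="my.jpg" alt="Saluton">
echo keep_attributes('<img src="my.jpg" onclick="evil()" alt="Saluton" width=100>', 'img', array('src', 'alt'));
```

Note that this sketch rebuilds the tag from scratch, so valueless attributes and a trailing XHTML slash are dropped as well.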

Chuck
21-Mar-2003 08:01


Caution, HTML created by Word may contain the sequence 

'<?xml...' 



Apparently strip_tags treats this like <?php and removes the remainder of the input string: not just the XML tag but all input that follows.
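
One possible workaround, sketched here under the assumption that the offending markup always starts with "<?xml" and ends at the next ">", is to remove it before calling strip_tags(). The sample string is made up for illustration:

```php
<?php
$html = '<p>Intro</p><?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><p>Rest of the text</p>';

// Drop anything that starts like an XML declaration, up to the next closing angle bracket.
$html = preg_replace('/<\?xml[^>]*>/i', '', $html);

// prints: IntroRest of the text
echo strip_tags($html);
```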

dontknowwhat at thehellIamdoing dot com
20-Nov-2002 10:23


Here's a quickie that will strip out only specific tags. I'm using it to clean up FrontPage and Word code from included third-party code (which shouldn't have all the extra header information in it).



$contents = "Your HTML string";



// Part 1

// This array is for single tags and their closing counterparts



$tags_to_strip = Array("html","body","meta","link","head");



foreach ($tags_to_strip as $tag) {

       // \b keeps "head" from also matching <header>; the i flag catches <HEAD>, <Body>, ...
       $contents = preg_replace("/<\/?" . $tag . "\b[^>]*>/i","",$contents);

}



// Part 2

// This array is for stripping opening and closing tags AND what's in between



$tags_and_content_to_strip = Array("title");



foreach ($tags_and_content_to_strip as $tag) {

       // also matches an opening tag that carries attributes, in any letter case
       $contents = preg_replace("/<" . $tag . "\b[^>]*>.*?<\/" . $tag . ">/is","",$contents);

}

mrmaxxx333 at triad dot rr dot com
08-May-2002 02:29


to get rid of everything in between script tags, including the script tags themselves, i use this.





<?php


$description = preg_replace("~<script[^>]*>.*</script[^>]*>~isU", "", $description);


?>





it hasn't been extensively tested, but it works.





also, i ran into trouble with a href tags. i wanted to strip out the url in them. i did this to turn an <a href="blah.com">welcome to blah</a> into welcome to blah (blah.com)





<?php


$string = preg_replace('/<a\s+.*?href="([^"]+)"[^>]*>([^<]+)<\/a>/is', '\2 (\1)', $string);


?>

guy at datalink dot SPAMMENOT dot net dot au
15-Mar-2002 02:19


Strip tags will NOT remove HTML entities such as &nbsp;
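
If the entities should go too, one option is to decode them after stripping. A minimal sketch, assuming UTF-8 input and PHP 7 or later (for the \u{...} escape); the sample string is made up:

```php
<?php
$html = '<p>Fish&nbsp;&amp;&nbsp;chips</p>';

$text = strip_tags($html);                               // tags gone, entities untouched
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');  // &amp; becomes &, &nbsp; becomes U+00A0
$text = str_replace("\u{00A0}", ' ', $text);             // optional: turn no-break spaces into plain spaces

// prints: Fish & chips
echo $text;
```

Decoding can turn &lt;b&gt; back into <b>, so pass the result through htmlspecialchars() before printing it into a page.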

chrisj at thecyberpunk dot com
19-Dec-2001 04:57


strip_tags doesn't recognize that CSS within style tags is not document text. To fix this, do something similar to the following:





$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
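
The same pattern can be widened to cover script blocks as well. A small sketch with a made-up input string; the back-reference \1 makes sure the closing tag matches the opening one:

```php
<?php
$htmlstring = "<style type=\"text/css\">\np { color: red; }\n</style><p>Visible text</p><script>alert(1)</script>";

// Remove style and script elements together with everything inside them.
$htmlstring = preg_replace('~<(style|script)\b[^>]*>.*?</\1\s*>~is', '', $htmlstring);

// prints: Visible text
echo strip_tags($htmlstring);
```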


strip_tags

Description