preg_match

User Contributed Notes

sam at NOSPAM dot aigc dot net
04-Nov-2006 05:10


Here's something I made a while ago to colorize long regular expressions. I can't guarantee it'll work for everything/everyone, but it helps me a lot and might help someone else.



Usage:

<?php echo highlight_regexp("/^[0-9]{2}:[0-9]{2}[apAP]$/"); ?>



<?php

function highlight_regexp($pattern) {
    $colors = array(
        "/" => "red",
        "(" => "green",
        ")" => "green",
        "[" => "blue",
        "]" => "blue",
        "{" => "orange",
        "}" => "orange"
    );
    $specialchars = array("?","+","*",".","|");
    $space = "&nbsp; &nbsp; ";

    // Initialise the state that the loop below relies on.
    $return = $spaceover = "";
    $tier = $skip = $inbrackets = 0;

    for ($i = 0; $i < strlen($pattern); $i++) {
        $char = $pattern[$i];   // $pattern{$i} is no longer supported in PHP 8
        $spacing = "";
        $literal = 0;
        if ($skip) {
            // The previous character was a backslash: print this one as-is.
            $show = 1;
            $literal = 1;
            $skip = 0;
        } else
            switch ($char) {
                case "\\":
                    // Escape: print the backslash and treat the next character literally.
                    $skip = 1;
                    $show = 1;
                    break;
                case "/":
                case "(":
                case "[":
                case "{":
                    $tier++;
                    if ($char == "/")
                        $tier = 0;
                    for ($j = 0; $j < $tier; $j++)
                        $spacing .= $space;
                    $char == "{" or $return .= "<br>$spacing";
                    $return .= "<font color=".$colors[$char]."><b>".$char."</b></font>";
                    if ($char == "(")
                        $spaceover = "<br>$spacing$space";
                    else {
                        if ($char == "[")
                            $inbrackets = 1;
                        $spaceover = "";
                    }
                    $show = 0;
                    break;
                case ")":
                case "]":
                case "}":
                    for ($j = 0; $j < $tier; $j++)
                        $spacing .= $space;
                    if ($char == ")")
                        $return .= "<br>$spacing";
                    elseif ($char == "]")
                        $inbrackets = 0;
                    $return .= "<font color=".$colors[$char]."><b>".$char."</b></font>\n";
                    $spaceover = "<br>$spacing";
                    $tier--;
                    $show = 0;
                    break;
                default:
                    $show = 1;
                    break;
            }
        if ($show) {
            $skipspaceover = 0;
            if (!$literal && !$inbrackets && in_array($char, $specialchars)) {
                $skipspaceover = 1;
                $preextra = "<font style='font-weight:bold;color:red'>";
                $postextra = "</font>";
                $replace = "";
            } elseif ($char == " ") {
                $preextra = "<i style='font-size:10px'>";
                $replace = "(space)";
                $postextra = "</i>";
            } else
                $preextra = $postextra = $replace = "";
            if ($spaceover && !$skipspaceover) {
                $return .= $spaceover;
                $spaceover = "";
            }
            $return .= $preextra.($replace ? $replace : $char).$postextra;
        }
    }
    return $return;
}

?>

18-Oct-2006 08:09


While reading these notes I noticed many IP-matching patterns that seemed to be missing a detail. Most use this pattern...



25[0-5] | 2[0-4]\d | [01]\d{2} | \d{1,2}



... for matching each digit group to make sure not to match anything above 255. If we have something looking like an IP starting with 260 it can however match the 60.



So, in the beginning, if matching less than 3 digits, make sure there is no digit before: (?<=[^\d]|^)

And at the end, make sure there is no digit following: (?=[^\d]|$)



<?php

$ip_pattern = "/((25[0-5]|2[0-4]\\d|[01]\\d{2}|(?<=[^\\d]|^)\\d{1,2})\\.){3}".

    "(25[0-5]|2[0-4]\\d|[01]\\d{2}|\\d{1,2})(?=[^\\d]|$)/";

?>



But generally it takes less time (coding and executing) to just capture anything looking like an IP and weed out the invalid ones with some simple function using array_walk(). A lot more flexible as well.
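A minimal sketch of that capture-then-filter idea, assuming PHP 7+ and using array_filter() instead of array_walk(); the function name extract_ipv4 is hypothetical:

```php
<?php
// Sketch of the "capture broadly, then weed out" approach (hypothetical helper).
function extract_ipv4(string $text): array
{
    // Step 1: grab anything shaped like four dot-separated groups of 1-3 digits.
    preg_match_all('/\b\d{1,3}(?:\.\d{1,3}){3}\b/', $text, $m);

    // Step 2: drop candidates that have any octet above 255.
    $valid = array_filter($m[0], function ($candidate) {
        foreach (explode('.', $candidate) as $octet) {
            if ((int) $octet > 255) {
                return false;
            }
        }
        return true;
    });

    return array_values($valid);   // reindex after filtering
}

print_r(extract_ipv4('ok 10.0.0.2, bad 260.1.1.1, ok 192.168.1.255'));
```

The pattern stays deliberately loose; all range checking happens in plain PHP, which is easier to adjust later.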

cp at ltur dot de
14-Aug-2006 08:56


To Stabby at somewhere dot invalid,



I guess you're not very familiar with regexps. Here is a working version:



<?php

$pattern='/fred(.+)bloggs/isU';

$data="fred hello\nthere bloggs fred goodbye bloggs";

preg_match_all($pattern,$data,$repTxt,PREG_PATTERN_ORDER);



print_r($repTxt);

?>



1. You have to set the 's' modifier, to make the '.' match newlines (see documentation)



2. If you write the capturing group the way you did, (.)+, it captures single characters one at a time and keeps only the last one. If you change it to (.+), it captures the whole run of allowed characters.
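A small sketch of both points, using preg_match() on shortened inputs rather than the full data above:

```php
<?php
$data = "fred hello\nthere bloggs";

// Without 's', '.' does not match "\n", so the pattern fails and preg_match() returns 0.
$without = preg_match('/fred(.+)bloggs/i', $data);

// With 's', '.' matches "\n" too, so group 1 spans the line break.
preg_match('/fred(.+)bloggs/is', $data, $m);
$spanned = $m[1];                     // " hello\nthere "

// A repeated group, (.)+, keeps only its LAST iteration...
preg_match('/(.)+/', 'abc', $one);    // $one[1] === 'c'

// ...while a group around the repetition, (.+), keeps the whole run.
preg_match('/(.+)/', 'abc', $all);    // $all[1] === 'abc'
```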



HTH

Claus

Stabby at somewhere dot invalid
03-Aug-2006 09:30


Please note that multi-line matches do not work, regardless of the modifiers in php 4.3.x. example:



<?php

$pattern='/fred(.)+bloggs/iU';

$data="fred hello\nthere bloggs fred goodbye bloggs";

preg_match_all($pattern,$data,$repTxt,PREG_PATTERN_ORDER);



print_r($repTxt);

?>



Will print out:



Array ( [0] => Array ( [0] => fred goodbye bloggs ) [1] => Array ( [0] => ) )



(the \n newline breaks the first match). I just use a loop with strpos to get around this.

krzysztof at uno dot pl
03-Aug-2006 06:24


<?PHP

// GET all links from URL



function remove_html(&$item, $key)

{

   $item=trim(strip_tags($item));

}



function get_links($url) {

$preg = 

"/a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>"

."([^<]+|.*?)?<\/a>/";

    preg_match_all(trim($preg), 

           file_get_contents($url), $out, PREG_PATTERN_ORDER);

    $keys = $out[1];

    $values = $out[2];

    array_walk($values, 'remove_html');

    return (array_combine($keys, $values));

}



print_r(get_links("http://www.uno.pl"));



/*

Result:



array

(

    [/] =>

    [/downloads.php] => PHP 5.1.2

    [http://www.php.net/docs.php] => manual

    [http://www.uno.pl] => My link 1

    ...

)



*/



?>

mail at SPAMBUSTER at milianw dot de
17-Jul-2006 10:11


I reworked the autoCloseTags function by connum at DONOTSPAMME dot googlemail dot com:

<?php

/**

 * close all open xhtml tags at the end of the string

 * 

 * @author Milian Wolff <http://milianw.de>

 * @param string $html

 * @return string

 */

function closetags($html){

  #put all opened tags into an array

  # (?<!/) skips self-closing tags such as <br />; [^>]* keeps the match inside one tag

  preg_match_all("#<([a-z]+)(?: [^>]*)?(?<!/)>#i",$html,$result);

  $openedtags=$result[1];



  #put all closed tags into an array

  preg_match_all("#</([a-z]+)>#iU",$html,$result);

  $closedtags=$result[1];

  $len_opened = count($openedtags);

  # all tags are closed

  if(count($closedtags) == $len_opened){

    return $html;

  }

  $openedtags = array_reverse($openedtags);

  # close tags

  for($i=0;$i<$len_opened;$i++) {

    if (!in_array($openedtags[$i],$closedtags)){

      $html .= '</'.$openedtags[$i].'>';

    } else {

      unset($closedtags[array_search($openedtags[$i],$closedtags)]);

    }

  }

  return $html;

}

?>
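Counting opened and closed tags can go wrong with nested or repeated tags. A stack-based alternative is sketched below; it is not Milian's function, the name close_open_tags is hypothetical, it assumes PHP 7+ and reasonably well-nested input, and it treats the listed void elements as never needing a closing tag:

```php
<?php
// Walk every tag in document order, push openers, pop on matching closers,
// then close whatever is still open (hypothetical helper).
function close_open_tags(string $html): string
{
    // Elements that never take a closing tag (a partial list; extend as needed).
    $void = ['area', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta'];

    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" if self-closing.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>#i', $html, $tags, PREG_SET_ORDER);

    $stack = [];
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[3] === '/' || in_array($name, $void, true)) {
            continue;                       // self-closing or void: nothing to close
        }
        if ($tag[1] === '/') {
            // Find the most recent opener with this name and drop it (and anything after it).
            $key = array_search($name, array_reverse($stack, true), true);
            if ($key !== false) {
                array_splice($stack, $key);
            }
        } else {
            $stack[] = $name;
        }
    }

    // Close the remaining open tags, innermost first.
    foreach (array_reverse($stack) as $name) {
        $html .= "</$name>";
    }
    return $html;
}

echo close_open_tags('<div><p>Hello <b>world');   // <div><p>Hello <b>world</b></p></div>
```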

volkank at developera dot com
07-Jul-2006 10:04


A few notes on my last post.



Leading zeros in IP addresses can cause problems on both Windows and Linux, because it is ambiguous whether such a number is decimal or octal.



"66.163.161.117" is in a decimal syntax but in "066.163.161.117" the first octet 066 is in octal syntax.

So "066.163.161.117" is interpreted by the operating system as decimal "54.163.161.117".

BTW, octal is a rather rare syntax, so you may not want or need to match it.



***

Unless you specifically want to match IP addresses in both decimal and octal syntax, you can use Chortos-2's pattern, which is suitable for most situations.



<?php 

//DECIMAL syntax IP match



//$num="(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])";

$num='(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])';



if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) //validate IP

...



preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match); //collect IP addresses from a text (note that ^ and $ are not in the pattern)

...



?> 
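For a complete, runnable version of the validation step, here is a sketch that wraps the decimal octet pattern in a hypothetical is_decimal_ipv4() function (PHP 7+). It rejects leading zeros, so '066.163.161.117' is refused instead of being misread as octal:

```php
<?php
// Decimal-only IPv4 check built from the octet pattern above (hypothetical helper).
function is_decimal_ipv4(string $ip_addr): bool
{
    $num = '(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])';
    return preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr) === 1;
}

var_dump(is_decimal_ipv4('66.163.161.117'));    // bool(true)
var_dump(is_decimal_ipv4('066.163.161.117'));   // bool(false)
```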



***

Also, my previous pattern still has a bug and needs some changes to correctly match both decimal and octal syntax.

connum at DONOTSPAMME dot googlemail dot com
04-Jun-2006 05:41


<?php

function autoCloseTags($string) {

// automatically close HTML-Tags

// (useful e.g. if you want to extract part of a blog entry or news item as a preview/teaser)

// coded by Constantin Gross <connum at googlemail dot com> / 3rd of June, 2006

// feel free to leave comments or to improve this function!



$donotclose=array('br','img','input'); //Tags that are not to be closed



//prepare vars and arrays

$tagstoclose='';

$tags=array();



//put all opened tags into an array

preg_match_all("/<(([A-Z]|[a-z]).*)(( )|(>))/isU",$string,$result);

$openedtags=$result[1];

$openedtags=array_reverse($openedtags); //this is just done so that the order of the closed tags in the end will be better



//put all closed tags into an array

preg_match_all("/<\/(([A-Z]|[a-z]).*)(( )|(>))/isU",$string,$result2);

$closedtags=$result2[1];



//look up which tags still have to be closed and put them in an array

for ($i=0;$i<count($openedtags);$i++) {

    if (in_array($openedtags[$i],$closedtags)) { unset($closedtags[array_search($openedtags[$i],$closedtags)]); }

        else array_push($tags, $openedtags[$i]);

}  



$tags=array_reverse($tags); //now this reversion is done again for a better order of close-tags



//prepare the close-tags for output

for($x=0;$x<count($tags);$x++) {

$add=strtolower(trim($tags[$x]));

if(!in_array($add,$donotclose)) $tagstoclose.='</'.$add.'>';

}



//and finally 

return $tagstoclose;

}

?>

slavomir dot hustaty at gmail dot com
28-Mar-2006 11:10


//<h1>some text</h1><b>bold</b><h1>some further text</h1>

//if needed what's between tags :-)



class find_regex 

{

    

    var $search_tag;

    var $result;

    //preg_match_all("/(<h1[^>]*>)([^<]*)(<\/h1>)/", $html, $matches);

    

    // PHP 5+ constructor; a method named after the class is no longer a constructor in PHP 8

    function __construct($tag = "h1")

    {

        $this->search_tag = $tag;

    }

    

    function parse($text_to_parse = "")

    {

    

        $regex = "/(<" . $this->search_tag . "[^>]*>)([^<]*)(<\/" . $this->search_tag . ">)/";

    

        preg_match_all( $regex , $text_to_parse , $matches );

        

        $this->result = $matches;

        

        return $matches[2];

        

    }

    

}

dave at mixd dot net
23-Mar-2006 12:18


Use this to capture all JavaScript code that is between <script> tags.



Takes into account javascript that generates HTML. This one took a while, so I thought I'd share it.



$pattern = 

'/<script[^>]*>((?:[^<>"\']+(?:"[^"]*"|\'[^\']*\')*)+)<\/script>/i';



Note: For some reason php.net is filtering out my escape characters... If it doesn't work make sure you escape all single quotes and the forward slash.
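If the quote handling is not needed, a simpler approach (an alternative, not the pattern above) is a lazy dot-all match. It assumes no script body contains the literal text </script>, and unlike the pattern above it tolerates a bare < in the code; script_bodies() is a hypothetical name:

```php
<?php
// Return the contents of every <script> element (hypothetical helper).
// 's' lets '.' cross newlines; '.*?' stops at the first closing tag.
function script_bodies(string $html): array
{
    preg_match_all('#<script\b[^>]*>(.*?)</script>#is', $html, $m);
    return $m[1];
}

$html = '<script type="text/javascript">if (a < b) { document.write("<b>hi</b>"); }</script>'
      . "<p>text</p><script>\nvar x = 1;\n</script>";
print_r(script_bodies($html));
```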

phpnet at sinful-music dot com
20-Feb-2006 04:53


Here's some code to 1. validate the RFC2822 conformity of address lists and 2. extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for checking email addresses from input forms, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers. The resulting regex is about 30000 bytes long because it accepts comments; set $cfws to $fws to drop comment support and it shrinks to about 6000 bytes. Conformity checking refers strictly to RFC2822. Have fun, and email me if you have any enhancements!



<?php

function mime_extract_rfc2822_address($string)

{

        //rfc2822 token setup

        $crlf           = "(?:\r\n)";

        $wsp            = "[\t ]";

        $text           = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";

        $quoted_pair    = "(?:\\\\$text)";

        $fws            = "(?:(?:$wsp*$crlf)?$wsp+)";

        $ctext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .

                          "!-'*-[\\]-\\x7F]";

        $comment        = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .

                          "$fws?\\))";

        $cfws           = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";

        //$cfws           = $fws; //an alternative to comments

        $atext          = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";

        $atom           = "(?:$cfws?$atext+$cfws?)";

        $dot_atom_text  = "(?:$atext+(?:\\.$atext+)*)";

        $dot_atom       = "(?:$cfws?$dot_atom_text$cfws?)";

        $qtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";

        $qcontent       = "(?:$qtext|$quoted_pair)";

        $quoted_string  = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";

        $dtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";

        $dcontent       = "(?:$dtext|$quoted_pair)";

        $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";

        $domain         = "(?:$dot_atom|$domain_literal)";

        $local_part     = "(?:$dot_atom|$quoted_string)";

        $addr_spec      = "($local_part@$domain)";

        $display_name   = "(?:(?:$atom|$quoted_string)+)";

        $angle_addr     = "(?:$cfws?<$addr_spec>$cfws?)";

        $name_addr      = "(?:$display_name?$angle_addr)";

        $mailbox        = "(?:$name_addr|$addr_spec)";

        $mailbox_list   = "(?:(?:(?:(?<=:)|,)$mailbox)+)";

        $group          = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";

        $address        = "(?:$mailbox|$group)";

        $address_list   = "(?:(?:^|,)$address)+";



        //output length of string (just so you see how f**king long it is)

        echo(strlen($address_list) . " ");



        //apply expression

        preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);



        return $array;

};

?>

volkank at developera dot com
17-Feb-2006 03:23


Correct IP matching Pattern:



This new IP octet pattern seems to be correct:

$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";



/*

25[0-5]    => 250-255

2[0-4]\d   => 200-249

[01]?\d\d  => 00-99,000-199

\d         => 0-9

*/



GRABBING multiple Valid IP addresses from string



<?php

    $num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";

    $test="127.0.0.112 10.0.0.2";

    preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match);

    print_r($match);

      

?>



Single IP validation

<?php

$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";

$ip_addr='009.111.111.100';

if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) echo "Wrong IP Address\n";

else echo $match[0];



?>

bgamrat at wirehopper dot com
13-Feb-2006 03:19


The double slashes in the following post should be replaced by single slashes.

bgamrat at wirehopper dot com
07-Feb-2006 11:54


I used these regular expressions to get the references from a page.   The function run_preg lists the references found.



$url = "http://test.com";

$text=@file_get_contents($url);

if ($text)

{

  $src_href_url=run_preg($text,

    "/(?:(?:src|href|url)\s*[=\(]\s*[\"'`])".

    "([\+\w:?=@&\/#._;-]+)(?:[\s\"'`])/i");

  $windows=run_preg($text,

    "/(?:window.open\s*\(\s*[\w-]*\s*[,]\s*[\"`'])".

    "([\+\w:?=@&\/#._;-]*)(?:[\"'`]\s*)/i");

}



function run_preg($text,$pattern) {



   preg_match_all ($pattern, $text, $matches);



   if (count($matches)>0)

        if (count($matches[1])>0)

                foreach ($matches[1] as $k => $v)

                        echo "$k: $v\n";



   return (is_array($matches)) ? $matches[1]:FALSE;

}



Thanks to http://us2.php.net/manual/en/function.preg-match.php#58505

for giving me a good starting point.



Hope others find this useful.  :)

mnc at u dot nu
03-Feb-2006 02:05


PREG_OFFSET_CAPTURE always seems to provide byte offsets, rather than character position offsets, even when you are using the unicode /u modifier.
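To turn such a byte offset into a character offset, one option (assuming the subject is valid UTF-8; byte_to_char_offset() is a hypothetical helper) is to count the characters in the prefix with a /u pattern, which avoids depending on the mbstring extension:

```php
<?php
// Convert a byte offset from PREG_OFFSET_CAPTURE into a character offset
// by counting UTF-8 characters in the prefix (hypothetical helper).
function byte_to_char_offset(string $subject, int $byteOffset): int
{
    return preg_match_all('/./su', substr($subject, 0, $byteOffset));
}

$subject = "h\u{e9}llo w\u{f6}rld";                  // "héllo wörld"
preg_match('/w/u', $subject, $m, PREG_OFFSET_CAPTURE);
$bytes = $m[0][1];                                    // 7, because "é" takes two bytes
$chars = byte_to_char_offset($subject, $bytes);       // 6
```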

egingell at sisna dot com
01-Feb-2006 11:31


Try this for a preg_match_all() that takes an array of regular expressions.



<?php

// Emulates preg_match_all() but takes an array instead of a string.

// Returns an array containing all of the matches.

// The return array is an array containing the arrays normally returned by

//    preg_match_all() with the optional third parameter supplied.

function preg_search($ary, $subj) {

    $matched = array();

    if (is_array($ary)) {

        foreach ($ary as $v) {

            preg_match_all($v, $subj, $matched[]);

        }

    } else {

        preg_match_all($ary, $subj, $matched[]);

    }

    return $matched;

}

?>

18-Dec-2005 10:16


To match all occurrences between and including any two HTML tags, here <tr> and </tr>:



preg_match_all("/(\<[ \\n\\r\\t]{0,}tr[^>]*\>|\<[^>]*[\\n\\r\\t]{1,}tr[^>]*\>){1}

([^<]*<([^(\/>)]*(\/[^(t>)]){0,1}(\/t[^(r>)]){0,1})*>)*

(\<[ \\n\\r\\t]{0,}\/tr[^>]*\>|\<[^>]*[\\n\\r\\t]{1,}\/tr[^>]*\>){1}/i", $string, $Matches);
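A shorter way to collect whole elements, sketched with a hypothetical match_elements() helper: a lazy dot-all match from the opening tag to the first closing tag. It assumes the element is not nested inside itself (for example, no table inside a table row):

```php
<?php
// Return every <$tag ...>...</$tag> element, tags included (hypothetical helper).
// \b stops "tr" from matching "<track>"; 'i' ignores case, 's' lets '.' cross newlines.
function match_elements(string $html, string $tag): array
{
    $t = preg_quote($tag, '#');
    preg_match_all("#<$t\b[^>]*>.*?</$t\s*>#is", $html, $m);
    return $m[0];
}

$html = "<table><tr><td>1</td></tr>\n<TR class=\"x\">\n<td>2</td></tr></table>";
print_r(match_elements($html, 'tr'));
```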

php at projectjj dot com
09-Dec-2005 04:43


Re: webmaster at swirldrop dot com



If you want to get a string with all the 'normal' characters, this may be better:



$clean = preg_replace('/\W+/', '', $dirty);



\W is the opposite of \w and will match any character that is not a letter or digit or the underscore character, plus it respects the current locale. Use [^0-9a-zA-Z]+ instead of \W if you need ASCII-only.
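A quick comparison of the two patterns on the same ASCII input:

```php
<?php
$dirty = 'Hello, World_42!';

// \W+ removes everything except letters, digits and the underscore.
$clean = preg_replace('/\W+/', '', $dirty);             // "HelloWorld_42"

// ASCII letters and digits only: the underscore is removed too.
$ascii = preg_replace('/[^0-9a-zA-Z]+/', '', $dirty);   // "HelloWorld42"
```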

htp
08-Dec-2005 05:29


Just a quick note regarding the post by webmaster at swirldrop dot com: the regex doesn't match alphanumerics, since it matches only letters, not digits. You might want to add 0-9 if that was the intent.

pablo dot seb at gmail dot com
16-Jun-2005 09:48


By assigning a name to a capturing group, you can easily reference it by name. (?P<name>group) captures the match of group into the backreference "name". You can then reference the contents of the group with either the numbered or the named backreference:



<?php



preg_match_all('|(a)(?P<x>b)(?P<y>c)(d)|','abcdefgabcdefg',$sub);



echo $sub[2][0]; //b



echo '<br />';



echo $sub['y'][0]; //c



?>
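Named groups can also be back-referenced inside the pattern itself with (?P=name). A small sketch that finds doubled word characters:

```php
<?php
// (?P<ch>\w) captures one word character as "ch"; (?P=ch) requires the same character again.
preg_match_all('/(?P<ch>\w)(?P=ch)/', 'aabbcd', $sub);

// $sub['ch'] is ['a', 'b'] (also available as $sub[1]); $sub[0] is ['aa', 'bb'].
print_r($sub['ch']);
```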



Pablo from Salto, Uruguay

webmaster at m-bread dot com
07-Jun-2005 09:45


Looking at the function from rickyale at ig dot com dot br below getting URLs from an html file, I think this is slightly better:



function get_urls($string, $strict=true) {

   $types = array("href", "src", "url");
   $ret = array();
   foreach ($types as $type) {   // each() was removed in PHP 8
       $innerT = $strict?'[a-z0-9:?=&@/._-]+?':'.+?';
       // pass $matches normally: call-time pass-by-reference (&$matches) is a fatal error since PHP 5.4
       preg_match_all("|$type\=([\"'`])(".$innerT.")\\1|i", $string, $matches);
       $ret[$type] = $matches[2];
   }

   return $ret;
}



This only gets URLs in matching quotes "...", `...` or '...', but not mixed quotes like `..." (thanks to w w w's note on the 'pattern syntax' page). If you set the second parameter to false, the function returns whatever the attribute contains, so by adding names such as 'alt' to $types it can fetch other attributes too. To make it stricter, '[a-z0-9:?=&@/._-]+?' can be replaced with a regular expression for a URL.

webmaster at swirldrop dot com
07-Jun-2005 08:40


If you want to get all the text characters from a string, possibly entered by a user, and filter out all the non-alphanumeric characters (perhaps to make an ID for storing user-submitted details in a database record), you can use the function below. It returns a string of only the alphanumeric characters from the input string (all in lower case), with all other characters removed:



<?php

function getText($string){

preg_match_all('/(?:([a-z]+)|.)/i', $string, $matches);

return strtolower(implode('', $matches[1]));

};//EoFn getText

?>



It took me quite a while to come up with this regular expression. I hope it saves someone else that time.

20-Apr-2005 11:35


A little correction to my function below:



<?php

function urlhighlight($str) {

    preg_match_all("/http:\/\/?[^ ][^<]+/i",$str,$lnk);

    $size = sizeof($lnk[0]);

    $i = 0;

    while ($i < $size) {

        $len = strlen($lnk[0][$i]);

        if($len > 30) {

            $lnk_txt = substr($lnk[0][$i],0,30)."...";

        } else {

            $lnk_txt = $lnk[0][$i];    

        }

        $ahref = $lnk[0][$i];

        $str = str_replace($ahref,"<a href='$ahref' target='_blank'>$lnk_txt</a>",$str);

        $i++;

    }

    return $str;

}

?>



The error is in the preg_match_all("/http:\/\/?[^ ][^<]+/i",$str,$lnk); the [^<] was missing.

Dan Madsen
20-Apr-2005 09:25


I wrote a function that takes URLs from a string or database output, highlights them, and shortens the link text if it is longer than 30 characters.



Note: you'll have to run nl2br() on the string before using it, because I didn't know how to check for line feeds or carriage returns in preg-style patterns.



<?php

function urlhighlight($str) {

    preg_match_all("/http:\/\/?[^ ]+/i",$str,$lnk);

    $size = sizeof($lnk[0]);

    $i = 0;

    while ($i < $size) {

        $len = strlen($lnk[0][$i]);

        if($len > 30) {

            $lnk_txt = substr($lnk[0][$i],0,30)."...";

        } else {

            $lnk_txt = $lnk[0][$i];    

        }

        $ahref = $lnk[0][$i];

        $str = str_replace($ahref,"<a href='$ahref'>$lnk_txt</a>",$str);

        $i++;

    }

    return $str;

}

?>

Ex:

<?php

$str = "a lot of text with urls in it and a lot of linebreaks";

$str = urlhighlight(nl2br($str));

?>
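A sketch of a variant that needs no nl2br() pre-pass, because \s already matches spaces, tabs, \r and \n. It uses preg_replace_callback() rather than str_replace(), so a URL that appears twice is not wrapped twice; the name urlhighlight_sketch() and the https? extension are assumptions, not part of the original function:

```php
<?php
// Link every http(s) URL, stopping at whitespace (including line breaks) or '<';
// link text longer than $max characters is shortened (hypothetical helper).
function urlhighlight_sketch(string $str, int $max = 30): string
{
    return preg_replace_callback('#https?://[^\s<]+#i', function ($m) use ($max) {
        $url  = $m[0];
        $text = strlen($url) > $max ? substr($url, 0, $max) . '...' : $url;
        return "<a href='" . htmlspecialchars($url, ENT_QUOTES) . "'>"
             . htmlspecialchars($text) . '</a>';
    }, $str);
}

echo urlhighlight_sketch("see http://example.com/a\nnext");
```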

b2sing4u at naver dot com
09-Apr-2005 06:42


This function converts all HTML-style decimal character references to hexadecimal ones.

ex) Hi &#959; &#9674; Dec  ->  Hi &#x03BF; &#x25CA; Dec



function d2h($word) {

  $n = preg_match_all("/&#(\d+?);/", $word, $match, PREG_PATTERN_ORDER);

  for ($j = 0; $j < $n; $j++) {

    $word = str_replace($match[0][$j], sprintf("&#x%04X;", $match[1][$j]), $word);

  }

  return($word);

}



And this function converts all HTML-style hexadecimal character references to decimal ones.

ex) Hello &#x03BF; &#x25CA; Hex  ->  Hello &#959; &#9674; Hex



function h2d($word) {

  $n = preg_match_all("/&#x([0-9a-fA-F]+?);/", $word, $match, PREG_PATTERN_ORDER);

  for ($j = 0; $j < $n; $j++) {

    $word = str_replace($match[0][$j], sprintf("&#%u;", hexdec($match[1][$j])), $word);

  }

  return($word);

}
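The same two conversions can be done in a single pass with preg_replace_callback() (closures need PHP 5.3+; the type declarations below assume PHP 7+). d2h_cb() and h2d_cb() are hypothetical names for this sketch:

```php
<?php
// Decimal &#NNN; -> hexadecimal &#xHHHH; in one pass (hypothetical helper).
function d2h_cb(string $word): string
{
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        return sprintf('&#x%04X;', (int) $m[1]);
    }, $word);
}

// Hexadecimal &#xHHHH; -> decimal &#NNN; in one pass (hypothetical helper).
function h2d_cb(string $word): string
{
    return preg_replace_callback('/&#x([0-9a-fA-F]+);/', function ($m) {
        return sprintf('&#%u;', hexdec($m[1]));
    }, $word);
}

echo d2h_cb('Hi &#959; &#9674; Dec'), "\n";         // Hi &#x03BF; &#x25CA; Dec
echo h2d_cb('Hello &#x03BF; &#x25CA; Hex'), "\n";   // Hello &#959; &#9674; Hex
```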

b2sing4u
07-Apr-2005 05:24


Character Code Conversion Example.



You can use following example to convert character code in HTML file.



First example converts Hexadecimal code to Decimal code.

  ex) Hello &#xFF; Hex -> Hello &#255; Hex



Second example converts Decimal code to Hexadecimal code.

  ex) Hi &#16; Dec -> Hi &#x0010; Dec



<?php



$h2d_get = fopen("h2d_get.htm", 'r');

$h2d_out = fopen("h2d_out.htm", 'w');



for ($i = 1; $i <= 1000; $i++)

{

  if (feof($h2d_get)) { break; }



  $line = fgets($h2d_get, 409600);

  $line = trim($line);

  if ($line == "99999999") { break; }



  $n = preg_match_all("/&#x([0-9a-fA-F]+?);/", $line, $match, PREG_PATTERN_ORDER);



  for ($j = 0; $j < $n; $j++)

  {

    $find = $match[0][$j];

    $code = hexdec($match[1][$j]);

    $push = sprintf("&#%u;", $code);

    $line = str_replace($find, $push, $line);   // eregi_replace() was removed in PHP 7

  }



  fwrite($h2d_out, $line);

  fwrite($h2d_out, "\r\n");

}



fclose($h2d_get);

fclose($h2d_out);



?>



<?php



$d2h_get = fopen("d2h_get.htm", 'r');

$d2h_out = fopen("d2h_out.htm", 'w');



for ($i = 1; $i <= 1000; $i++)

{

  if (feof($d2h_get)) { break; }



  $line = fgets($d2h_get, 409600);

  $line = trim($line);

  if ($line == "99999999") { break; }



  $n = preg_match_all("/&#(\d+?);/", $line, $match, PREG_PATTERN_ORDER);



  for ($j = 0; $j < $n; $j++)

  {

    $find = $match[0][$j];

    $code = $match[1][$j];

    $push = sprintf("&#x%04X;", $code);

    $line = str_replace($find, $push, $line);   // eregi_replace() was removed in PHP 7

  }



  fwrite($d2h_out, $line);

  fwrite($d2h_out, "\r\n");

}



fclose($d2h_get);

fclose($d2h_out);



?>

arias at elleondeoro dot com
15-Feb-2005 08:27


If you want to find all match positions and their lengths, you can use the following function:



<?php

function preg_match_all_positions($pattern, $subject, &$count=null, $flags=0, $offset=0) {

  for ($count=0; preg_match($pattern, $subject, $match, $flags, $offset); $count++) {

    $positions[0][] = $pos = strpos($subject, $match[0], $offset);

    $positions[1][] = $len = strlen($match[0]);

    $offset = $pos+$len;

  }

  return $positions;

}

?>
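A sketch of an alternative that lets PREG_OFFSET_CAPTURE report the offsets directly, so strpos() cannot land on an earlier copy of the same text and an empty match cannot stall a loop. match_positions() is a hypothetical name; it returns the same [offsets, lengths] layout, with offsets in bytes (assumes PHP 7.1+ for the foreach destructuring):

```php
<?php
// Collect the byte offset and length of every match (hypothetical helper).
function match_positions(string $pattern, string $subject): array
{
    $positions = [[], []];   // [0] => offsets, [1] => lengths
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[0] as [$text, $offset]) {
        $positions[0][] = $offset;
        $positions[1][] = strlen($text);
    }
    return $positions;
}

print_r(match_positions('/b+/', 'abbcbd'));
```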

mpbweb at mbourque dot com
03-Feb-2005 02:41


Here is a handy function I wrote that will check for broken links on the supplied url.



function dead_links($url) {
   // mixed dead_links( $url )
   // Returns:
   //    FALSE if no broken links are found (or the page cannot be read).
   //    ARRAY containing broken links if any are found.

   $body = @file_get_contents($url);
   if ($body === FALSE) return FALSE;

   $pathparts = pathinfo($url);

   $urlpattern = "/<a[^>]+href=\"([^\"]+)/i";
   preg_match_all($urlpattern,$body,$matches);

   $linkArray = array();
   foreach( $matches[1] as $link) {
      if( !preg_match('#^https?://#i', $link) ) { // Deal with relative paths
         $link = $pathparts['dirname'] . "/" . $link;
      }

      $fp = @fopen($link, "r");
      if (!$fp) {
         $linkArray[] = $link;
      } else {
         fclose($fp);   // only close a handle that was actually opened
      }
   }

   return count($linkArray) ? $linkArray : FALSE;
}



Regards,



Michael Bourque

MCLD
20-Jan-2005 06:35


Here's a nice easy use for preg_match_all. I have data files in comma-separated-values format, with all the data enclosed in quote marks. To convert one line of such a data file into an array:



function quotedCsvLineToArray($l)

{

  preg_match_all('/(?<=,|\A)("(.*?)")?(?=,|\Z)/',$l, $matches, PREG_PATTERN_ORDER);

  return $matches[2];

}
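For quoted CSV lines, PHP's built-in str_getcsv() (PHP 5.3+) is an alternative worth trying; unlike the regex above, it also unescapes doubled quotes inside a field. A sketch (all four arguments are passed explicitly, since newer PHP versions warn when the escape argument is left implicit):

```php
<?php
// Parse one quoted CSV line: commas inside quotes and "" escapes are handled.
$line = '"a","b,c","say ""hi""",""';
$fields = str_getcsv($line, ',', '"', '\\');
print_r($fields);
```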



hope it helps

dan

hex6ng at yahoo dot com
03-Jul-2004 06:04


This is a much more efficient version of the same function posted in ereg_replace() discussion by hdn, who is the same person as hex6ng.  I didn't include activating urls without http:// protocol identifier because there are many xxx.xxx patterns that are not urls.



function html_activate_urls($str)

{

    $search = $replace = array();   // avoid undefined arrays when nothing matches

    // lift all links, images and image maps

    $url_tags = array (

                     "'<a[^>]*>.*?</a>'si",

                     "'<map[^>]*>.*?</map>'si",

                     "'<script[^>]*>.*?</script>'si",

                     "'<style[^>]*>.*?</style>'si",

                     "'<[^>]+>'si"

                      );

    foreach($url_tags as $url_tag)

    {

        preg_match_all($url_tag, $str, $matches, PREG_SET_ORDER);

        foreach($matches as $match)

        {

            $key = "<" . md5($match[0]) . ">";

            $search[] = $key;

            $replace[] = $match[0];

        }

    }



    $str = str_replace($replace, $search, $str);



    // indicate where urls end if they have these trailing special chars

    $sentinals = array("/&(quot|#34);/i",        // Replace html entities

                       "/&(lt|#60);/i",

                       "/&(gt|#62);/i",

                       "/&(nbsp|#160);/i",

                       "/&(iexcl|#161);/i",

                       "/&(cent|#162);/i",

                       "/&(pound|#163);/i",

                       "/&(copy|#169);/i");



    $str = preg_replace($sentinals, "<marker>\\0", $str);



    // URL into links

    $str = 

preg_replace( "|\w{3,10}://[\w\.\-_]+(:\d+)?[^\s\"\'<>\(\)\{\}]*|",  

                   "<a href=\"\\0\">\\0</a>", $str ); 



    $str = str_replace("<marker>", '', $str);

    return str_replace($search, $replace, $str);

}



-hdn

vb_user at yahoo dot com
22-Apr-2004 12:00


If you want to extract the list of PHP functions in one of your libraries (i.e. include files), for documentation or any other purpose, use the code below:



$filename = 'library.php';

$fp = fopen($filename,'r');

if ($fp !== false) {

    $str = fread($fp, filesize ($filename));

    $count = preg_match_all ("|function[ ]+(.*)[\(](.*)[\)]|U", $str, $out, PREG_PATTERN_ORDER);



    for ($i=0; $i<$count; $i++) {

        if (stripos($out[1][$i], 'array') === false) {   // eregi() was removed in PHP 7

            echo '#T='.$out[1][$i]."\n";

            echo $out[1][$i].'('.$out[2][$i].')'."\n\n";

        }

    }

}

fabriceb at gmx dot net
05-Mar-2004 10:55


If you just want to find out how many times a string contains another simple string, don't use preg_match_all() as I did before I found the substr_count() function.



Use

<?php

$nrMatches = substr_count ('foobarbar', 'bar');

?>

instead. Hope this helps some other people like me who like to think too complicated :-)


preg_match_all

Description