PHP and PCRE: Greedy or Not?

18.07.2005 at 12:36

After a while I once again came in touch with regular expressions. I needed to convert some BBCode into HTML but didn't want to use a prepackaged library, mostly because they're just too bloated for my purpose. However, I stumbled over a few problems. First of all, it took me some time to realize that . (the dot) doesn't match newline characters (\n) by default. Maybe I should have read the documentation before starting to code. This can be solved with the pattern modifier s, written after the closing delimiter as in /.../s. The second problem was that my regular expression matched far more text than it should have. For example, I had something like this:

$text = '[url=http://sf.net]sourceforge[/url] and
         [url=http://freshmeat.net]freshmeat[/url]';

$text = preg_replace(
    '/\[url=(.*)\](.*)\[\/url\]/is',
    '<a href="\1">\2</a>',
    $text
);

which results in the following:

<a href="http://sf.net]sourceforge[/url] and 
   [url=http://freshmeat.net">freshmeat</a>

So what happens? The regular expression matches from the first opening url tag to the last closing one instead of pairing each tag with its own counterpart, because each .* grabs as much text as it can. After a while I figured out that this can be solved by the following:

$text = preg_replace(
    '/\[url=(.*?)\](.*?)\[\/url\]/is',
    '<a href="\1">\2</a>',
    $text
);

The question mark makes the quantifiers non-greedy. From the PHP manual:

However, if a quantifier is followed by a question mark, then it ceases 
to be greedy, and instead matches the minimum number of times possible.
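To see both effects side by side, here is a small sketch that counts matches instead of replacing them. The sample string is the one from above with the indentation dropped. The variable names and the short and.*\[url pattern are only made up for this illustration.

```php
<?php
// Sample input from the post: two [url] tags separated by a line break.
$text = "[url=http://sf.net]sourceforge[/url] and\n"
      . "[url=http://freshmeat.net]freshmeat[/url]";

// Greedy: (.*) takes as much as possible, so a single match spans both tags.
$greedyCount = preg_match_all('/\[url=(.*)\](.*)\[\/url\]/is', $text, $greedy);

// Non-greedy: (.*?) takes as little as possible, so each tag matches on its own.
$lazyCount = preg_match_all('/\[url=(.*?)\](.*?)\[\/url\]/is', $text, $lazy);

// Without the s modifier the dot refuses to cross the "\n" between the tags.
$withoutS = preg_match('/and.*\[url/', $text);
$withS    = preg_match('/and.*\[url/s', $text);

echo "greedy matches: $greedyCount\n";
echo "non-greedy matches: $lazyCount\n";
echo "dot crosses newline without s: $withoutS, with s: $withS\n";
print_r($lazy[1]); // the captured URLs
```

The greedy pattern finds a single match whose first group swallows everything up to the second URL, while the non-greedy one finds both tags with one URL each.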
