PHP and PCRE: Greedy or Not?
After a while I once again came into contact with regular
expressions.
I needed to convert some BBCode into HTML but didn’t want to use a
prepackaged library, mostly because they’re just too bloated for my
purpose. However, I struggled with a few problems. First of all, it took
me some time to realize that the dot (.) doesn’t match newline
characters (\n). Maybe I should have read the documentation before
starting to code.
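To illustrate the dot behaviour, here is a minimal sketch. The [b] tag and the sample text are made up for the demonstration and are not taken from my actual converter. It compares the same pattern without and with the s modifier (PCRE_DOTALL):

```php
<?php
// Hypothetical sample: a BBCode tag whose content spans two lines.
$sample = "[b]first line\nsecond line[/b]";

// Without the s modifier the dot stops at "\n", so the closing tag
// is never reached and the pattern does not match.
$withoutS = preg_match('/\[b\](.*)\[\/b\]/', $sample, $m1);

// With the s modifier the dot also matches "\n".
$withS = preg_match('/\[b\](.*)\[\/b\]/s', $sample, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```

In the second call, $m2[1] holds both lines including the newline between them.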
This can be solved by the pattern modifier /s. The second problem was
that my regular expression matched far more than it should. For
example, I had something like this:
$text = '[url=http://sf.net]sourceforge[/url] and
[url=http://freshmeat.net]freshmeat[/url]';
$text = preg_replace(
    '/\[url=(.*)\](.*)\[\/url\]/is',
    '<a href="\1">\2</a>',
    $text
);
which results in the following:
<a href="http://sf.net]sourceforge[/url] and
[url=http://freshmeat.net">freshmeat</a>
So what happens? The regular expression matches from the first opening
url tag to the last closing one instead of pairing up the corresponding
tags. After a while I figured out that this can be solved by the
following:
$text = preg_replace(
    '/\[url=(.*?)\](.*?)\[\/url\]/is',
    '<a href="\1">\2</a>',
    $text
);
The question mark makes the quantifiers non-greedy. From the PHP manual:
However, if a quantifier is followed by a question mark, then it ceases
to be greedy, and instead matches the minimum number of times possible.
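
The difference is easiest to see by looking at what each variant captures. The following is only a sketch, with a shortened, made-up sample string (two url tags on one line) and preg_match instead of preg_replace, so the captured groups can be inspected directly:

```php
<?php
// Hypothetical sample with two url tags.
$sample = '[url=a]one[/url] [url=b]two[/url]';

// Greedy: the first (.*) grabs as much as possible and only gives
// back characters until the rest of the pattern can still match.
preg_match('/\[url=(.*)\](.*)\[\/url\]/', $sample, $greedy);

// Non-greedy: (.*?) takes as little as possible, so the match ends
// at the first closing tag.
preg_match('/\[url=(.*?)\](.*?)\[\/url\]/', $sample, $lazy);

echo $greedy[1], "\n"; // a]one[/url] [url=b
echo $greedy[2], "\n"; // two
echo $lazy[1], "\n";   // a
echo $lazy[2], "\n";   // one
```

The greedy capture swallows the first closing tag and the second opening tag, which is the same effect as in the broken output above.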