
I have a string like this:

$dot_prod = "at the coast will reach the Douglas County coast";

I'd like to get this result using a regex: at the <b>coast</b> will reach <b>the</b> Douglas County coast

Specifically, I want to bold the words "coast" and "the", but "coast" only when it is not preceded by the word "county", and "the" only when it is not preceded by the word "at". So, essentially, I want one array of words or phrases to be bolded (matched case-insensitively, while keeping the case the word or phrase originally had) and another array of words or phrases that must not be bolded. For instance, the array of words/phrases that I want bolded is:

$bold = array("coast", "the", "pass");

and the array of phrases that I want to ensure stay unbolded is:

$unbold = array("county coast", "at the", "grants pass");

I'm able to do the bolding with this:

$bold = array("coast", "the", "pass");

$dot_prod = preg_replace("/(" . implode("|", $bold) . ")/i", "<b>$1</b>", $dot_prod);

However, I've been unsuccessful at unbolding afterwards, and I definitely couldn't figure out how to do it all in one expression. Can you offer any help please? Thank you.

user1610717

1 Answer


You may match and skip the patterns you want to "unbold" and match those you want to bold in any other context.

Build a regex like this (I added word boundaries to match whole words; you probably do not have to use them, but they seem a good idea for your current input):

'~\b(?:county coast|at the|grants pass)\b(*SKIP)(*F)|\b(?:coast|the|pass)\b~i'


Details

  • \b - word boundary
  • (?:county coast|at the|grants pass) - any of the alternatives
  • \b - a word boundary
  • (*SKIP)(*F) - PCRE verbs that discard the text matched so far and make the engine look for the next match starting from the end of that text
  • | - or
  • \b - a word boundary
  • (?:coast|the|pass) - any of the alternatives
  • \b - a word boundary.
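
To see which occurrences survive the skip, here is a minimal sketch that lists every match with its offset. It reuses the pattern and the sample string from the question; the variable names $text, $m and $found are only illustrative.

```php
<?php
$text = "at the coast will reach the Douglas County coast";
$rx = '~\b(?:county coast|at the|grants pass)\b(*SKIP)(*F)|\b(?:coast|the|pass)\b~i';

// PREG_OFFSET_CAPTURE returns each match together with its byte offset.
preg_match_all($rx, $text, $m, PREG_OFFSET_CAPTURE);

$found = [];
foreach ($m[0] as [$word, $offset]) {
    $found[] = "$word@$offset";
}

// "at the" (offset 0) and "County coast" are consumed by the first branch
// and discarded, so only the two remaining words are reported.
echo implode(", ", $found), "\n"; // coast@7, the@24
```

The first "the" and the last "coast" never reach the second branch, because the engine has already moved past them when it skips the protected phrase.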

The $0 in the replacement is a reference to the whole match value.

PHP demo:

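$dot_prod = "at the coast will reach the Douglas County coast";
$bold = array("coast", "the", "pass");
$unbold = array("county coast", "at the", "grants pass");
// Phrases to leave alone come first and are discarded with (*SKIP)(*F);
// the words to bold are then matched in any remaining context.
$rx = "~\b(?:" . implode("|", $unbold) . ")\b(*SKIP)(*F)|\b(?:" . implode("|", $bold) . ")\b~i";
// $0 inserts the whole match, so the original case is preserved.
echo preg_replace($rx, "<b>$0</b>", $dot_prod);
// => at the <b>coast</b> will reach <b>the</b> Douglas County coast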

One caveat: since your search terms can include whitespace, it is a good idea to sort the $bold and $unbold arrays by length in descending order before building the pattern, so that a longer phrase is tried before a shorter term it starts with:

usort($unbold, function($a, $b) { return strlen($b) - strlen($a); });
usort($bold, function($a, $b) { return strlen($b) - strlen($a); });
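
A small sketch of why the order matters. The term list below is hypothetical (it is not the one from the question) and was chosen so that one term is a prefix of another; alternation tries the alternatives left to right and keeps the first one that succeeds.

```php
<?php
// Hypothetical term list in which one term is a prefix of another.
$bold = array("the", "the coast");
$text = "the coast is clear";

// Unsorted: "the" is tried first and wins, so "the coast" never matches.
$unsorted = preg_replace('~\b(?:' . implode('|', $bold) . ')\b~i', '<b>$0</b>', $text);

// Longest first: the longer phrase gets the first chance to match.
usort($bold, function ($a, $b) { return strlen($b) - strlen($a); });
$sorted = preg_replace('~\b(?:' . implode('|', $bold) . ')\b~i', '<b>$0</b>', $text);

echo $unsorted, "\n"; // <b>the</b> coast is clear
echo $sorted, "\n";   // <b>the coast</b> is clear
```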


If these items can contain special regex metacharacters, also run preg_quote on each of them, passing the regex delimiter (~ here) as the second argument.
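
As a rough illustration of the quoting step, the sketch below compares an unquoted and a quoted pattern. The term "1.5 in" and the sample text are made up for this example; the "." in the term is a metacharacter that matches any character unless it is escaped.

```php
<?php
// Hypothetical term; the "." in "1.5 in" is a regex metacharacter.
$bold = array("1.5 in");
$text = "rain of 1.5 in and 125 in";

// Unquoted: "." matches any character, so "125 in" is bolded as well.
$raw = '~\b(?:' . implode('|', $bold) . ')\b~i';
$rawResult = preg_replace($raw, '<b>$0</b>', $text);

// Quoted: pass the delimiter ("~") as the second argument so it is escaped too.
$quoted = array_map(function ($t) { return preg_quote($t, '~'); }, $bold);
$safe = '~\b(?:' . implode('|', $quoted) . ')\b~i';
$safeResult = preg_replace($safe, '<b>$0</b>', $text);

echo $rawResult, "\n";  // rain of <b>1.5 in</b> and <b>125 in</b>
echo $safeResult, "\n"; // rain of <b>1.5 in</b> and 125 in
```

The same array_map call can be applied to $unbold before the two lists are joined into the pattern.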

Wiktor Stribiżew
  • What must one do to become a regex master of your level? – frobinsonj Oct 30 '18 at 14:21
  • @Profit I have been here on SO, regex tag, for 4 years, every day. – Wiktor Stribiżew Oct 30 '18 at 14:27
  • @WiktorStribiżew, thanks so much! This works great. If you have time, can you explain a little more about the order of the regex? Does it work so that if it matches one of the first phrases (first group), then it ignores the next group? Is that what SKIP F does? Just trying to grasp a bit better. Thanks again!! – user1610717 Oct 30 '18 at 14:51
  • @user1610717 You may read [here](https://stackoverflow.com/questions/35606426/order-of-regular-expression-operator/35606463#35606463) about the importance of the alternative order in regex. If your search terms consisted of word chars only, mere `\b` word boundaries would do. Whitespaces invalidate the `\b` word boundary, so you should be cautious here. As for `SKIP-FAIL` technique, it is [quite well-known](https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex). – Wiktor Stribiżew Oct 30 '18 at 14:56