I found this regex in another post on SO:

\b(\(*[CDEFGAB](?:b|bb)*(?:\+|#|##|dim|sus|maj|min|aug|m|M|°|[0-9])*[\(]?[\d\/]*[\)]?(?:[CDEFGAB](?:b|bb)*(?:#|##|dim|sus|maj|min|aug|m|M|°|[0-9])*[\d\/]*)*\)*)\b

This regex works great with preg_replace, except for the sharp chords A#, B#, C#, D#, E#, F# and G# when they are not followed by m: the # is left out of the match. Chords such as G#m work just fine.

For example, in the following lyric only F and G#m are highlighted (the # after F is left out) when I use the regex in preg_replace($regex, '<span class="_c">$1</span>', $lyrics):

F#
Angel eyes,
G#m
Why won’t you let me apologise?

I am stuck, and I am not very good at regex. Any help would be appreciated. Thanks in advance.

Dariel Pratama

1 Answer

You can use adaptive dynamic word boundaries:

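$lyrics = "F#\nAngel eyes,\nG#m\nWhy won’t you let me apologise?";
// The u modifier makes PCRE treat the multi-byte ° as one character inside [mM°0-9] (assumes UTF-8 input).
$regex = '/(?:\b(?=\w)|\B(?!\w))\(*\b[CDEFGAB](?:b|bb)*(?:\+|\#{1,2}|dim|sus|maj|min|aug|[mM°0-9])*\(?[\d\/]*\)?(?:[CDEFGAB](?:b|bb)*(?:\#{1,2}|dim|sus|maj|min|aug|[mM°0-9])*[\d\/]*)*\)*(?:\b(?<=\w)|\B(?<!\w))/u';
// This pattern has no capturing group, so the replacement refers to the whole match with $0.
echo preg_replace($regex, '<span class="_c">$0</span>', $lyrics);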

See the regex demo.

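The reason is that your matches can start or end with non-word chars, and \b word boundaries behave differently depending on their context: \b only matches between a word char and a non-word char (or the start/end of the string), so it cannot match after a trailing #.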
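A small check can make this behavior visible. It is only an illustrative sketch: the test string and the variable names ($sharpAlone, $sharpMinor, $bareRoot) are made up for this example and do not come from the question.

```php
<?php
// Illustrative input: a sharp chord on its own line, then a sharp minor chord.
$line = "F#\nG#m";

// "#" and "\n" are both non-word chars, so there is no word boundary
// after "F#" and the trailing \b makes this pattern fail.
$sharpAlone = preg_match('/\bF#\b/', $line);

// "m" is a word char at the end of the string, so \b holds after "G#m".
$sharpMinor = preg_match('/\bG#m\b/', $line);

// \b does hold between "F" and "#", which is why only "F" got highlighted.
$bareRoot = preg_match('/\bF\b/', $line);

var_dump($sharpAlone, $sharpMinor, $bareRoot); // int(0) int(1) int(1)
```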

Here, I suggest using adaptive dynamic word boundaries of Type 2:

  • (?:\b(?=\w)|\B(?!\w)) - a left-hand boundary, making sure the current position is at the word boundary if the next char is a word char (\b(?=\w)), or at a non-word boundary position if the next char is not a word char
  • (?:\b(?<=\w)|\B(?<!\w)) - a right-hand boundary, making sure the current position is at the word boundary if the previous char is a word char, or at a non-word boundary position if the previous char is not a word char.
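To try the two boundaries in isolation, here is a minimal sketch. The helper withAdaptiveBoundaries() is a hypothetical name introduced for this example, and the chord body is deliberately simplified, not the full pattern from above.

```php
<?php
// Hypothetical helper: wraps a pattern body in the left-hand and
// right-hand adaptive boundaries described above.
function withAdaptiveBoundaries(string $body): string
{
    $left  = '(?:\b(?=\w)|\B(?!\w))';
    $right = '(?:\b(?<=\w)|\B(?<!\w))';
    return '/' . $left . '(?:' . $body . ')' . $right . '/';
}

// Simplified chord body for illustration: a root, an optional # or b, an optional m.
$regex = withAdaptiveBoundaries('[A-G](?:\#|b)?m?');

// "Angel" is skipped because there is no word boundary between "A" and "n".
preg_match_all($regex, "F#\nAngel eyes,\nG#m\nBb C", $matches);
print_r($matches[0]); // F#, G#m, Bb, C
```

Keeping the boundaries in a helper like this is only one possible design; it avoids retyping the lookarounds if the chord body changes later.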

You may also watch this YT video of mine with more explanations of adaptive dynamic word boundaries and a Python demo.

Wiktor Stribiżew