
I'm trying to match substrings that start with # followed by a digit from 1 to 9, and that end either at the next #1-9 marker or at the end of the string.

Full string: "#1Lorem Ipsum is simply dummy text#2printing and typesetting industry"

The idea is to replace #1Lorem Ipsum is simply dummy text with <span class="one">Lorem Ipsum is simply dummy text</span>

and #2printing and typesetting industry with <span class="two">printing and typesetting industry</span>

In other words, replace each #1-9 with <span class="number"> and append the closing </span> tag at the end of each segment.

But suppose the string contains only one segment starting with #1-9, like this:

"#1Lorem Ipsum is simply dummy text"

How can I put </span> at the end to close the <span> tag?

I'm guessing I could use the last " at the end of the string and insert the closing </span> tag before it, since there is no further #1-9 to stop at, but without losing or replacing that last ".

So it becomes: "<span class="one">Lorem Ipsum is simply dummy text</span>"

Regex I've tried: (#[0-9])(.*?)(#|") but it only matches the first part (#1) of the string and ignores the #2 part (see the full string above).

I will be using PHP to match and replace, probably with preg_replace; I just need to work out the regex part first.

How can I achieve this?

Amr SubZero

3 Answers


What you are looking for is a negative look-ahead. It's very powerful: it succeeds only if the pattern inside it does not match at the current position.

#([0-9])((?:(?!$|#[0-9]).)+)

This looks for #0-9 and stops when another #0-9 occurs or at the end of the line. The negative look-ahead is this part: (?!$|#[0-9]). It says: only continue if neither $ nor #0-9 matches at this position. The check has to run before every character, so each time the look-ahead passes, the . consumes the next character, and the whole repetition is wrapped in a capture group.

[Image: railroad diagram of the pattern, generated using regexper.com]
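To see what the two capture groups hold, the pattern can be run with preg_match_all. This is only a minimal sketch: it assumes the sample string from the question with the surrounding double quotes already removed, and the variable names are illustrative.

```php
<?php
// Sample string from the question, without the surrounding double quotes.
$subject = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry';

// Group 1: the digit after "#". Group 2: every following character that is
// not the start of another "#digit" marker (checked one character at a time).
$pattern = '/#([0-9])((?:(?!$|#[0-9]).)+)/';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], PHP_EOL;
}
// prints:
// 1 => Lorem Ipsum is simply dummy text
// 2 => printing and typesetting industry
```

Each entry in $matches holds the full match at index 0, the digit at index 1, and the text at index 2, which is what a later replacement needs.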

Kyle
  • Good one, sir, but the match runs past the double quote at the end; I need it to stop before that double quote if possible, so that each match contains only `#1-9` and the following text. That will help me do the replacements I mentioned. – Amr SubZero Aug 02 '22 at 22:16
  • @AmrSubZero I thought the "" were just for formatting. Will the string always start with "# and end with a "? Could you just remove the quotes? – Kyle Aug 02 '22 at 22:20
  • The string will always be inside double quotes, like "#1test", so I want matches containing only `#1-9` and the following text, and then to replace each of them with just the following text inside a `span` tag, like: `test` – Amr SubZero Aug 02 '22 at 22:24
  • @AmrSubZero Then your best bet is to just remove the quotes prior to running preg_replace. You can get preg_match to properly match the pattern with quotes, but I don't know of an easy way to get preg_replace to see and replace the beginning and end quotes and also iterate over all the instances of #0-9. So, yeah, just remove the quotes. – Kyle Aug 02 '22 at 22:34
  • Yep, removing the quotes is a perfect trick that also extracts what needs to be replaced. Clever! Thanks a lot for your help! – Amr SubZero Aug 02 '22 at 22:48
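The quote-removal approach from the comments could look roughly like this. It is a sketch under assumptions: the input is always wrapped in exactly one pair of double quotes, and the class name "num-$1" is a placeholder invented for the example, not something from the thread.

```php
<?php
// Input as described in the comments: the whole string is wrapped in double quotes.
$quoted = '"#1Lorem Ipsum is simply dummy text#2printing and typesetting industry"';

// Strip the surrounding quotes so the pattern never sees them.
$inner = trim($quoted, '"');

// "num-$1" is a placeholder class name for this sketch.
$html = preg_replace(
    '/#([0-9])((?:(?!#[0-9]).)+)/',
    '<span class="num-$1">$2</span>',
    $inner
);

// Put the quotes back around the converted text.
$result = '"' . $html . '"';
echo $result, PHP_EOL;
```

Note that trim() removes every leading and trailing double quote, so text that itself ends with a quote character would lose it; in that case removing exactly one character from each end with substr() may be the safer choice.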

preg_replace_callback() is the right tool for this job. To avoid needing to manually declare a number mapping array, you can use the NumberFormatter class. Using sprintf() in the callback body will help to separate data from the html and make maintenance easier.

Code:

$string = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry#0nothing#35That\'s a big one!';

echo preg_replace_callback(
         '/#(\d+)((?:(?!#\d).)+)/',
         fn($m) => sprintf(
             '<span class="%s">%s</span>',
             (new NumberFormatter("en", NumberFormatter::SPELLOUT))->format($m[1]),
             htmlentities($m[2])
         ),
         $string
     );

Output:

<span class="one">Lorem Ipsum is simply dummy text</span><span class="two">printing and typesetting industry</span><span class="zero">nothing</span><span class="thirty-five">That&#039;s a big one!</span>

Note that if the text after each #[number] NEVER contains a # symbol, you can DRAMATICALLY improve the regex performance by using a greedy negated character class as the second capture group: #(\d+)([^#]+). This reduces the step count from 283 steps to just 16 steps on your sample string.

To be perfectly honest, even a lazy pattern like #(\d+)(.+?(?=#\d|$)) will process the sample string in 213 steps. Performance might not be a factor, so use whatever regex you are most comfortable reading.
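As a rough illustration of the trade-off, the three patterns can be run side by side. This sketch assumes a simple preg_replace with a placeholder class name ("num-$1") rather than the spelled-out classes used above, and the second test string with a stray # is made up for the comparison.

```php
<?php
$patterns = [
    'tempered lookahead' => '/#(\d+)((?:(?!#\d).)+)/',
    'negated class'      => '/#(\d+)([^#]+)/',
    'lazy + lookahead'   => '/#(\d+)(.+?(?=#\d|$))/',
];

// "num-$1" is a placeholder class name for this sketch.
$replacement = '<span class="num-$1">$2</span>';

// The sample string contains no stray "#", so all three patterns agree on it.
$sample = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry';
$sampleResults = [];
foreach ($patterns as $label => $pattern) {
    $sampleResults[$label] = preg_replace($pattern, $replacement, $sample);
}
var_dump(count(array_unique($sampleResults)) === 1); // bool(true)

// A made-up string with a "#" that is not followed by a digit.
$stray = '#1a # b#2c';
foreach ($patterns as $label => $pattern) {
    echo $label, ': ', preg_replace($pattern, $replacement, $stray), PHP_EOL;
}
// tempered lookahead: <span class="num-1">a # b</span><span class="num-2">c</span>
// negated class: <span class="num-1">a </span># b<span class="num-2">c</span>
// lazy + lookahead: <span class="num-1">a # b</span><span class="num-2">c</span>
```

The negated character class cuts the first segment short at the stray #, which is why it is only a safe optimisation when the text never contains that character.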

mickmackusa

<?php
function convert($str) {
    // Class names for the digits 1-9; only these digits are treated as markers.
    static $numberNamesMap = [
        1 => 'one',
        2 => 'two',
        3 => 'three',
        4 => 'four',
        5 => 'five',
        6 => 'six',
        7 => 'seven',
        8 => 'eight',
        9 => 'nine',
    ];
    return preg_replace_callback(
        // Group 1: the digit. Group 2: everything up to the next #1-9 marker or the end.
        '~#([1-9])(((?!#[1-9]).)*)~',
        function($matches) use ($numberNamesMap) {
            $class = $numberNamesMap[$matches[1]];
            // Escape the text so it is safe to embed in HTML.
            $htmlText = htmlentities($matches[2]);
            return "<span class=\"$class\">$htmlText</span>";
        },
        $str
    );
}


Examples

echo convert('#1Lorem Ipsum is simply dummy text');

outputs:

<span class="one">Lorem Ipsum is simply dummy text</span>

echo convert('#1Lorem Ipsum is simply dummy text#2printing and typesetting industry');

outputs:

<span class="one">Lorem Ipsum is simply dummy text</span><span class="two">printing and typesetting industry</span>

echo convert('#1Lorem Ipsum is simply dummy text#0printing and typesetting industry');

outputs:

<span class="one">Lorem Ipsum is simply dummy text#0printing and typesetting industry</span>

Pedro Amaral Couto