Regex to replace all occurrences of single character within specific tokens

Question

I would like to know if a single set of regex search/replace patterns could be used to replace all occurrences of a specific character inside the part of a string that lies between two tokens.

For example, is it possible to replace all periods with spaces in the text between TOKEN1 and TOKEN2, as in the example below?

So that:

TOKEN1:Run.Spot.run:TOKEN2

is changed to:

TOKEN1:Run Spot run:TOKEN2

NOTE: The regular expression would need to be capable of replacing any number of periods within any text, and not just the specific pattern above.

I ask this question more for my personal knowledge, as it is something I have wanted to do quite a few times in the past with various regex implementations. In this particular case, however, the regex would be in PHP.

I am not interested in PHP workarounds as I know how to do that. I am trying to expand my knowledge of regex.

Thanks

First [this](http://stackoverflow.com/a/1732454/2071828). Next, if you do not have nested tags or any funny stuff this should be fairly easy to do with back references. — Boris the Spider, Sep 02 '13 at 22:04
Thanks. It may be easy for you but I have not had any luck figuring it out. ;) I am not parsing HTML. I just made up a poorly chosen example. I have edited the question to be more generic. There would be nothing of concern within the tokens, such as nesting, etc. Just simple characters, numbers, periods, and maybe a hyphen. I know how to replace the periods without the requirement of the wrapping tokens. But I have not been able to figure out anything that works only on text within the tokens. — Max Pfleger, Sep 02 '13 at 22:20
There's no such thing as a "regex statement", or a regular expression that's "capable of replacing" something. Regexes are a notation for searching and matching. String-replacement mechanisms often *use* regexes, but there is a lot of diversity in how they work. — ruakh, Sep 02 '13 at 22:43

score 4 · Accepted Answer · Casimir et Hippolyte

A way to do this:

$pattern = '~(?:TOKEN1:|\G(?!^))(?:[^:.]+|:(?!TOKEN2))*\K\.~';
$replacement = ' ';
$subject = 'TOKEN1:Run.Spot.run:TOKEN2';
$result = preg_replace($pattern, $replacement, $subject);

pattern details:

~                  # opening pattern delimiter
(?:                # open a non-capturing group
    TOKEN1:        # the literal opening token TOKEN1:
  |                # OR
    \G(?!^)        # the position where the previous match ended, but not the start of the string
)                  # close the non-capturing group
(?:                # open a non-capturing group
    [^:.]+         # characters that are neither ':' (the first character of :TOKEN2) nor the searched character
  |                # OR
    :(?!TOKEN2)    # a ':' that is not followed by the rest of :TOKEN2
)*                 # repeat the non-capturing group zero or more times
\K                 # reset the start of the reported match (drop everything matched so far)
\.                 # the searched character
~                  # closing pattern delimiter
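As written, the annotated listing is not a usable PHP pattern string: without extra modifiers the spaces and comments would be part of the regex, and PHP would read the text after the closing ~ as modifiers. The same breakdown can be used directly if the x (extended) modifier is added, which makes PCRE ignore whitespace and treat # as the start of a comment. A sketch, assuming PHP's bundled PCRE; the comments are reworded, but the regex is the one from the answer:

```php
<?php
// The answer's pattern, laid out with the x (extended) modifier so that
// whitespace is ignored and "#" starts a comment running to the end of the line.
$pattern = '~
    (?:               # open a non-capturing group
        TOKEN1:       # the literal opening token
      |               # OR
        \G(?!^)       # where the previous match ended, but not the start of the string
    )                 # close the group
    (?:               # open a non-capturing group
        [^:.]+        # characters that are neither a colon nor a period
      |               # OR
        :(?!TOKEN2)   # a colon that does not start the closing token
    )*                # repeat zero or more times
    \K                # drop everything matched so far from the reported match
    \.                # the period to replace
~x';

$result = preg_replace($pattern, ' ', 'TOKEN1:Run.Spot.run:TOKEN2');
echo $result, "\n"; // TOKEN1:Run Spot run:TOKEN2
```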

The idea is to use \G to force each match to start either at TOKEN1: or exactly where the previous match ended, so every replacement belongs to a chain of matches that began at TOKEN1:.

Note: by default the region behaves like an HTML tag (it stays open until it is closed). If :TOKEN2 is never found, every period after TOKEN1: is replaced.
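Both behaviours can be checked with a couple of inputs. A minimal sketch using the pattern from the answer; the sample strings are made up for illustration:

```php
<?php
$pattern = '~(?:TOKEN1:|\G(?!^))(?:[^:.]+|:(?!TOKEN2))*\K\.~';

// Periods outside TOKEN1: ... :TOKEN2 are left alone, because \G can only
// continue a chain of matches that started at TOKEN1:.
$outside = preg_replace($pattern, ' ', 'a.b TOKEN1:Run.Spot.run:TOKEN2 c.d');
echo $outside, "\n"; // a.b TOKEN1:Run Spot run:TOKEN2 c.d

// With no closing :TOKEN2 the region stays open, so later periods are replaced too.
$unclosed = preg_replace($pattern, ' ', 'TOKEN1:Run.Spot:HELLO.world');
echo $unclosed, "\n"; // TOKEN1:Run Spot:HELLO world
```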

edited Nov 13 '21 at 23:38 · answered Sep 02 '13 at 22:22 by Casimir et Hippolyte

Are you sure this works? I tried it with `TOKEN1:Run.Spot.run:HELLO` and it replaced the dots anyway. – Ibrahim Najjar Sep 02 '13 at 22:55
@Sniffer: Yes, I know, the default behaviour is like an HTML tag (it is always open until it is closed) – Casimir et Hippolyte Sep 02 '13 at 22:58
You are correct. I find the use of `\G` here a great idea: if you matched before, you start from where you left off, so `TOKEN1` must have been matched first. Bottom line: great idea, +1 for that. – Ibrahim Najjar Sep 02 '13 at 23:01
Beautiful answer, Casimir! Thank you very much! – Max Pfleger Sep 02 '13 at 23:09

score 0 · Answer 2 · answered Sep 02 '13 at 22:49

I think the best way is to write something like this:

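$result =
    preg_replace_callback(
        // No /g modifier: PHP has none, and preg_replace_callback replaces every match by default.
        '/(TOKEN1:)([^:]+)(:TOKEN2)/',
        function ($matches) {
            // $matches[1] = "TOKEN1:", $matches[2] = text between the tokens, $matches[3] = ":TOKEN2"
            return $matches[1]
                   . preg_replace('/[.]/', ' ', $matches[2])
                   . $matches[3];
        },
        'TOKEN1:Run.Spot.run:TOKEN2'
    );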

(Disclaimer: not tested.)
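A variant of the callback approach passes only the text between the tokens to the callback. This is a swapped-in sketch, not the answerer's code: \K discards TOKEN1: from the reported match, the lookahead requires :TOKEN2 without consuming it, and the lazy .*? allows colons between the tokens. Unlike the \G pattern in the accepted answer, an unclosed TOKEN1: is left untouched. The arrow function assumes PHP 7.4 or later, and the sample string is made up:

```php
<?php
// \K drops "TOKEN1:" from the reported match and (?=:TOKEN2) checks for the
// closing token without consuming it, so $m[0] is only the text in between.
$subject = 'x.y TOKEN1:Run.Spot.run:TOKEN2 TOKEN1:a.b:HELLO';
$result = preg_replace_callback(
    '/TOKEN1:\K.*?(?=:TOKEN2)/',
    fn (array $m): string => str_replace('.', ' ', $m[0]),
    $subject
);
echo $result, "\n"; // x.y TOKEN1:Run Spot run:TOKEN2 TOKEN1:a.b:HELLO
```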

score 0 · Answer 3 · answered Sep 02 '13 at 22:53

At its simplest, your pattern would be an escaped period, \. (the backslash is needed because an unescaped period matches any character), and you would replace it with a single space.

This will replace every . with a space.

However, from your comment, you appear to be asking for a regex to replace all periods between word characters:

(?<=\w)\.(?=\w)

You would need a positive (zero-width noncapturing) lookbehind for a word character: (?<=\w), your escaped period (\.) and a positive (zero-width noncapturing) lookahead for a word character: (?=\w). Replacing this with a space would have the result you want.
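The difference between the plain escaped period and the lookaround version shows up with sentence-ending periods. A small PHP sketch on a made-up sample string:

```php
<?php
$subject = 'Run.Spot.run. The end.';

// Escaped period alone: every "." becomes a space.
$all = preg_replace('/\./', ' ', $subject);

// Lookbehind and lookahead for \w: only periods with a word character on both sides.
$between = preg_replace('/(?<=\w)\.(?=\w)/', ' ', $subject);

echo $all, "\n";
echo $between, "\n"; // Run Spot run. The end.
```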

If you want to replace periods only between the tokens, and your regex engine supports variable-length lookbehind, you could prepend a positive lookbehind: (?<=TOKEN1:.+) and append a positive lookahead: (?=.+TOKEN2), so the complete regex would be:

(?<=TOKEN1:.+)(?<=\w)\.(?=\w)(?=.+TOKEN2)

You may need to refine this if a period can occur immediately after the opening token and/or immediately before the closing token and you don't want to replace them.

PHP doesn't support variable width lookbehind or this would be simple. — Ibrahim Najjar, Sep 02 '13 at 22:56
There wasn't a PHP tag on the question when I answered this. Too many regex implementations have limitations with lookbehinds. — Monty Wild, Sep 02 '13 at 23:10
