1

I need a PHP regex that finds all double spaces between a start keyword and an end keyword and reduces each of them to a single space.

$teststring = 'This is a teststring ... :keyword_start: this is    the content    with double spaces :keyword_end: more text ... :keyword_start: this is the second   content    with double spaces :keyword_end: ... more text';

I need the following result:

This is a teststring ... :keyword_start: this is the content with double spaces :keyword_end: more text ... :keyword_start: this is the second content with double spaces :keyword_end: ... more text

This is what I've tried, but it does not work:

$teststring = preg_replace('#(:keyword_start:)\s\s+(:keyword_end:)#si', '', $teststring);

Can anyone help me?

Integer
  • Try this one [http://stackoverflow.com/questions/2368539/php-replacing-multiple-spaces-with-a-single-space](http://stackoverflow.com/questions/2368539/php-replacing-multiple-spaces-with-a-single-space) – Tim007 Feb 27 '16 at 14:50

4 Answers

2

You can do it with a pattern that uses the \G anchor. This anchor matches at the position where the previous match ended (and, by default, at the start of the string). With it you can obtain contiguous matches until something breaks the contiguity:

$pattern = '~(?:\G(?!\A)|:keyword_start:\s)(?:(?!:keyword_end:)\S+\s)*+\K\s+~S';

$result = preg_replace($pattern, '', $str);

Pattern details:

~             # pattern delimiter
(?:           # non-capturing group
    \G(?!\A)             # contiguous branch: resume where the previous match ended (not at the start of the string)
  |                      # OR
    :keyword_start:\s    # start branch: the start keyword and the whitespace character after it
)
(?:
    (?!:keyword_end:)\S+ # a run of non-whitespace characters that does not begin with the end keyword
    \s                   # a single whitespace character
)*+                   # repeat the group (possessively) until surplus whitespace or the end keyword
\K                    # drop everything matched so far from the match result
\s+                   # the surplus whitespace to remove
~S      # "study" modifier, intended to speed up non-anchored patterns
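
To see the pattern at work, here is a minimal sketch that runs it against the string from the question. The variable names are only illustrative. Each match is just the surplus whitespace after a word, so replacing it with an empty string leaves one space between words:

```php
<?php
// Sample input copied from the question.
$str = 'This is a teststring ... :keyword_start: this is    the content    with double spaces :keyword_end: more text ... :keyword_start: this is the second   content    with double spaces :keyword_end: ... more text';

// Pattern from the answer above, unchanged.
$pattern = '~(?:\G(?!\A)|:keyword_start:\s)(?:(?!:keyword_end:)\S+\s)*+\K\s+~S';

// \K keeps the first whitespace character out of the match,
// so only the extra whitespace is deleted.
$result = preg_replace($pattern, '', $str);

echo $result, PHP_EOL;
```

Text outside the keyword pairs is never reached by either branch, so whitespace there is left untouched.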


Casimir et Hippolyte
1

You can use a callback on whatever is between the two keywords.

$str = preg_replace_callback('/:keyword_start:(.*?):keyword_end:/s', function ($m) {
  return ':keyword_start:' . preg_replace('/\s{2,}/', " ", $m[1]) . ':keyword_end:';
}, $str);
  • (.*?) between the tokens lazily captures any number of characters into group 1 (available as $m[1] in the callback)
  • \s{2,} matches two or more whitespace characters
  • the s flag after the closing delimiter makes the dot match newlines as well



It could also be done with a single nifty regex, but that is more prone to failure and takes longer to explain. Something like:

/(?::keyword_start:|\G(?!^)\S+)\K(?<!_end:)\s+/
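
The answer does not say what to replace the match with. As a sketch, assuming a single space as the replacement (that choice is an assumption here, as are the variable names), the pattern can be applied like this:

```php
<?php
// Sample input copied from the question.
$str = 'This is a teststring ... :keyword_start: this is    the content    with double spaces :keyword_end: more text ... :keyword_start: this is the second   content    with double spaces :keyword_end: ... more text';

// The pattern matches every whitespace run that follows the start keyword
// or a word continuing the chain. The lookbehind (?<!_end:) stops the
// chain once the end keyword has been consumed.
$pattern = '/(?::keyword_start:|\G(?!^)\S+)\K(?<!_end:)\s+/';

// Assumed replacement: one space per whitespace run.
$result = preg_replace($pattern, ' ', $str);

echo $result, PHP_EOL;
```

Note that single spaces inside the block are matched too and simply replaced by themselves. A word in the content that happens to end in "_end:" would also break the chain early.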


bobble bubble
0

I am not fluent in PHP, so I will give a solution that does not depend on the language. You can pick your language and implement it accordingly.

There is no easy way to find double spaces between two keywords with one simple regex. There might be some elaborate regex for it, but my approach is pretty straightforward.

Step 1: Find the text between the keywords using (?<=:keyword_start:).*?(?=:keyword_end:).


Step 2: In the text found, replace every run of whitespace (double spaces, multiple tabs) with a single space using a simple \s+.
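
Since the question is about PHP, the two steps could be combined roughly as follows. This is only a sketch: wiring the steps together with preg_replace_callback, the s flag, and the variable names are assumptions, not part of the answer.

```php
<?php
// Sample input copied from the question.
$str = 'This is a teststring ... :keyword_start: this is    the content    with double spaces :keyword_end: more text ... :keyword_start: this is the second   content    with double spaces :keyword_end: ... more text';

$result = preg_replace_callback(
    // Step 1: match only the text between the keywords (lookarounds keep
    // the keywords themselves out of the match). The s flag is an
    // assumption so that the content may span several lines.
    '/(?<=:keyword_start:).*?(?=:keyword_end:)/s',
    function (array $m): string {
        // Step 2: collapse every whitespace run in the found text.
        return preg_replace('/\s+/', ' ', $m[0]);
    },
    $str
);

echo $result, PHP_EOL;
```

Keeping the two steps separate makes each regex easy to read, at the cost of one callback invocation per keyword block.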


-1

If you want a regex that replaces all whitespace, including tabs and empty lines, you can use this:

$s = preg_replace('/\s+/', ' ', $s);

It replaces tabs and newlines between characters even when there is only one of them. Any run of multiple whitespace characters is reduced to a single space as well.

A regex that handles runs of spaces only is below (though in that case it is faster to use str_replace, as in another answer here):

$s = preg_replace('/  */', ' ', $s);
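
To make the difference between the two patterns concrete, here is a small sketch on a made-up input (the sample string and variable names are hypothetical). Both calls act on the whole string, not only on the part between keywords:

```php
<?php
// Hypothetical input: a double space, a tab, and an empty line.
$s = "a  b\tc\n\nd";

// \s+ treats tabs and newlines as whitespace and replaces them as well.
$allWhitespace = preg_replace('/\s+/', ' ', $s);

// The space-only pattern leaves tabs and newlines alone.
$spacesOnly = preg_replace('/  */', ' ', $s);

var_dump($allWhitespace); // "a b c d"
var_dump($spacesOnly);    // "a b", then the tab, "c", two newlines, "d"
```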
micropro.cz