-1
SELECT regexp_replace(' aaa bbb 888 bbb ccc ddd fff YESS', '(.)\1+', '\1', 'g');

Return: a b 8 b c d f YES
Expect: a b 8 b c d f YESS

The pattern : if there are 3 repeated characters (before and after both have more than 0 white spaces), only extract one character.

Related Question: Remove Consecutive Repeated Characters with PostgreSQL regexp_replace function

jian
  • 4,119
  • 1
  • 17
  • 32

1 Answers1

2

Try this pattern:

SELECT regexp_replace(' aaa bbb 888 bbb ccc ddd fff YESS', '\m(.)\1+\M', '\1', 'g');
   regexp_replace    
═════════════════════
  a b 8 b c d f YESS
(1 row)

\m matches at a word beginning and \M at a word end.

Note that that will match repetitions like aaa.bbb too, because the period is a word beginning and end. If you want to only match repetitions betwee spaces, as stated in your question, you can use lookahead and lookbehind matches:

SELECT regexp_replace('aaa bbb 888 bbb ccc.ddd fff YESS', '(?<= )(.)\1+(?= )', '\1', 'g');
      regexp_replace      
══════════════════════════
 aaa b 8 b ccc.ddd f YESS
(1 row)
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • This doesn't work when input starts with `"aaa"` (no initial space) - there's still a word boundary between start of input and `aaa` but there's no preceding space, and spaces are required both sides to be reduced. It also doesn't work for non-space word boundaries, eg `"aaa bbb.ccc"` should remain unchanged (there are no space-surrounded repeated chars) but this will produce `"a b c"`. – Bohemian Sep 20 '21 at 08:19
  • @Bohemian That is not true, did you test it? A `\m` matches the beginning of the word even at the beginning of the string. It also works for word boundaries other than spaces. – Laurenz Albe Sep 20 '21 at 11:56
  • I don’t need to test it. You have just described the problem perfectly. OP’s requirements are *before and after both have more than 0 white spaces*, but when there is no space at the start of the input your regex still matches due to the word boundary matching between start of input and a letter, which means it matches when the word does **not** have spaces before and after. – Bohemian Sep 20 '21 at 12:38
  • @Bohemian Ah, I see. You pointed out that the question specifically asked for space. True. I have added an alternative that sticks closer to the text of the question. – Laurenz Albe Sep 20 '21 at 12:49