6

Remove all consecutive repeated characters using Regular Expression.

In Javascript this works well:

txt='aaa bbb 888 bbb ccc ddd'.replace(/(?!(?!(.)\1))./g,'');

Returns 'a b 8 b c d'

How can I do it with Posgresql regexp_replace function? This won't work:

SELECT regexp_replace('aaa bbb 888 bbb ccc ddd',E'(?!(?!(.)\\\\1)).','g');

$ psql -c "SELECT regexp_replace('aaa bbb 888 bbb ccc ddd',E'(?!(?!(.)\\1)).','g');"
     regexp_replace      
-------------------------
 aaa bbb 888 bbb ccc ddd
(1 row)

$ psql -c "SELECT regexp_replace('aaa bbb 888 bbb ccc ddd','(?!(?!(.)\1)).','g');"   
ERROR:  invalid regular expression: invalid backreference number

What am I doing wrong?

Cœur
  • 37,241
  • 25
  • 195
  • 267
IanS
  • 61
  • 3
  • 1
    `select string_agg(x[1],'') from regexp_matches('aaa bbb 888 111122222 1234 bbb ccc ddd','(.)\1*','g') t(x);` – Abelisto Aug 16 '16 at 07:46
  • @Abelisto This returns `a b 8 12 1234 b c d`, which isn't exactly what the OP wanted, but I upvoted your comment because it is close. – Tim Biegeleisen Aug 16 '16 at 07:49
  • 1
    @TimBiegeleisen It is because different input to test some more use cases. – Abelisto Aug 16 '16 at 07:52
  • The OP wants only the first character of every word (contain it numbers or letters), at least that is how I read it. – Tim Biegeleisen Aug 16 '16 at 07:55
  • 1
    @TimBiegeleisen To get the first character of every word (not just delete repeated characters): `select string_agg(x,'') from unnest(array(select regexp_matches('aaa bbb 888 111122222 1234, bbb ccc ddd','\m(\w)\w*(\W*)','g'))) t(x);` – Abelisto Aug 16 '16 at 08:23
  • @Abelisto I just tested this and it didn't work. – Tim Biegeleisen Aug 16 '16 at 08:23
  • @TimBiegeleisen I am confused. Just after copy/paste to the psql it does not work. But after delete and reenter `(\W*)‌` part it works fine. – Abelisto Aug 16 '16 at 08:36
  • 1
    @Abelisto - You should put this down as an answer. Your first comment perfectly solves the problem. – Addison Nov 28 '16 at 05:35

1 Answers1

5

There's a similar SO question that can help you to get the answer:

SELECT regexp_replace('aaa bbb 888 bbb ccc ddd', '(.)\1{1,}', '\1', 'g');
 regexp_replace 
----------------
 a b 8 b c d
(1 row)

It uses a backreference to capture the groups of repeating characters.

Community
  • 1
  • 1
mgamba
  • 1,189
  • 8
  • 10