1

I have data in a table of length 9 where data is like

999999969
000000089
666666689

I want to delete only those data in which any number from 1-9 is repeating more than 5 times.

code_rum
  • 872
  • 6
  • 21

3 Answers3

3

OK, so the logic here can be summed up as:

  • Find the longest series of the same consecutive digit in any given number; and
  • Return true if that longest value is > 5 digits

Right?

So, lets split it into series of consecutive digits:

regress=> SELECT regexp_matches('666666689', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g');
 regexp_matches 
----------------
 {6666666}
 {8}
 {9}
(3 rows)

then filter for the longest:

regress=> 

SELECT x[1] 
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x 
ORDER BY length(x[1]) DESC 
LIMIT 1;

    x    
---------
 6666666
(1 row)

... but really, we don't actually care about that, just if any entry is longer than 5 digits, so:

SELECT x[1] 
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x 
WHERE length(x[1]) > 5;

can be used as an EXISTS test, e.g.

WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
    SELECT x[1] 
    FROM regexp_matches(n, '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x 
    WHERE length(x[1]) > 5
)

which is actually pretty efficient and return the correct result (always nice). But it can be simplified a little more with:

WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
    SELECT x[1] 
    FROM regexp_matches(n, '(0{6}|1{6}|2{6}|3{6}|4{6}|5{6}|6{6}|7{6}|8{6}|9{6})', 'g') x;
)

You can use the same WHERE clause in a DELETE.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
2

Horrible and terrible in terms of performance, but it should work:

DELETE FROM YOURTABLE 
 WHERE YOURDATA LIKE '%111111%' 
    OR YOURDATA LIKE '%222222%' 
    OR YOURDATA LIKE '%333333%'
    OR YOURDATA LIKE '%444444%'
    OR YOURDATA LIKE '%555555%'
    OR YOURDATA LIKE '%666666%'
    OR YOURDATA LIKE '%777777%'
    OR YOURDATA LIKE '%888888%'
    OR YOURDATA LIKE '%999999%'
  • 1
    Actually, performance might not be as bad as it looks, because `LIKE` is still fast when combined with a [trigram index](http://stackoverflow.com/questions/1566717/postgresql-like-query-performance-variations/13452528#13452528). But with that many alternatives, a single regular expression is probably faster, and a lot shorter, of course. – Erwin Brandstetter Jun 24 '14 at 12:12
2

This can be much simpler with a regular expression using a back reference.

DELETE FROM tbl
WHERE  col ~ '([1-9])\1{5}';

That's all.

Explain

([1-9]) ... a character class with digits from 1 to 9, parenthesized for the following back reference.
\1 ... back reference to first (and only in this case) parenthesized subexpression.
{5} .. exactly (another) 5 times, making it "more than 5".

Per documentation:

A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number n [...] For example, ([bc])\1 matches bb or cc but not bc or cb.

SQL Fiddle demo.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228