I have data in a table of length 9 where data is like
999999969
000000089
666666689
I want to delete only those data in which any number from 1-9 is repeating more than 5 times.
I have data in a table of length 9 where data is like
999999969
000000089
666666689
I want to delete only those data in which any number from 1-9 is repeating more than 5 times.
OK, so the logic here can be summed up as:
Right?
So, lets split it into series of consecutive digits:
regress=> SELECT regexp_matches('666666689', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g');
regexp_matches
----------------
{6666666}
{8}
{9}
(3 rows)
then filter for the longest:
regress=>
SELECT x[1]
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
ORDER BY length(x[1]) DESC
LIMIT 1;
x
---------
6666666
(1 row)
... but really, we don't actually care about that, just if any entry is longer than 5 digits, so:
SELECT x[1]
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
WHERE length(x[1]) > 5;
can be used as an EXISTS
test, e.g.
WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
SELECT x[1]
FROM regexp_matches(n, '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
WHERE length(x[1]) > 5
)
which is actually pretty efficient and return the correct result (always nice). But it can be simplified a little more with:
WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
SELECT x[1]
FROM regexp_matches(n, '(0{6}|1{6}|2{6}|3{6}|4{6}|5{6}|6{6}|7{6}|8{6}|9{6})', 'g') x;
)
You can use the same WHERE
clause in a DELETE
.
Horrible and terrible in terms of performance, but it should work:
DELETE FROM YOURTABLE
WHERE YOURDATA LIKE '%111111%'
OR YOURDATA LIKE '%222222%'
OR YOURDATA LIKE '%333333%'
OR YOURDATA LIKE '%444444%'
OR YOURDATA LIKE '%555555%'
OR YOURDATA LIKE '%666666%'
OR YOURDATA LIKE '%777777%'
OR YOURDATA LIKE '%888888%'
OR YOURDATA LIKE '%999999%'
This can be much simpler with a regular expression using a back reference.
DELETE FROM tbl
WHERE col ~ '([1-9])\1{5}';
That's all.
([1-9])
... a character class with digits from 1 to 9, parenthesized for the following back reference.
\1
... back reference to first (and only in this case) parenthesized subexpression.
{5}
.. exactly (another) 5 times, making it "more than 5".
A back reference (\n) matches the same string matched by the previous parenthesized subexpression specified by the number
n
[...] For example,([bc])\1
matchesbb
orcc
but notbc
orcb
.