0

After the useful answers on my previous question (see How do I create a regex to avoid a repeated number with optional hyphen?) we reached a solution that matched my needings. The final result was: ^(?!(\d)(?:-?\1)*$)\d{2}-?\d{7}$ The above regex excludes these data:

  • 00-0000000 and 000000000
  • 11-1111111 and 111111111
  • 22-2222222 and 222222222
  • ...
  • 99-9999999 and 999999999

Note that 22-2222221 is valid.

Note also that the position of the hyphen can be anywhere after the first digit and before the last one

Now that everything seemd to work fine we noticed that this pattern is not compatible with the oracle database REGEXP LIKE command.

Any suggestion on how to adapt it? Thanks in advance.

I read here Oracle regular expression replacement for negative lookahead/lookbehind and the solution provided doesn't seem to work for me.

despaolo
  • 23
  • 4
  • You seem to have a limited set of data to be excluded, have you considered `not in('00-0000000', '000000000', '11-1111111', '111111111', etc... )`? It would probably perform faster than a regex to boot. – MonkeyZeus Jan 12 '23 at 16:43
  • @MonkeyZeus I cannot do it since the position of the hyphen can be anywhere after the first digit and before the last one. – despaolo Jan 12 '23 at 16:45
  • Does it have to be a single regular expression? For example (not necessarily a great method), could you do something like `NOT regexp_like(val, '^[0-9][0-9-]{7,8}[0-9]$') OR length(val) - length(replace(val, '-', '')) > 1 OR COALESCE(length(replace(replace(val, '-', ''), substr(val,1,1), '')), 0) != 0`? – EdmCoff Jan 12 '23 at 17:43
  • You could try matching all the same digits with optional hyphens on between to avoid matching that, and capture in group 2 what you want to keep `^(\d)-?\1-?\1-?\1-?\1-?\1-?\1-?\1-?\1$|^(\d{2}-?\d{7})$` https://regex101.com/r/zZj9PA/1 – The fourth bird Jan 12 '23 at 17:49
  • @Thefourthbird The hyphen has not a fixed position, it could appear in position 2 or 3 or 4 and so on or it could never appear at all – despaolo Jan 13 '23 at 09:02

1 Answers1

0

Given you comment:

the hyphen can be anywhere after the first digit and before the last one

You can do it all without regular expressions using:

SELECT *
FROM   table_name
WHERE  -- Check that the value as the correct length
       LENGTH(value) IN (9, 10)
       -- Check that the value has the correct length without hyphens
AND    LENGTH(REPLACE(value, '-')) = 9
       -- Check that the value has only digits or hyphens
AND    TRANSLATE(value, 'a-0123456789', 'a') IS NULL
       -- Check that all the characters are not either hyphens or the same as the
       -- first character
AND    TRANSLATE(value, 'a-' || SUBSTR(value, 1, 1), 'a') IS NOT NULL;

If the hyphen will always be the 3rd character (if it is present) then:

SELECT *
FROM   table_name
WHERE  -- Check that the value has the correct format
       ( value LIKE '_________' OR value LIKE '__-_______' )
       -- Check that the other characters are digits
AND    TRANSLATE(
          SUBSTR(value, 1, 2) || SUBSTR(value, -7),
          'a0123456789',
          'a'
        ) IS NULL
       -- Check that all the characters are not either hyphens or the same as the
       -- first character
AND    TRANSLATE(value, 'a-' || SUBSTR(value, 1, 1), 'a') IS NOT NULL;

If you want to use regular expressions then you will need two regular expressions:

SELECT *
FROM   table_name
WHERE  REGEXP_LIKE(value, '^\d{2}-?\d{7}$')
AND    NOT REGEXP_LIKE(value, '^(\d)\1-?\1{7}$');

or for hyphens anywhere:

SELECT *
FROM   table_name
WHERE  REGEXP_LIKE(value, '^\d+-?\d+$')
AND    REGEXP_LIKE(value, '^[0-9-]{9,10}$')
AND    NOT REGEXP_LIKE(value, '^(\d)(-?\1){8}$');

Alternatively, you can enable Java inside the database and use look-ahead via a Java method and the regular expression:

^(?!(\d)(-?\1){8}$)(?=(\d{9}|[0-9-]{10})$)\d+-?\d+$

fiddle

MT0
  • 143,790
  • 11
  • 59
  • 117
  • The problem is that I have a table of regex with which to check the data and I can't change the logic. For every input, the system does something like: REGEXP LIKE (MYDATA, SELECT REGEX FROM TABLE) – despaolo Jan 13 '23 at 09:05
  • 1
    @despaolo In Oracle, you cannot use a single regular expression to solve this. – MT0 Jan 13 '23 at 09:07
  • 1
    @despaolo Updated with some options (you still can't do it natively with a single regular expression but you can embed Java into the database and do it via that). – MT0 Jan 13 '23 at 09:41
  • unfortunately this is not a solution I can use since the regex is stored in a db record and a simple select launches the NOT REGEXP_LIKE command. Is there a way to do something similar to what you've done here? You wrote SELECT * FROM table_name WHERE REGEXP_LIKE(value, '^\d{2}-?\d{7}$') AND NOT REGEXP_LIKE(value, '^(\d)\1-?\1{7}$'); a good fit could be something like ^\d{2}-?\d{7}$ AND NOT (^(\d)\1-?\1{7}$) – despaolo Jan 13 '23 at 15:26
  • 1
    @despaolo If you can enable Java in the database then you can do it in a single regular expression; if you cannot then Oracle's regular expression support is **NOT** sufficient to do what you want in a single statement. `^\d{2}-?\d{7}$ AND NOT (^(\d)\1-?\1{7}$)` is not a valid regular expression (in any dialect). – MT0 Jan 13 '23 at 15:44