2

In Oracle 11g, how can I detect a null character (chr(0)) in a regular expression?

I would expect the following condition to evaluate to true, but it does not:

select case when regexp_like (chr(0), '\0') then 1 else 0 end from dual;

My specific use case requires detecting a pattern of null characters within a varchar2 column:

 select * from my_table where NOT regexp_like (my_column, '^[\s\0]+$');
shelley
  • 7,206
  • 4
  • 36
  • 63
  • 2
    No, the *null character* (`chr(0)`) is *not* the same as an empty string, or NULL, which is entirely different. – Jeffrey Kemp May 26 '16 at 08:30
  • 1
    @JeffreyKemp: I think you mean a `NUL`, not a `NULL`? Regardless, they aren't the same - agreed. – l'L'l May 27 '16 at 02:05
  • No, what do you mean by NUL? – Jeffrey Kemp May 27 '16 at 02:12
  • 1
    @JeffreyKemp, In [the docs](http://docs.oracle.com/cd/E19078-01/mysql/mysql-refman-5.0/language-structure.html) it shows `chr(0)`, `\0`, `0x00` as `NUL`; [question/answer](http://stackoverflow.com/questions/1296843/what-is-the-difference-between-null-0-and-0) about NULL | NUL. – l'L'l May 27 '16 at 07:18
  • Good thing, too; otherwise, imagine the confusion that would have ensued when people search the Oracle docs for "NULL". – Jeffrey Kemp May 27 '16 at 22:54
  • 2
    I'm pretty sure that the confusion is already ensuing... – l'L'l May 28 '16 at 00:45
  • 2
    I strongly suggest you fix the real problem which is a design that allows embedded NULL characters in the data. Save yourself time and therefore money down the road. Bring some sense to this project now lest this wreak havoc for years to come when various folks need to use various tools to work with and print this data, none of which will handle those NULLs! If you come up with a kludge to make this work now, YOU will be the one called in the future to come up more kludges and be stuck forever with this code! Stop the insanity! :-) – Gary_W May 31 '16 at 16:13
  • @Gary_W: My application is responsible only for reading the (legacy) database, not writing to it. It should be unlikely that such characters are written to the database in practice, but it is _possible_ and so I am trying to protect my application from this havoc by filtering them out. – shelley May 31 '16 at 16:52
  • Ah well kudos to you for expecting the unexpected! Have you actually found this situation? I would be surprised as I would expect an application to read that chr(0) as the end of the field and not insert past that. Anyway good for you for being thorough. – Gary_W May 31 '16 at 17:17
  • Yes, I have found it in existing test data in this database. As mentioned, this is _likely_ (hopefully :-)) a situation that is only technically possible, but not practically possible (i.e. possible in the database itself, but application logic prevents it). – shelley May 31 '16 at 18:03

4 Answers4

1

You can use [:cntrl:] character class.

SELECT
  CASE
    WHEN regexp_like (chr(0), '[[:cntrl:]]')
    THEN 1
    ELSE 0
  END control_chr
FROM dual;

CONTROL_CHR
-----------
1
Noel
  • 10,152
  • 30
  • 45
  • 67
  • Thanks. This may be sufficient for my use case; however, the `[:cntrl:]` class also includes other characters aside from the null character, e.g. `chr(7)`. Is there any way to detect just the null character? – shelley May 26 '16 at 13:50
1

I did some research for this question. chr(0) is a thing that is causing mayhem everywhere.

I first created a table, inserted a string with chr(0) inside it and tried to find that using regexp_like.

CREATE TABLE t1(col1 VARCHAR2(10));
INSERT INTO t1 VALUES('01234' || chr(0) || '5678');
SELECT CASE WHEN REGEXP_LIKE(col1,CHR(0)) THEN 1 ELSE 0 END op,col1, dump(col1)  FROM t1;

The output was below -

op      col1       dump(col1)
----    -----      -------------------------------------------------
1       01234      Typ=1 Len=10: 48,49,50,51,52,0,53,54,55,56

As you can see the dump shows actually chr(0) is in there, though while showing the string on screen, the tool considers chr(0) as null terminator and does not show the rest. So it seemed that regexp_like is actually finding a match for chr(0) inside a string.

But to my utter dismay I found that the even a string without chr(0) will return the same result.

DELETE FROM t1;
INSERT INTO t1 VALUES('0123456789');
SELECT CASE WHEN REGEXP_LIKE(col1,CHR(0)) THEN 1 ELSE 0 END op,col1, dump(col1)  FROM t1;

op      col1           dump(col1)
-----   -------        ----------------------------------------------------
1       0123456789     Typ=1 Len=10: 48,49,50,51,52,53,54,55,56,57

So there is something fishy to handle chr(0). I posted the below question and got an explanation for the same. chr(0) being a zero length string, whenever we would try to find it inside a string, it would apparently find it everywhere, effectively finding it actually nowhere.

It seems it is not possible to find chr(0) in a string. Please see the response from MTO.

CHR(0) in REGEXP_LIKE

Community
  • 1
  • 1
SubhasisM
  • 322
  • 1
  • 4
  • 16
  • What version of Oracle database did you try this on? It works fine for me on 11gR2 - the chr(0) is considered just another character and it shows up without problem. However, chr(0) is treated specially by regexp_like which is why I would use LIKE instead, which treats it correctly. – Jeffrey Kemp May 30 '16 at 07:02
1

You can use LIKE or INSTR with CHR(0) to detect it, but REGEXP_LIKE won't work because it treats CHR(0) specially.

create table t1 (col1 varchar2(10));

insert into t1 values ('01234'||chr(0)||'5678');

insert into t1 values ('01234X5678');

select case when col1 like '%'||chr(0)||'%' then 1 else 0 end r, col1, dump(col1) from t1;

select case when instr(col1,chr(0)) > 0 then 1 else 0 end r, col1, dump(col1) from t1;

Both the above queries show the following results:

enter image description here

select case when regexp_like(col1,chr(0)) then 1 else 0 end r, col1, dump(col1) from t1;

This gives the following result:

enter image description here

You do need to be careful because some clients will stop reading strings when they encounter a CHR(0). For example, when I tried copy-and-paste from SQL Developer's query result grid, it stopped at that point.

Jeffrey Kemp
  • 59,135
  • 14
  • 106
  • 158
  • "`REGEXP_LIKE` won't work because it treats `CHR(0)` specially" - Do you have any references for this? Unfortunately, neither `LIKE` nor `INSTR` will work for my use case; I need a regular expression to find specific patterns containing null characters. – shelley May 31 '16 at 15:33
  • @shelley - it was in another answer already. Refer to MT0's answer here: http://stackoverflow.com/questions/37461369/chr0-in-regexp-like – Jeffrey Kemp Jun 01 '16 at 00:00
  • You could replace the `CHR(0)` with another character string before calling REGEXP_LIKE to get around this. – Jeffrey Kemp Jun 01 '16 at 00:01
  • That question was asked after mine, and does not contain any references to Oracle documentation either. – shelley Jun 01 '16 at 01:51
  • String manipulation wouldn't be safe, as there's no guarantee that the replaced characters wouldn't otherwise exist in the string. – shelley Jun 01 '16 at 01:53
  • Of course. You would have to pick a string that is not going to be in the original. If you're after a perfect solution for 100% of cases, I can't help you. – Jeffrey Kemp Jun 01 '16 at 02:13
1

It does not seem to be possible to identify just the null character within an Oracle regular expression. Oracle's Regular Expression reference documentation indicates what regular expressions are supported, and makes no mention of supporting a metacharacter or expression that would identify a null character (although a subset of other standard metacharacters are supported). My tests have confirmed this, as have some follow-up tests executed by others in this thread.

Although the answer to this question is that it's not possible, some of the other answers may help satisfy others' use cases. In particular, please see this answer suggesting the use of the [[:cntrl:]] class. This will identify null characters in a regular expression, although it will also match on other characters in the class.

Community
  • 1
  • 1
shelley
  • 7,206
  • 4
  • 36
  • 63