2

I try to extract the non-white space character from a string using REGEXP_SUBSTR.

I use this regex \s*([^\s]*)\s* and extract the first group

Using \s to denote white spaces as documented here removes the leading spaces, but preserves the training once.

 select  '"'||REGEXP_SUBSTR( ' xxx  ', '\s*([^\s]*)\s*', 1, 1, NULL, 1 )||'"' AS text
 from dual;

 "xxx  "

Only if I switch to the POSIX syntax as documented for 10g I get the expected result.

 select  '"'||REGEXP_SUBSTR( ' xxx  ', '\s*([^[:space:]]*)[:space:]*', 1, 1, NULL, 1 )||'"' AS text
 from dual;

 "xxx"

What's (I'm doing) wrong?

I'm on 12.1.0.2.0

NLS_CHARACTERSET WE8MSWIN1252

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Have you checked this [documentation](http://docs.oracle.com/database/121/SQLRF/ap_posix002.htm#SQLRF55542) ? Is says that `\` is not supported for multilingual enhancements. –  Jan 30 '16 at 21:15
  • `select '"'||REGEXP_SUBSTR( ' xxx ', '\s*(\S*)\s*', 1, 1, NULL, 1 )||'"' AS text from dual;` – MT0 Jan 30 '16 at 22:31
  • @MTO thanks for the nice alternative, but the question was about the usage of \s in Nonmatching Character List. I was confused with [this](http://stackoverflow.com/questions/2788112/what-is-the-regular-expression-for-not-whitespace-and-not-a-hyphen) accepted answer. It is simple wrong or some extension? – Marmite Bomber Jan 31 '16 at 08:32

1 Answers1

1

[^\s] is incorrect, you can't use backslashed groups inside [].

The documentation clearly states that

In the list, all operators except these are treated as literals

Range operator: -

POSIX character class: [: :]

POSIX collation element: [. .]

POSIX character equivalence class: [= =]

So \ mean it self inside []

Community
  • 1
  • 1
Zbynek Vyskovsky - kvr000
  • 18,186
  • 3
  • 35
  • 43
  • 1
    You can use `\s` inside `[]`. –  Jan 30 '16 at 21:00
  • @MarmiteBomber That is fine. The issue is that `[^\s]` actually matches everything except "\" and "s", i.e. including the remaining white space. – Zbynek Vyskovsky - kvr000 Jan 30 '16 at 21:00
  • @Zbynek thank for the point, I'm trying to find a clarification in Oracle documentation, but no luck. – Marmite Bomber Jan 30 '16 at 21:15
  • @MarmiteBomber: You should check this [documentation](http://docs.oracle.com/database/121/SQLRF/ap_posix002.htm#SQLRF55542). –  Jan 30 '16 at 21:16
  • @Zbynek you are right +1. This `select REGEXP_SUBSTR( 'xxxssss', '([^\s]*).*', 1, 1, NULL, 1 ) AS text from dual` returns the xxx only. Anyway I thing this is specific to Oracle implementation, if so you may restrict your answer. I'll accept after a documentation link that clarify the behavior is provided. – Marmite Bomber Jan 30 '16 at 21:33
  • @MarmiteBomber This is general behavior of regexp groups everywhere I have seen. The only special characters inside `[]` are just `[`, `]`, `-`. Not sure about the link though... – Zbynek Vyskovsky - kvr000 Jan 30 '16 at 21:42
  • @Zbynek [this](http://stackoverflow.com/questions/2788112/what-is-the-regular-expression-for-not-whitespace-and-not-a-hyphen) confused me. Probably some extension... I found in between the documentation and I'll add it to your answer and accept. Thanks for support! – Marmite Bomber Jan 30 '16 at 21:52