I have this ABAP code to find text via a regular expression:

DATA: regex            TYPE REF TO cl_abap_regex,
      match            TYPE REF TO cl_abap_matcher,
      match_result_tab TYPE match_result_tab.
TRY.
    CREATE OBJECT regex
      EXPORTING
        pattern = '01|012345'.
  CATCH cx_sy_regex .
ENDTRY.

TRY.
    CREATE OBJECT match
      EXPORTING
        regex = regex
        text  = '0123456'.
  CATCH cx_sy_matcher.
ENDTRY.

CALL METHOD match->find_all
  RECEIVING
    matches = match_result_tab.

It finds '01', but I expect '012345'. With the FIND REGEX statement instead:

DATA: offset TYPE i, length TYPE i.

FIND REGEX '01|012345' IN '0123456'
  MATCH OFFSET offset
  MATCH LENGTH length.

It finds 012345 as I expect.

Can someone explain why the results differ?

Sandra Rossi
rso
  • I don't know how the regex functions work in ABAP, but you should get the result you're looking for by using `0123456|01` instead, because the regex will then try to find `0123456` first, and only if it doesn't find it will it try `01`. Hence you get `01` only if `0123456` doesn't exist before `01` appears. – Jerry Nov 04 '13 at 11:39
  • As I wrote below, SAP Help says: [...] Leftmost-longest rule: First, the substring furthest to the left in the character string and which matches the regular expression ("leftmost") is determined. If there are multiple substrings, the longest sequence is chosen ("longest"). This procedure is then repeated for the remaining sequence after the occurrence[...] – rso Nov 04 '13 at 13:37

2 Answers

CL_ABAP_REGEX behaves differently from the built-in ABAP statements that take the REGEX addition. That's a documented feature:

In addition to the regular expressions (in accordance with the extended POSIX standard IEEE 1003.1), the class CL_ABAP_REGEX also offers an alternative type of simplified regular expression with restricted functions. These simplified regular expressions (also known as simplified expressions) do not support all POSIX operators and use a slightly different syntax in parts. The semantics of regular expressions and simplified expressions are, however, the same.

So the two implementations differ as follows:

  • CL_ABAP_REGEX is not "greedy for alternatives" (searching 01|0121 in 0121 will match 01), but is greedy for simple texts (searching 0.*1 in 0101 will match 0101)
  • ABAP statements with the REGEX word are greedy for alternatives (searching 01|0121 in 0121 will match 0121), and for simple texts (searching 0.*1 in 0101 will match 0101)
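The alternation case from the list above can be reproduced side by side. This is only a minimal sketch: the report name z_regex_alternation_demo and the variable names are hypothetical, and the results mentioned in the comments are the ones described in this answer, not output verified on every release.

```abap
REPORT z_regex_alternation_demo. " hypothetical report name

DATA: offset  TYPE i,
      length  TYPE i,
      regex   TYPE REF TO cl_abap_regex,
      matcher TYPE REF TO cl_abap_matcher,
      matches TYPE match_result_tab,
      first   TYPE match_result.

" 1) Built-in statement: per the description above, this should
"    report the longer alternative (0121).
FIND REGEX '01|0121' IN '0121'
  MATCH OFFSET offset
  MATCH LENGTH length.
IF sy-subrc = 0.
  WRITE: / 'FIND REGEX      offset/length:', offset, length.
ENDIF.

" 2) Class-based API: per the description above, this should
"    report the first alternative (01).
TRY.
    CREATE OBJECT regex
      EXPORTING
        pattern = '01|0121'.
    CREATE OBJECT matcher
      EXPORTING
        regex = regex
        text  = '0121'.
    matches = matcher->find_all( ).
    READ TABLE matches INTO first INDEX 1.
    IF sy-subrc = 0.
      WRITE: / 'CL_ABAP_MATCHER offset/length:', first-offset, first-length.
    ENDIF.
  CATCH cx_sy_regex cx_sy_matcher.
    WRITE: / 'Pattern or matcher could not be created.'.
ENDTRY.
```

Comparing the two length values on your own system is the quickest way to check whether the difference applies to your release.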

PS:

  • I think the class CL_ABAP_REGEX has become somewhat redundant; there are newer built-in functions such as match, matches, contains, replace, and so on. Maybe its features are now all covered by these functions.
  • I can't tell you whether CL_ABAP_REGEX correctly implements the "extended POSIX standard IEEE 1003.1"; you may be interested in further information and discussions about what POSIX actually specifies.
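As a rough illustration of the built-in functions mentioned in the PS, the sketch below passes the pattern in a variable, as you would for user input. The variable names are hypothetical, and I have not checked which alternative these functions pick for this pattern, so no result is stated here.

```abap
DATA: subject TYPE string VALUE '0123456',    " text to search
      pattern TYPE string VALUE '01|012345',  " e.g. supplied by the user
      hit     TYPE string,
      pos     TYPE i.

" contains( ) is a predicate function: true if the pattern occurs in subject.
IF contains( val = subject regex = pattern ).
  " match( ) returns the matching substring,
  " find( ) returns its offset (-1 if there is no match).
  hit = match( val = subject regex = pattern ).
  pos = find( val = subject regex = pattern ).
  WRITE: / 'Match:', hit, 'at offset', pos.
ENDIF.
```

Whether hit comes back as 01 or 012345 is exactly the thing to test before relying on these functions as a replacement.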
Sandra Rossi

Did you try this:

  CREATE OBJECT regex
    EXPORTING
     pattern     = '012345|01'.

First hit wins.

If I execute this:

REPORT  Z_TEST.

data:
  regex type REF TO CL_ABAP_REGEX,
  match type ref to CL_ABAP_MATCHER,
  match_result_tab type match_result_tab.

CREATE OBJECT regex EXPORTING pattern = '012345|01'.
CREATE OBJECT match EXPORTING regex = regex text = '0123456'.

call METHOD match->find_all RECEIVING matches = match_result_tab.

data:
  regex2 type REF TO CL_ABAP_REGEX,
  match2 type ref to CL_ABAP_MATCHER,
  match_result_tab2 type match_result_tab.
CREATE OBJECT regex2 EXPORTING pattern = '01|012345'.
CREATE OBJECT match2 EXPORTING regex = regex2 text = '0123456'.

call METHOD match2->find_all RECEIVING matches = match_result_tab2.

I get: [screenshot]

With 012345|01 you catch all the values you want; with 01|012345 you get only the 01.

knut
  • Yes, but I can't change this because it is user input. I have to use it dynamically (for compatibility reasons), so I can't use FIND REGEX. Is this a bug in SAP? SAP writes in the documentation: [...] Leftmost-longest rule: First, the substring furthest to the left in the character string and which matches the regular expression ("leftmost") is determined. If there are multiple substrings, the longest sequence is chosen ("longest"). This procedure is then repeated for the remaining sequence after the occurrence[...] – rso Nov 04 '13 at 12:32