I'm attempting to parse this code with a regex:

IF KNVV -> KDGRP IN( "EK", "ES" )  THEN KNA1-SORTL ="KA"
ELSE KNVV -> KDGRP IN( "EL", "E3", "E5", "E7", "E2", "EF" )  THEN KNA1-SORTL ="IN"
ELSE KNVV -> KDGRP IN( "EX", "EU", "EV", "ET", "EW" )  THEN KNA1-SORTL ="CA"
END

but I can't come up with a valid regex. This is my attempt so far:

[ ]*[a-zA-Z0-9]+[ ]*([a-zA-Z0-9]+)[ ]*\-\>[ ]*([a-zA-Z0-9]+)[ ]*IN[ ]*\([ ]*([a-zA-Z0-9 ]+)+[ ]*\)[ ]*THEN[ ]*([a-zA-Z0-9\-]+)[ ]*\=[ ]*([a-zA-Z0-9\"\,]+)[ ]*

I need in-line validation and the following values per group:

$1: KNVV
$2: KDGRP
$3: EK,ES
$4: KNA1-SORTL
$5: KA

Is there any way to get this?

  • I'm 99% certain that `$3: EK,ES` in pure regex is going to be impossible. – MonkeyZeus Feb 04 '20 at 13:15
  • More generally, regex is not the correct tool for this in the first place. – tripleee Feb 04 '20 at 13:17
  • Does this answer your question? [Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms](https://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la) – tripleee Feb 04 '20 at 13:17
  • `^\s*\S*\s*(\S*)\s*->\s*(\S*)\s*IN\s*\(\s*([^)]*?)\s*\)\s*\S*\s*(\S*)\s*=\s*(\S*)$` ? [Try it here](https://regex101.com/r/9lzCpL/3) – Aaron Feb 04 '20 at 13:18
  • In the case of $3, I can accept "item1","item2","item3" and then strip every `"` to get item1,item2,item3. – Jose Sebastian Garcia Feb 04 '20 at 13:19
  • @tripleee What can I use in this case? Do you have documentation? – Jose Sebastian Garcia Feb 04 '20 at 13:20
  • You don't allow `"` and `,` in your `[a-zA-Z0-9 ]` for $3 (it should be `[a-zA-Z0-9 ",]`; there is no need to escape them inside brackets). I suggest, as Aaron does, at least replacing your `[ ]` with `\s`; your interpreter will be more robust. – Kaddath Feb 04 '20 at 13:20
  • The traditional tool for writing parsers in C is called Yacc, though the GNU extended parser construction kit Bison has by and large taken over. If you want to work in a scripting language, I hear PyParsing is nice. – tripleee Feb 04 '20 at 13:24

1 Answer

A capture group can only return one contiguous slice of the input, so the regex alone cannot produce the exact output EK,ES. The closest it can capture is "EK", "ES".

Here is a regex that might do what you were looking for.

(?:IF|ELSE) ([A-Z]+) -> ([A-Z]+) IN\(( (?:"[A-Z0-9]{2}"(?:[, ])*)+)\)(?: +)THEN ([A-Z0-9\-]+) ="([A-Z]+)"

It's quite complicated, so here is a visual representation of what it does: [image: regular expression visualization]

This regex will output

Full match IF KNVV -> KDGRP IN( "EK", "ES" ) THEN KNA1-SORTL ="KA"

Group 1. KNVV
Group 2. KDGRP
Group 3. "EK", "ES"
Group 4. KNA1-SORTL
Group 5. KA

You can test more cases at this regex101
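
As a rough sketch of how the pattern above could be used for per-line validation, and how group 3 could be collapsed into EK,ES afterwards (the quote-stripping idea the asker mentions in the comments), here is one way to do it with Python's `re` module. The language choice and the names `RULE` and `parse_rule` are assumptions made for illustration; they are not part of the original post.

```python
import re

# The pattern from this answer, unchanged, compiled once.
RULE = re.compile(
    r'(?:IF|ELSE) ([A-Z]+) -> ([A-Z]+) IN\(( (?:"[A-Z0-9]{2}"(?:[, ])*)+)\)'
    r'(?: +)THEN ([A-Z0-9\-]+) ="([A-Z]+)"'
)


def parse_rule(line):
    """Validate one IF/ELSE line; return its five values, or None if invalid.

    Hypothetical helper: fullmatch() makes the regex act as a whole-line
    validator, and a second pass cleans up the quoted list in group 3.
    """
    m = RULE.fullmatch(line.strip())
    if m is None:
        return None
    table, field, raw_values, target, value = m.groups()
    # Group 3 still contains quotes, commas and spaces; keep only the codes.
    values = ",".join(re.findall(r'"([A-Z0-9]{2})"', raw_values))
    return table, field, values, target, value


if __name__ == "__main__":
    source = '''IF KNVV -> KDGRP IN( "EK", "ES" )  THEN KNA1-SORTL ="KA"
ELSE KNVV -> KDGRP IN( "EX", "EU", "EV", "ET", "EW" )  THEN KNA1-SORTL ="CA"
END'''
    for line in source.splitlines():
        # Lines that are not IF/ELSE rules (such as END) yield None.
        print(parse_rule(line))
```

The second pass is what the single regex cannot do on its own: it extracts each two-character code from the captured list and joins them with commas. A line such as END is reported as `None` here, so the caller has to decide whether that is a terminator or an error.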

Nicolas