How to alter my expression to return the same result while being compliant with POSIX BRE / ERE?

Question

I'm trying to use Snowflake's regex implementation, which I have just discovered is POSIX BRE/ERE. I had previously written a regular expression that identifies all commas outside double-quoted sections of a string, so that I can replace them with a custom delimiter (for text file parsing).

Sample text string:

"Foreign Corporate Name Registration","99999","Valuation Research",,"Active Name",02/09/2020,"02/09/2020","NEVADA","UNITED STATES",,,"123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES","123 SOME STREET",,"MILWAUKEE","WI","53202","UNITED STATES",,,,,,,,,,,,

Regex command and substitution (working in regex101.com):

([("].*?["])*?(,)

\1#^#

Regex101.com (and desired) result:

"Foreign Corporate Name Registration"#^#"99999"#^#"Valuation Research"#^##^#"Active Name"#^#02/09/2020#^#"02/09/2020"#^#"NEVADA"#^#"UNITED STATES"#^##^##^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^#"123 SOME STREET"#^##^#"MILWAUKEE"#^#"WI"#^#"53202"#^#"UNITED STATES"#^##^##^##^##^##^##^##^##^##^##^##^#

So, given that I am now belatedly discovering that I cannot use lazy quantifiers, can any uber-regex'ers advise on how I might alter my expression to return the same result while being compliant with POSIX BRE/ERE?

@WiktorStribiżew - I did not! With a small modification ```("[^"]*")*(,)```, that works perfectly! Sir, thank you very much!! And I can't work out how to give you credit for it, I assume because its a comment - sorry :( — CaseyR, Sep 03 '20 at 08:28
But why are you capturing the comma? You are not using the second group, you have `\1#^#` in the replacement. — Wiktor Stribiżew, Sep 03 '20 at 08:29
The comma is actually the character being replaced, my (weak) understanding is that the first group is negating text within the quotes. With your regex, I get: ```"Foreign Corporate Name Registration"#^##^#,"99999"#^##^#,"Valua...``` with the addition of the second group I get the desired: ```"Foreign Corporate Name Registration"#^#"99999"#^#"Valua...``` — CaseyR, Sep 03 '20 at 08:39
No, the group saves the captured text in a separate memory buffer and backreferences like `\1`, `\2`, etc. are sheer placeholders for those matches. — Wiktor Stribiżew, Sep 03 '20 at 08:40

score 0 · Accepted Answer · answered Sep 03 '20 at 08:34

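You need to:

Replace the lazy quantifiers with greedy ones, since POSIX BRE/ERE does not support lazy quantifiers. Once . is replaced with the negated class [^"], the greedy form matches the same text as the lazy one did.
Replace [("] with a plain ". That character class matches either ( or ", but only " should be matched here.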
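
To see why the lazy quantifier can be dropped, the two forms can be compared side by side. This is only a sketch: Python's `re` module is a backtracking engine rather than a POSIX one (which is why it accepts `.*?` at all), and the sample string is a shortened version of the one in the question.

```python
import re

# Shortened version of the sample line from the question.
sample = '"Active Name",02/09/2020,"NEVADA","UNITED STATES",,'

# Lazy form (not available in POSIX BRE/ERE).
lazy = re.findall(r'".*?"', sample)

# Greedy form with a negated character class (valid in POSIX ERE).
greedy_negated = re.findall(r'"[^"]*"', sample)

# Greedy form with a plain dot: runs from the first quote to the last one.
greedy_dot = re.findall(r'".*"', sample)

print(lazy == greedy_negated)  # the two forms find the same quoted fields
print(greedy_dot)              # a single, over-long match
```

The negated class is what makes the greedy quantifier safe: [^"]* cannot run past a closing quote, so there is nothing left for laziness to prevent.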

The final POSIX ERE expression will look like

("[^"]*")*(,)

It matches

("[^"]*")* - zero or more occurrences of ", zero or more chars other than " and then a " (Group 1)
(,) - a comma (Group 2)

NOTE: POSIX BRE expression will look like \("[^"]*"\)*\(,\) where capturing groups are defined with a pair of escaped parentheses.
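
The ERE pattern can be tried out locally before running it in Snowflake. The sketch below uses Python's `re` module, which is not a POSIX engine, so it only shows that the pattern needs no lazy quantifiers; Snowflake's own behaviour is not verified here. The function name `replace_delimiters` and the shortened sample string are made up for illustration.

```python
import re

# ERE-compatible pattern from the answer: no lazy quantifiers involved.
PATTERN = re.compile(r'("[^"]*")*(,)')


def replace_delimiters(line, delimiter="#^#"):
    """Replace each comma matched by PATTERN with `delimiter`, keeping Group 1."""
    # Group 1 is None when the comma follows an unquoted or empty field,
    # so fall back to an empty string in that case.
    return PATTERN.sub(lambda m: (m.group(1) or "") + delimiter, line)


# Hypothetical sample, including a quoted field that contains a comma.
sample = '"Active Name",02/09/2020,"a,b",,"NEVADA"'
result = replace_delimiters(sample)
print(result)
```

Since the replacement only uses Group 1, the parentheses around the comma are optional: ("[^"]*")*, gives the same result. One caveat of this approach is that a quoted field containing a comma is only protected when a comma follows its closing quote, so a last field such as "a,b" with no trailing comma would still be split.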

Wiktor Stribiżew


Great explanation, time for me to head to RegEx101 - thank you Wiktor! – CaseyR Sep 03 '20 at 08:44
@CaseyR You should watch out for incompatibility between all regex flavors supported at regex101.com and POSIX BRE/ERE. Also, see [this thread](https://stackoverflow.com/questions/18514135/bash-regular-expression-cant-seem-to-match-any-of-s-s-d-d-w-w-etc). – Wiktor Stribiżew Sep 03 '20 at 08:51
I'll happily upvote your answer, but your phrasing is weird. Surely if the *question* is good it could still have answers which deserve downvotes? Though of course that's not the case here really. – tripleee Sep 09 '20 at 07:48
