0

Text:

[A]I'm an example text [] But I want to be included [[]]
[A]I'm another text without a second part []

Regex:

\[A\][\s\S]*?(?:(?=\[\])|(?=\[\[\]\]))

Using the above regex, it's not possible to capture the second part of the first text.

Demo

Is there a way to tell the regex to be greedy on the 'or'-part? I want to capture the biggest group possible.
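
The behaviour described above can be reproduced with a minimal sketch. It assumes Python's `re` module, which should treat this pattern the same way as other backtracking engines; the variable names are illustrative only. The lazy `[\s\S]*?` grows one character at a time and stops at the first position where either lookahead succeeds, so the order of the alternatives makes no difference:

```python
import re

text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Pattern from the question: lazy body, then a lookahead for either terminator.
short_first = re.compile(r"\[A\][\s\S]*?(?:(?=\[\])|(?=\[\[\]\]))")
# Same pattern with the longer terminator listed first.
long_first = re.compile(r"\[A\][\s\S]*?(?:(?=\[\[\]\])|(?=\[\]))")

short_first_matches = short_first.findall(text)
long_first_matches = long_first.findall(text)

for match in short_first_matches:
    print(repr(match))
```

Both patterns stop right before the first `[]`, so "But I want to be included" never ends up in a match.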

Edit 1:

Original Attempt:

Demo

Edit 2:

What I want to achieve:

In our company, we use a web service to report our working time. I want to develop a desktop application to easily keep an eye on the time worked. I successfully downloaded the server's response (with all the necessary data), but unfortunately this data is in quite a bad state for processing.

Therefore I need to split the whole page into separate days. Unfortunately, a single day may have multiple time sets, e.g. 06:05 - 10:33; 10:55 - 13:13. The regular expression posted above splits a day's dataset after the first time set (so after 10:33). Therefore I want the regex to handle the or-part "greedily": if expression 1 (the larger one) matches, skip the second expression; if expression 1 does not match, use the second one.

Th1sD0t
  • Order your or statement from biggest to smallest. – zzxyz Oct 08 '18 at 17:58
  • @zzxyz I already tried that but re-ordering the regex to "\[A\][\s\S]*?(?:(?=\[\[\]\])|(?=\[\]))" has the same result – Th1sD0t Oct 08 '18 at 18:05
  • Ah, yeah, the issue isn't your `or` expression (https://stackoverflow.com/questions/35606426/order-of-regular-expression-operator) The issue is your non-greedy lead into it. Well...you do also need to go left->right from most to least preferred match in your `or`. – zzxyz Oct 08 '18 at 18:36

2 Answers

1

You may use

\[A][\s\S]*?(?=\[A]|$)

See the regex demo.

Details

  • \[A] - a [A] substring
  • [\s\S]*? - any 0+ chars as few as possible
  • (?=\[A]|$) - a location that is immediately followed with [A] or end of string.
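
As a rough illustration of the breakdown above, here is a sketch that applies the same pattern to the sample text from the question. It assumes Python's `re` module rather than .NET, and the variable names are hypothetical:

```python
import re

text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Lazy body, terminated by the next "[A]" or by the end of the string.
record_pattern = re.compile(r"\[A][\s\S]*?(?=\[A]|$)")

records = record_pattern.findall(text)
for record in records:
    print(repr(record))
```

Each record runs up to the next `[A]`, so the first one keeps its trailing line break; strip it if it is not wanted.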

In C#, you actually may even use a split operation:

Regex.Split(s, @"(?!^)(?=\[A])")

See this .NET regex demo. The (?!^)(?=\[A]) regex matches a location in a string that is not at the start and that is immediately followed with [A].
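
For comparison, the same split can be sketched outside .NET. The example below assumes Python 3.7 or later, where `re.split` accepts patterns that match the empty string; it is an analogue of the `Regex.Split` call, not the C# code itself:

```python
import re

text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Split at every position that is not the start of the string
# and is immediately followed by "[A]".
parts = re.split(r"(?!^)(?=\[A])", text)

for part in parts:
    print(repr(part))
```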

If instead of A there can be any letter, replace A with [A-Z] or [A-Z]+.


Wiktor Stribiżew
  • Sure, this works for this example - unfortunately I've got a use-case where this approach does not work because the string I'd use in the lookahead exists in the text itself. The only way to separate the "whole" text is to use the last few characters ([ and ]) (those only differ in the length). – Th1sD0t Oct 08 '18 at 18:05
  • @C4p741nZ Show the real string then. You should be able to define where the pattern should start and stop matching. Otherwise, you cannot use a regex. – Wiktor Stribiżew Oct 08 '18 at 18:06
  • The original text is about 35,000 characters in length - as SO only allows 30,000 it's not that easy. I'll try to cut some unnecessary lines, though. – Th1sD0t Oct 08 '18 at 18:11
  • Use pastebin.com, but it would be better if you could share just the smallest text necessary to repro the issue. And please add the match boundaries definition to the question, else, I'd rather vote to close. – Wiktor Stribiżew Oct 08 '18 at 18:11
  • I've added a Demo with the original data on regex101 and added an explanation of what I want to do (if that's what you meant by "match boundaries definition"). – Th1sD0t Oct 08 '18 at 18:25
1

I have changed your regex (it is actually simpler now) to do what you want:

\[A\].*\[?\[\]\]?

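It starts by matching the '[A]', then matches any number of characters other than line breaks (greedy), and finally a '[]' that may be preceded by an extra '[' and followed by an extra ']', so it covers both '[]' and '[[]]'.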

Edit:

This will prefer double square brackets:

\[A\].*(?:\[\[\]\]|\[\])
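
To see how the two patterns behave on the sample text, here is a small sketch. It assumes Python's `re` module; the third pattern, `guarded`, is a hypothetical variant that is not part of this answer. One thing to watch for: because `.*` is greedy, it can swallow the first `[` of `[[]]`, so the alternation then only sees `[]]` and the match ends one `]` short. The first pattern hides this through its optional `\]?`; the lookbehind in the hypothetical variant is one possible way to guard against it.

```python
import re

text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# First pattern from the answer: optional extra brackets around "[]".
optional_brackets = re.compile(r"\[A\].*\[?\[\]\]?")
# Second pattern from the answer: alternation with "[[]]" listed first.
double_first = re.compile(r"\[A\].*(?:\[\[\]\]|\[\])")
# Hypothetical variant (an assumption, not from the answer): the lookbehind
# stops ".*" from ending on a "[" that belongs to the terminator.
guarded = re.compile(r"\[A\].*(?<!\[)(?:\[\[\]\]|\[\])")

optional_results = optional_brackets.findall(text)
double_first_results = double_first.findall(text)
guarded_results = guarded.findall(text)

for name, results in [
    ("optional_brackets", optional_results),
    ("double_first", double_first_results),
    ("guarded", guarded_results),
]:
    print(name, [repr(r) for r in results])
```

Since `.` does not match line breaks here, all three patterns work line by line; data that spans several lines per record would need a different body than `.*`.
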
Poul Bak
  • Okay, ignore my last comment, please - I wrote it before checking the result. I'll now try to adapt this to my original need :) – Th1sD0t Oct 08 '18 at 18:27
  • Okay, seems like I accidentally put a non-capturing group around my OR - because of this, the prioritization did not work. Thank you! – Th1sD0t Oct 08 '18 at 18:37