
I have been trying for hours and have also read the regex wiki here on Stack Overflow, but I can't get this regex to work. This is the string I have:

Lorem: 8 FB / Ipsum-/Dolor: Some Text / Dolor: Some text with (brackets) / Sit amet.: Some Text with/slash / foobar / Last one: 36 foos

What I would like to extract is: Lorem, Ipsum-/Dolor, Dolor, Sit amet., Last one. So basically everything from the beginning of the sentence or after a slash until the colon.

Whatever I try, the problem is always foobar, because it sticks together with Last one. What I have tried so far is, for example: ( \/ |\A)([^(?!.* \/ )].*?): which I hoped would extract everything from a slash up to a colon, but not if the text in between contains " / " (space, slash, space). That way I wanted to make sure that foobar / Last one is not returned.

Could someone give me a hint?

Chris
  • Your logic is not clear to me. `Ipsum-/Dolor` is not after a dash and not at the beginning of a sentence. Maybe you can use clearer words? – Daniel W. Jan 26 '17 at 14:49
  • Your explanation of the rules is not consistent or accurate. First you say "everything after a dash", then you say "everything starting from a slash." A dash is "-", a slash is either "\" or "/" (backwards or forwards slash). But even if you were consistent with that, your claim of what's extracted is also not consistent. – Don Rhummy Jan 26 '17 at 14:50
  • Right, sorry I got confused with slash and dash... I meant slash. – Chris Jan 26 '17 at 14:59

1 Answer


Note that you made a common mistake: placing a sequence of patterns inside a character class ([...]) makes the regex engine match a single character from the defined set. [^(?!.* \/ )] matches a single character other than (, ?, !, ., etc.
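To make the character class point concrete, here is a small sketch in Python. The choice of Python's re module is an assumption (the question does not name a regex flavor), and the variable names are only illustrative. It shows that the bracket expression consumes exactly one character and never acts as a lookahead, so the lazy .*? after it is free to run across " / " up to the next colon:

```python
import re

# Sample string from the question.
text = ("Lorem: 8 FB / Ipsum-/Dolor: Some Text / Dolor: Some text with (brackets)"
        " / Sit amet.: Some Text with/slash / foobar / Last one: 36 foos")

# Inside [...] the lookahead syntax loses its meaning: this is just a
# negated set of the literal characters ( ? ! . * space / )
char_class = re.compile(r"[^(?!.* \/ )]")

matches_letter = char_class.fullmatch("a") is not None   # one ordinary char
rejects_paren = char_class.fullmatch("(") is None        # "(" is in the set
rejects_two_chars = char_class.fullmatch("ab") is None   # only ONE char matches

# The attempt from the question: nothing stops .*? from crossing " / ".
attempt = re.compile(r"( \/ |\A)([^(?!.* \/ )].*?):")
attempt_labels = [m.group(2) for m in attempt.finditer(text)]
print(attempt_labels)
```

With this input, the last captured value should be foobar / Last one, which is exactly the behaviour described in the question.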

You may use a tempered greedy token:

(?: \/ |\A)((?:(?! \/ )[^:])+):
             ^^^^^^^^^^^^^^^^

See the regex demo. The literal spaces may be replaced with \s (to match any whitespace character) or \h (to match only horizontal whitespace, in flavors that support it, such as PCRE).

Details:

  • (?: \/ |\A) - either space + / + space or start of string
  • ((?:(?! \/ )[^:])+) - Group 1, capturing one or more characters other than : ([^:]), each of which is not the starting point of a space + / + space sequence
  • : - a literal colon.
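
As a quick check of the pattern described above, here is a minimal sketch in Python. Using Python's re module is an assumption, since the question does not say which regex flavor is in use, and the helper name extract_labels is hypothetical. The slash is written without a backslash because Python does not need it escaped:

```python
import re

# Tempered greedy token: before consuming each non-colon character,
# the lookahead checks that " / " does not start at that position.
LABEL_RE = re.compile(r"(?: / |\A)((?:(?! / )[^:])+):")


def extract_labels(text: str) -> list[str]:
    """Return the text between (string start or ' / ') and the next colon."""
    # With a single capturing group, findall returns the group 1 values.
    return LABEL_RE.findall(text)


text = ("Lorem: 8 FB / Ipsum-/Dolor: Some Text / Dolor: Some text with (brackets)"
        " / Sit amet.: Some Text with/slash / foobar / Last one: 36 foos")

labels = extract_labels(text)
print(labels)
# ['Lorem', 'Ipsum-/Dolor', 'Dolor', 'Sit amet.', 'Last one']
```

The attempt that starts at " / foobar" fails because the token stops at the next " / " and no colon follows there, so foobar is skipped and the next match begins at " / Last one".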
Wiktor Stribiżew