1

I am trying to figure out a regex that matches a pattern plus all the characters after it, but if another occurrence of the pattern appears, the matches must not overlap.

This is my current regex:

[a-zA-Z]{2}\d{1}\s?\w?

The pattern is always two letters followed by a digit, like AE1 or BE3, but I need all the characters following the pattern as well.

So AE1 A E F is one match, but if another pattern occurs in the string, as in AE1 A D BE1 A D C, the matches cannot overlap and must be two separate matches.

So to clarify, AB3 D T B should be one match for the regex.

ABC D A F DE3 D CD A should give 2 matches, each with all the characters following it, because of the two-letter word and number. How do I achieve this?

Hamed Ghasempour
Ali Fares
  • How can the string `"ABC D A F DE3 D CD A"` have 2 matches when only the `"DE3"` part matches your description: "2 letters followed by a number"? What are the 2 parts that are supposed to match? – jwvh Jun 17 '19 at 04:33
  • Could you please check [my solution](https://stackoverflow.com/a/56627416/3832970)? I think that is exactly what you need. – Wiktor Stribiżew Jun 25 '19 at 07:29

4 Answers

1

I'm not quite following the logic here, yet my guess would be that we might want something similar to this:

([A-Z]{2}\d\s([A-Z]+\s)+)|([A-Z]{3}\s([A-Z]+\s)+)

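which allows either two letters followed by a digit, or three letters; in both cases the prefix must be followed by ([A-Z]+\s)+, that is, one or more runs of uppercase letters that each end in a whitespace character.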

Demo
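To make the behaviour of this pattern concrete, here is a minimal sketch that runs it over the strings from the question. The object name `AlternationDemo` and the helper `matches` are made up for illustration; only the regex itself comes from the answer.

```scala
object AlternationDemo {
  // Regex from the answer: two letters + digit, or three letters,
  // followed by one or more whitespace-terminated runs of capitals.
  val rx = """([A-Z]{2}\d\s([A-Z]+\s)+)|([A-Z]{3}\s([A-Z]+\s)+)""".r

  // Hypothetical helper: collect every non-overlapping match, left to right.
  def matches(s: String): List[String] = rx.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    // Brackets make trailing whitespace in each match visible.
    matches("AE1 A D BE1 A D C").foreach(m => println(s"[$m]"))
    matches("ABC D A F DE3 D CD A").foreach(m => println(s"[$m]"))
  }
}
```

Because every run must end in `\s`, the last token of the input is left out of the final match unless the string ends in whitespace, so this sketch suggests appending a space (or relaxing the final `\s`) if the trailing token matters.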

Community
Emma
1

You have to consider where your pattern starts. What distinguishes AE1 A E F from BE1 A D C in AE1 A D BE1 A D C? You don't want to treat both the same way, so you have to separate them, and the only way to tell these two pieces of text apart is by which one sits at the start of the text.

So simply adding ^ at the start of your pattern will solve the problem.

So your regex should be like this:

^[a-zA-Z]{2}\d{1}\s?\w?

Demo
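As a rough check of what the anchor changes, the sketch below compares the question's original pattern with the anchored one on the question's sample string. The names `AnchorDemo`, `unanchored` and `anchored` are illustrative only.

```scala
object AnchorDemo {
  // Pattern from the question, and the same pattern anchored with ^.
  val unanchored = """[a-zA-Z]{2}\d{1}\s?\w?""".r
  val anchored   = """^[a-zA-Z]{2}\d{1}\s?\w?""".r

  def main(args: Array[String]): Unit = {
    val s = "AE1 A D BE1 A D C"
    // Without the anchor, every occurrence of the pattern is reported.
    println(unanchored.findAllIn(s).toList)
    // With the anchor, only an occurrence at the very start can match.
    println(anchored.findAllIn(s).toList)
  }
}
```

In both cases the tail `\s?\w?` consumes at most one whitespace character and one word character, so each match covers only the first token after the two-letter-plus-digit prefix.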

Hamed Ghasempour
0

You can just use this regex:

(?i)\b[a-z]{2}\d\b(?:(?:(?!\b[a-z]{2}\d\b).)+\s?)?

Demo and explanations: https://regex101.com/r/DtFU8j/1/

It uses a negative lookahead (?!\b[a-z]{2}\d\b) to add the constraint that the characters matched after the initial pattern (?i)\b[a-z]{2}\d\b must not contain that same pattern again.
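A small sketch of how this regex could be applied from Scala, using the strings from the question. The object `TemperedTokenDemo` and its `matches` helper are hypothetical names, and trimming each match is an assumption about the desired output rather than part of the answer.

```scala
object TemperedTokenDemo {
  // Regex from the answer: the lookahead is checked before every consumed
  // character, so a match stops right before the next "two letters + digit".
  val rx = """(?i)\b[a-z]{2}\d\b(?:(?:(?!\b[a-z]{2}\d\b).)+\s?)?""".r

  // Hypothetical helper: all matches, with surrounding whitespace trimmed.
  def matches(s: String): List[String] = rx.findAllIn(s).map(_.trim).toList

  def main(args: Array[String]): Unit = {
    println(matches("AB3 D T B"))            // a single match
    println(matches("AE1 A D BE1 A D C"))    // two separate matches
    println(matches("ABC D A F DE3 D CD A")) // only the part starting at DE3
  }
}
```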

Allan
  • You do not impose any constraints on the first occurrence of `.` with the lookahead. See [this post](https://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat) to use a tempered greedy token correctly. – Wiktor Stribiżew Jun 17 '19 at 08:14
0

What you want is to split the string at every position where your pattern begins, so that each pattern match becomes the start of an extracted substring.

You may use

(?!^)(?=[a-zA-Z]{2}\d)

to split the string. Details:

  • (?!^) - not at the start of the string
  • (?=[a-zA-Z]{2}\d) - a location in the string that is immediately followed by 2 ASCII letters and a digit.

See the Scala demo:

val s = "ABC D A F DE3 D CD A"
val rx = """(?!^)(?=[a-zA-Z]{2}\d)"""
val results = s.split(rx).map(_.trim)
println(results.mkString(", "))
// => ABC D A F, DE3 D CD A
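
The same split can be wrapped in a small helper and tried on the other strings from the question. This is only a sketch: `SplitDemo`, `boundary` and `chunks` are illustrative names, not part of any library.

```scala
object SplitDemo {
  // Zero-width split point: not at the string start, and immediately
  // followed by two ASCII letters and a digit.
  val boundary = """(?!^)(?=[a-zA-Z]{2}\d)"""

  // Hypothetical helper: split at each boundary and trim the pieces.
  def chunks(s: String): List[String] = s.split(boundary).map(_.trim).toList

  def main(args: Array[String]): Unit = {
    println(chunks("AB3 D T B").mkString(", "))
    println(chunks("AE1 A D BE1 A D C").mkString(", "))
  }
}
```

Note that the lookahead has no word boundary, so a longer token such as ABCD1 would also be split before CD1; if that matters, one possible variation is to add `\b` inside the lookahead.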
Wiktor Stribiżew