I'd like to match a string under the following conditions:

  • it must start with an A
  • followed by any number of any characters, as long as no two consecutive uppercase letters occur among them
  • followed by a number (which should be captured)

A bcd 1 should match and capture 1

Abcd1 should match and capture 1

A bcd should not match because there is no number

A BCd 1 should not match because there are two consecutive capitals (BC) between the A and the number

A bcd 1 EF should match because 1 is before the EF

I came up with:

A(?!.*[A-Z]{2})+?.*(\d+)

but that does not work for the last example, because the negative lookahead looks past the 1 and finds the EF.

Here is a playground: https://regex101.com/r/1zRCrp/3

Jan

3 Answers

Note that (?!.*[A-Z]{2})+? is the same as (?!.*[A-Z]{2}): a lookahead is zero-width, so it only needs to be evaluated once (+? means one or more occurrences, as few as possible; quantifying a lookaround is never a good idea). The .* matches as many characters other than line break chars as possible, so it grabs all the text up to the last digit, and (\d+) therefore captures only the last digit on a matching line.
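Both effects can be checked with a small sketch in Python's re module. The sample strings are taken from the question, except "A bcd 123", which is an assumed extra input added only to show the greedy .* at work; the name attempt is illustrative.

```python
import re

# Pattern from the question; the +? after the lookahead is dropped
# because it does not change what the pattern matches.
attempt = re.compile(r"A(?!.*[A-Z]{2}).*(\d+)")

# The lookahead scans the whole rest of the line, so the "EF" after the
# number makes it fail even though the number comes first.
print(attempt.search("A bcd 1 EF"))  # None

# The greedy .* leaves only the final digit for the capturing group.
m = attempt.search("A bcd 123")
print(m.group(1))  # 3
```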

You can use:

A(?:(?![A-Z]{2}).)*?(\d+)

See the regex demo.

Details:

  • A - the letter A
  • (?:(?![A-Z]{2}).)*? - zero or more (but as few as possible) occurrences of any char other than a line break char, provided it does not start a sequence of two uppercase letters
  • (\d+) - Group 1: one or more digits.

If you need to match across multiple lines, see the solutions in "How do I match any character across multiple lines in a regular expression?"
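As a rough check, the pattern can be run against the five examples from the question. This is a minimal sketch using Python's re module; the names samples and results are illustrative only.

```python
import re

# Tempered pattern: each consumed char must not start two uppercase letters.
pattern = re.compile(r"A(?:(?![A-Z]{2}).)*?(\d+)")

samples = ["A bcd 1", "Abcd1", "A bcd", "A BCd 1", "A bcd 1 EF"]
results = {}
for text in samples:
    m = pattern.search(text)
    # Store the captured number, or None when the line does not match.
    results[text] = m.group(1) if m else None
    print(f"{text!r}: {results[text]}")
```

Only "A bcd" (no number) and "A BCd 1" (the BC pair blocks the walk before the digit is reached) come back as None.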

Wiktor Stribiżew

You could exclude digits both in the lookahead and in the match by using \D:

A(?!\D*[A-Z]{2})\D*(\d+)

See a regex101 demo.
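Because \D* cannot run past the first digit, the lookahead only inspects the text in front of the number. A minimal sketch in Python's re module, where the helper name capture and the input "A bcd 42 EF" are assumptions made for illustration:

```python
import re

# \D* stops at the first digit, so the lookahead never sees "EF" after it.
pattern = re.compile(r"A(?!\D*[A-Z]{2})\D*(\d+)")

def capture(text):
    """Return the captured number, or None when the text does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(capture("A bcd 1 EF"))   # 1
print(capture("A BCd 1"))      # None
print(capture("A bcd 42 EF"))  # 42
```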


To avoid crossing newlines, you can use [^\d\n] instead of \D.

If you also want to prevent A from being matched as part of a longer word, you can prepend a word boundary: \bA

\bA(?![^\d\n]*[A-Z]{2})[^\d\n]*(\d+)

See another regex101 demo.
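The difference between the two variants only shows up on inputs that are not in the question. The sketch below uses two assumed inputs, "BA 1" and a two-line string, with Python's re module; the names loose and strict are illustrative.

```python
import re

loose = re.compile(r"A(?!\D*[A-Z]{2})\D*(\d+)")
strict = re.compile(r"\bA(?![^\d\n]*[A-Z]{2})[^\d\n]*(\d+)")

# Without \b, the A at the end of "BA" is accepted as a starting point.
print(loose.search("BA 1").group(1))  # 1
print(strict.search("BA 1"))          # None

# \D also matches a newline, so the loose pattern reaches the next line.
print(loose.search("A bcd\n1").group(1))  # 1
print(strict.search("A bcd\n1"))          # None
```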

The fourth bird

Instead of capturing the digits, you can also use \K to reset the start of the match just before the digits, so that the digits alone make up the entire match:

^A(?:(?![A-Z]{2}).)*?\K\d+

Demo: https://regex101.com/r/JvrSIR/1
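Note that \K is not available in every regex flavor. Python's built-in re module, for example, does not support it, so there a capturing group has to play the same role. A minimal sketch, assuming the tempered token is repeated with *? and using an illustrative helper name first_number:

```python
import re

# Same start-anchored pattern, with a capturing group in place of \K.
pattern = re.compile(r"^A(?:(?![A-Z]{2}).)*?(\d+)")

def first_number(text):
    """Return the digits that \\K would have left as the whole match."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(first_number("A bcd 1 EF"))  # 1
print(first_number("A BCd 1"))     # None
```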

blhsing