I’m new to regex, and I've spent a fair amount of time experimenting on regex testers, searching the web, etc. on the following issue. I’m using Python 3.7+.

Example Text String:

((AC00001234 + AC00005678) / 365) * (5 + 10)

Note - AC is always in uppercase and followed by exactly 8 digits.

Desired Outcome: Matches for the following items only. More specifically, every number that is not part of an AC-prefixed code.

  • 365
  • 5
  • 10

While I’ve tried more things than I can count, I’ve come closest with a negative lookbehind (below). The problem is that the result is pulling in 00001234 and 00005678 as well. I’ve tried explicit character classes [0-9], adjusting some groupings, etc.

Current Code:

(?<!AC\d{8})\d+

Current Outcome:

  • 00001234
  • 00005678
  • 365
  • 5
  • 10

On Stack Overflow, I looked at the following: Negative lookbehind in a regex with an optional prefix, Match pattern not preceded or followed by string, Standalone numbers Regex?, and Regex to identify standalone numbers.

For simplicity, I've broken down the parsing into three other steps (e.g., extracting the AC-prefix codes only, math operators, etc.), and this piece is the final one I need to solve.

Mark Rotteveel
1 Answer

The obvious way to do it is this: (?<!AC)\d+ - a bunch of digits that is not preceded by AC. However, that fails, because it matches 0001234, as it is preceded by 0, and not AC. The missing piece is that you have to assert also that it is not preceded by a digit:

(?<!AC)(?<!\d)\d+
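
As a quick check, here is a minimal sketch that runs both patterns against the example string from the question using Python's re.findall. The variable names are just for illustration.

```python
import re

text = "((AC00001234 + AC00005678) / 365) * (5 + 10)"

# Single lookbehind: a match may not start right after "AC",
# so the engine simply starts one digit later.
naive = re.findall(r"(?<!AC)\d+", text)

# Two lookbehinds: a match may start neither after "AC" nor after a digit.
standalone = re.findall(r"(?<!AC)(?<!\d)\d+", text)

print(naive)       # ['0001234', '0005678', '365', '5', '10']
print(standalone)  # ['365', '5', '10']
```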

Depending on the possible input strings, a word boundary assertion can also do a similar job:

(?<!AC)\b\d+
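
To illustrate the "depending on the possible input strings" caveat, here is a small sketch comparing the two patterns. The second test string (with the made-up identifier rate2) is an assumption of mine, not something from the question. It shows that \b does not fire between a letter and a digit, so the two patterns can disagree on such input.

```python
import re

digit_guard = re.compile(r"(?<!AC)(?<!\d)\d+")
word_boundary = re.compile(r"(?<!AC)\b\d+")

# The string from the question: both patterns agree.
text = "((AC00001234 + AC00005678) / 365) * (5 + 10)"
same_a = digit_guard.findall(text)
same_b = word_boundary.findall(text)

# Hypothetical input where a number is glued to letters other than "AC".
other = "AC00001234 + rate2 * 7"
differ_a = digit_guard.findall(other)
differ_b = word_boundary.findall(other)

print(same_a, same_b)   # ['365', '5', '10'] ['365', '5', '10']
print(differ_a)         # ['2', '7']
print(differ_b)         # ['7']
```

Which behavior is right depends on whether something like rate2 can appear in your input and whether its trailing digit should count as a number.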

Your code ((?<!AC\d{8})\d+) fails because it means "a bunch of digits not preceded by ACXXXXXXXX" (where X is a digit). The 00001234 in AC00001234 is preceded only by AC, not by AC and eight more digits, so it is a match. You could kind of fix it by asserting it after the match: \d+(?<!AC\d{8}), but that fails for a similar reason - it will disqualify 00001234, but it does not disqualify 0000123, because there is no AC and eight digits in front of its end - only seven! So you still need a boundary assertion:

\d+(?<!AC\d{8})\b
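
The three stages described above can be compared side by side with a short sketch, again using re.findall on the question's example string. Note that Python's re module only accepts fixed-width lookbehinds, which AC\d{8} satisfies.

```python
import re

text = "((AC00001234 + AC00005678) / 365) * (5 + 10)"

# Original attempt: lookbehind checked before the digits start.
original = re.findall(r"(?<!AC\d{8})\d+", text)

# Lookbehind checked at the end of the match, no boundary:
# the engine backtracks by one digit and matches a 7-digit prefix.
no_boundary = re.findall(r"\d+(?<!AC\d{8})", text)

# Same, plus \b so the match must end where the digit run ends.
with_boundary = re.findall(r"\d+(?<!AC\d{8})\b", text)

print(original)       # ['00001234', '00005678', '365', '5', '10']
print(no_boundary)    # ['0000123', '0000567', '365', '5', '10']
print(with_boundary)  # ['365', '5', '10']
```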

However, this is less clear than the first two solutions (and also requires you to know the length of the ACXXXXXXXX string).

Amadan
  • Off-topic note: thank you for a well-asked question in [tag:regex]. You posted test cases and you posted things you attempted, and you isolated the relevant issue - the vast majority of [tag:regex] questions just want someone else to write regex for them. Welcome to Stack Overflow, you're off to a good start. – Amadan Sep 16 '20 at 22:59
  • First of all, thank you for offering multiple ways to solve this issue. It really helps someone who is trying to learn. Building on that point, your explanation about why my solution wasn’t working and how matches (my code and your solutions) were happening further clarified things. Finally, I appreciate the encouragement you offered regarding the structure of my post. Learning can be a real struggle at times, especially deciding when to seek help or to spend more time on one’s own. I sincerely appreciate this community’s willingness to share knowledge and help each other. – AIphanumeric425 Sep 17 '20 at 15:23