
I'm having an issue and I'm hoping someone more knowledgeable about regex can help me out.

I'm trying to extract data from a PDF file that contains budget line items. I'm using this regex pattern to get the index of the first number so that I can then extract the numbers to its right.

Regex pattern:

(([(]?[0-9]+[)]? )|([(]?[0-9]+[)]?)|(- )|(-))+$

Line item: 'Modernization and improvement (note 9) 260 (180) 640 - 155'

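This works well for 99% of the line items, except for this one I came across. The problem is that the pattern matches the '9)' inside the text portion '(note 9)', so the index it returns points into the description instead of at the first amount.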
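A minimal sketch of the approach described above, assuming Python (the question does not name a language) and a hypothetical helper `numeric_tail`. It runs the original pattern with `re.search`, takes the start index of the match, and slices off everything to its right. The first line item is a hypothetical "normal" line; the second is the problem line, where the leftmost match begins at the `9` of `(note 9)` because `[(]?` and `[)]?` let a lone closing parenthesis through.

```python
import re
from typing import Optional

# The pattern from the question, unchanged.
ORIGINAL = r"(([(]?[0-9]+[)]? )|([(]?[0-9]+[)]?)|(- )|(-))+$"

def numeric_tail(line: str, pattern: str = ORIGINAL) -> Optional[str]:
    """Hypothetical helper: return line[first_match_index:], or None if no match."""
    m = re.search(pattern, line)
    return line[m.start():] if m else None

# Hypothetical typical line: the match starts at the first amount.
print(numeric_tail("Office supplies 120 (45) - 30"))  # 120 (45) - 30
# The problem line: the match starts at the '9' inside '(note 9)'.
print(numeric_tail("Modernization and improvement (note 9) 260 (180) 640 - 155"))  # 9) 260 (180) 640 - 155
```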

Is there any way to change this regex pattern so that, if there are brackets, they must contain only numbers?

Thanks!

Wiktor Stribiżew
sleepymatto
  • Maybe something like [`(?:(?:\(\d+\)|\d+)[ -]*)+$`](https://regex101.com/r/UvGiKB/1) would even suffice here. It depends on what your input looks like. It matches numbers either inside parentheses *or* without them, each followed by optional hyphens or spaces. – bobble bubble Oct 07 '22 at 09:15
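A quick check of the comment's pattern, assuming Python's `re` module. Because `\(\d+\)` only accepts digits between the parentheses, the `9` in `(note 9)` can no longer start a match: after `9` the next character is `)`, which neither alternative nor `[ -]*` accepts, so `$` fails there and the search moves on to `260`.

```python
import re

# Pattern from the comment: a number with or without parentheses, followed by
# optional spaces/hyphens, repeated up to the end of the string.
COMMENT_PATTERN = re.compile(r"(?:(?:\(\d+\)|\d+)[ -]*)+$")

line = "Modernization and improvement (note 9) 260 (180) 640 - 155"
m = COMMENT_PATTERN.search(line)
print(m.group())  # 260 (180) 640 - 155
```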

1 Answer


You can repeat all possible options until the end of the string:

(?:\(\d+\)|\d+(?:\s*-\s*\d+)?)(?:\s+(?:\(\d+\)|\d+(?:\s*-\s*\d+)?))*$

Explanation

  • (?: Non-capturing group
    • \(\d+\) Match 1+ digits between parentheses
    • | Or
    • \d+(?:\s*-\s*\d+)? Match 1+ digits, optionally followed by a - (with optional whitespace around it) and 1+ more digits
  • ) Close the non-capturing group
  • (?: Non-capturing group to repeat as a whole
    • \s+ Match 1+ whitespace chars
    • (?:\(\d+\)|\d+(?:\s*-\s*\d+)?) Same alternation as in the first group
  • )* Close the non-capturing group and repeat it zero or more times
  • $ End of string
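A sketch of how the answer's pattern could plug into the "find the first number, then take everything to its right" approach from the question. Python and the helper name `amounts` are assumptions; splitting the tail on whitespace is one simple way to get the individual items, not something the answer prescribes.

```python
import re
from typing import List, Optional

# Pattern from the answer, unchanged.
ANSWER_PATTERN = re.compile(
    r"(?:\(\d+\)|\d+(?:\s*-\s*\d+)?)(?:\s+(?:\(\d+\)|\d+(?:\s*-\s*\d+)?))*$"
)

def amounts(line: str) -> Optional[List[str]]:
    """Hypothetical helper: whitespace-separated tokens from the first
    matched number to the end of the line, or None if nothing matches."""
    m = ANSWER_PATTERN.search(line)
    if m is None:
        return None
    # m.start() is the index of the first number; slice from there and split.
    return line[m.start():].split()

print(amounts("Modernization and improvement (note 9) 260 (180) 640 - 155"))
# ['260', '(180)', '640', '-', '155']
print(amounts("Repairs 640 155 -"))  # None
```

The pattern only accepts `-` when digits follow it (as in `640 - 155`), so a hypothetical line ending in a bare `-`, like the second call, does not match at all. If such lines occur in the PDF, the alternation would need an extra option for a standalone `-`.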


The fourth bird