
I am trying to extract the £ value at the end of each of these lines:

Subtotal test test £20.00
Value £10.00
Subtotal test 2 £4.00
Value2 £30.00

However, I don't want to include any lines that start with "Subtotal".

So, to be clear, in this example, I just want to return:

£10.00
£30.00

I have had limited success so far with a few SO examples, including "How to match a line not containing a word". Experimenting with this (https://regex101.com/r/NcXg2m/1) got me started with:

(?m)^(?!Subtotal.*).*

This gives me the whole line for everything not starting with "Subtotal".

After looking through https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference, I thought the next step would simply be to add £.* to the end, but this still returns the same whole lines. Can someone please tell me where I'm going wrong? Thanks
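To illustrate what is going on, here is a minimal sketch in TypeScript. It assumes JavaScript's regex engine, not the .NET engine the PDF tool is believed to use, and writes the inline `(?m)` as the `m` flag because JavaScript does not accept a standalone leading `(?m)`; for this sample input (with `\n` line endings) the anchors, the lookahead and `.*` behave the same way in both engines. Every match starts at `^` and the leading `.*` consumes everything up to the `£`, so appending `£.*` still makes the matched text the whole line.

```typescript
// Sample lines from the question, joined with "\n" (assumed line ending).
const text: string = [
  "Subtotal test test £20.00",
  "Value £10.00",
  "Subtotal test 2 £4.00",
  "Value2 £30.00",
].join("\n");

// The question's attempt, (?m)^(?!Subtotal.*).*£.*, with (?m) written as the "m" flag.
const attempt = /^(?!Subtotal.*).*£.*/gm;

// With the "g" flag, match() returns every overall match, or null if there is none.
const wholeLines: string[] = text.match(attempt) ?? [];
console.log(wholeLines); // each entry is a whole line, e.g. "Value £10.00"
```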

Joe
  • What is the tool/programming language you are using? – Wiktor Stribiżew May 22 '19 at 12:47
  • I'm actually using a third-party piece of software that reads PDFs. I am using an option in it to feed in a regular expression to refine the extracted string. Their user guide points to the Microsoft website but doesn't explain more than that. I would assume that the software is written in C#, but I can't guarantee that. – Joe May 22 '19 at 12:49
  • You can capture the price in a group like `^(?!Subtotal.*).*(£.+)$` – Erwan May 22 '19 at 12:50
  • If the regex library is .NET, you may use `(?<!^Subtotal.*)£[0-9.]+$` – Wiktor Stribiżew May 22 '19 at 12:52
  • Thank you both for your replies. Unfortunately, the software returns nothing for either of those. With @Erwan's pattern, if I precede it with (?m), it returns the whole rows again. I'm thinking an email to the writers of the program might help identify exactly whether it is .NET or not, although their link does point to the Microsoft .NET help page. – Joe May 22 '19 at 13:02
  • Playing with this, using `(?m)((£.+)$)` gives me all of the £ values. So the first part does what it should, and the second part does too, but together they return everything. Very odd. – Joe May 22 '19 at 13:18
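A sketch of one plausible reason Erwan's pattern with the multiline flag appears to return whole rows, again in TypeScript on JavaScript's regex engine (an assumption, standing in for .NET). The overall match still starts at `^` and spans the line; only capture group 1 holds the price. The thread does not say whether the PDF tool reports the overall match or a capture group, so this is an explanation under that assumption, not a confirmed diagnosis.

```typescript
// Sample lines from the question, joined with "\n" (assumed line ending).
const text: string = [
  "Subtotal test test £20.00",
  "Value £10.00",
  "Subtotal test 2 £4.00",
  "Value2 £30.00",
].join("\n");

// Erwan's pattern plus the multiline flag ("m" stands in for (?m)).
const withGroup = /^(?!Subtotal.*).*(£.+)$/gm;

const fullMatches: string[] = [];
const prices: string[] = [];
let found: RegExpExecArray | null;
// exec() with the "g" flag advances lastIndex, so this loop visits every match.
while ((found = withGroup.exec(text)) !== null) {
  fullMatches.push(found[0]); // overall match: the whole line
  prices.push(found[1]); // capture group 1: just the price
}
console.log(fullMatches, prices);
```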

2 Answers


You may use

(?m)(?<!^Subtotal.*)£[0-9.]+(?=\s*$)

Details

  • (?m) - a multiline flag that makes ^ match the start of a line and $ match the end of a line
  • (?<!^Subtotal.*) - a negative lookbehind that matches a location that is not immediately preceded by Subtotal at the start of a line followed by any 0+ chars (other than newlines)
  • £ - a £ symbol
  • [0-9.]+ - 1 or more digits or dots
  • (?=\s*$) - a positive lookahead that matches a position immediately followed by 0+ whitespace chars and the end of a line

See the regex demo.
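A minimal TypeScript sketch of this pattern, assuming JavaScript's regex engine in place of .NET. JavaScript (ES2018 and later) also accepts an unbounded lookbehind such as `(?<!^Subtotal.*)`, which some engines, for example Python's built-in `re`, reject; the inline `(?m)` is expressed as the `m` flag.

```typescript
// Sample lines from the question, joined with "\n" (assumed line ending).
const text: string = [
  "Subtotal test test £20.00",
  "Value £10.00",
  "Subtotal test 2 £4.00",
  "Value2 £30.00",
].join("\n");

// The answer's pattern with (?m) written as the "m" flag.
// Built with the RegExp constructor, so "\\s" needs a doubled backslash.
const priceOnly = new RegExp("(?<!^Subtotal.*)£[0-9.]+(?=\\s*$)", "gm");

// The overall matches are just the prices on lines not starting with "Subtotal".
const prices: string[] = text.match(priceOnly) ?? [];
console.log(prices);
```

Because the lookbehind and lookahead consume no characters, the overall match is the price itself, so no capture group is needed.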

Wiktor Stribiżew

After playing with this further, I have something that works. In the end, it was a combination of @Erwan's reply and @Wiktor Stribiżew's.

The software requires me to use the multiline (?m) flag, and combining the two patterns from the comments above, the following works:

(?m)((?<!^Subtotal.*)(£.+)$)
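A sketch comparing this combined pattern with the accepted answer's, assuming JavaScript's regex engine as a stand-in for .NET. On the sample text both return the same two prices. The `padded` input is a hypothetical variant with two trailing spaces on each line: there `(£.+)$` captures the spaces as well, while the answer's `[0-9.]+(?=\s*$)` stops at the last digit.

```typescript
// Sample lines from the question, joined with "\n" (assumed line ending).
const sample: string = [
  "Subtotal test test £20.00",
  "Value £10.00",
  "Subtotal test 2 £4.00",
  "Value2 £30.00",
].join("\n");
// Hypothetical variant of the sample where every line ends with two spaces.
const padded: string = sample
  .split("\n")
  .map((line) => line + "  ")
  .join("\n");

// Joe's combined pattern and the answer's pattern, with (?m) written as "m".
const combined = new RegExp("((?<!^Subtotal.*)(£.+)$)", "gm");
const answer = new RegExp("(?<!^Subtotal.*)£[0-9.]+(?=\\s*$)", "gm");

const combinedOnSample: string[] = sample.match(combined) ?? [];
const combinedOnPadded: string[] = padded.match(combined) ?? []; // keeps the trailing spaces
const answerOnPadded: string[] = padded.match(answer) ?? []; // stops at the last digit
console.log(combinedOnSample, combinedOnPadded, answerOnPadded);
```

If text extracted from a PDF may carry trailing whitespace, the lookahead form is likely the safer of the two.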

Joe