I have the following expression:

(?!\d+\s+TOTAL\s+)\$+\d+\.?\d+\s+

It produces the result "$23.00$0.03$23.80" from the following text:

SPEEDWAY 3007906          
Wallace NC 28466          
TRAM: 1086244             
9/17/2017  2:12 pm        
Pump 08                   
Regular Unleaded          
8,716 @ $2,639/6131       
GAS TOTAL           $23.00
TAX                 $0.03 
TOTAL               $23.80
Uisa                     

What regular expression will pull just $23.80 in this case? If I add positive lookahead, so that the expression is "(?!\d+\s+TOTAL\s+)\$+\d+\.?\d+\s+(?=.*\$\d+\.?\d+)", the result is "$23.00$0.03" and not "$23.80". Please help. Thanks in advance.

azro
Davie Overman
  • Why not just put the last value in question as the capture group and problem solved? – l'L'l Feb 10 '18 at 20:18
  • Let dot consume: [`".*(\\$[\\d.]+)"`](https://regex101.com/r/6CVDrw/1) with `DOTALL` flag, grab first capture. – bobble bubble Feb 10 '18 at 20:20
  • I think you confused negative or positive look-ahead. Negative look-ahead seems like it would make more sense here. Your regex gives `$23.80` if you change `?=` to `?!`. – Bernhard Barker Feb 10 '18 at 20:21
  • Note that `(?!\d+\s+TOTAL\s+)` doesn't do anything whatsoever in your regex. Was that meant to be look-**behind** (`?<!`) and not look-**ahead** (`?!`) ? But even then it's not clear what you're trying to exclude with that. – Bernhard Barker Feb 10 '18 at 20:33
  • Possible duplicate of [Find the last match with Java regex matcher](https://stackoverflow.com/questions/6417435/find-the-last-match-with-java-regex-matcher) – Bernhard Barker Feb 10 '18 at 20:41
  • While you asked for the last price and [this](https://stackoverflow.com/questions/6417435/find-the-last-match-with-java-regex-matcher) gives a way to get the last match, I think the best solution for your problem will depend on your data and it might not be to just get the last price. – Bernhard Barker Feb 10 '18 at 20:43
  • Why do you have to use regex for the entirety of that string? Are you reading this in from a file? Can you just analyze one line? – Makoto Feb 10 '18 at 20:52
  • If a single var contains this whole text, you may use `String res = s.replaceFirst("(?s).*([$]\\d[\\d.]*).*", "$1");` – Wiktor Stribiżew Feb 10 '18 at 21:52
  • You are using a hammer to crack a nut. Regular expressions aren't a solution to everything. – user207421 Feb 11 '18 at 09:32
  • l'L'l , I'm not sure how to put the last value as the capture group. – Davie Overman Feb 11 '18 at 21:15
  • Dukeling, was something like this expression, (?!TOTAL\s+)\$+\d+\.?\d+\s+(?!.*\$\d+\.?\d+)(?=[A-Za-z]+\s+) , what you meant because it also seems to give the desired result? – Davie Overman Feb 11 '18 at 21:17
  • Makoto, I am reading this from a file. – Davie Overman Feb 11 '18 at 21:18
  • I'll try that, bobble bubble – Davie Overman Feb 11 '18 at 21:19
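
For reference, two of the suggestions from the comments above can be sketched in Java as follows. This is only an illustration under assumptions: the class name `LastPriceSketch`, the method names, and the shortened `RECEIPT` constant are made up for this example; the patterns are the greedy `.*` idea from bobble bubble's comment and a plain "keep the last match" loop as in the linked duplicate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastPriceSketch {
    // Hypothetical sample: the receipt from the question, shortened to the price lines.
    static final String RECEIPT =
            "8,716 @ $2,639/6131       \n"
          + "GAS TOTAL           $23.00\n"
          + "TAX                 $0.03 \n"
          + "TOTAL               $23.80\n"
          + "Uisa                     ";

    // Greedy ".*" with DOTALL first consumes the whole input, then backtracks
    // until "\$[\d.]+" can match, so group 1 holds the last "$..." run.
    static String lastPriceGreedy(String text) {
        Matcher m = Pattern.compile(".*(\\$[\\d.]+)", Pattern.DOTALL).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Alternative: walk over every price-like match and keep the final one.
    static String lastPriceByLoop(String text) {
        Matcher m = Pattern.compile("\\$\\d+\\.?\\d*").matcher(text);
        String last = null;
        while (m.find()) {
            last = m.group();
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(lastPriceGreedy(RECEIPT));
        System.out.println(lastPriceByLoop(RECEIPT));
    }
}
```

Both methods return `$23.80` for this receipt. As noted in the comments, "the last price in the text" may not be the right rule for every receipt layout.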

2 Answers

Try this:

(?<=^TOTAL)\s*(\$\s*\d+\.?\d*)\s*$

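Make sure you compile the pattern with the Pattern.MULTILINE flag, so that ^ and $ match at the start and end of each line rather than only at the start and end of the whole input.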

The full match includes the whitespace around the value, so read capture group 1 (or strip the whitespace) to get just the value.

Example:

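import java.util.regex.Matcher;
import java.util.regex.Pattern;

String in = "SPEEDWAY 3007906\n" +
"Wallace NC 28466          \n" +
"TRAM: 1086244             \n" +
"9/17/2017  2:12 pm        \n" +
"Pump 08                   \n" +
"Regular Unleaded          \n" +
"8,716 @ $2,639/6131       \n" +
"GAS TOTAL           $23.00\n" +
"TAX                 $0.03 \n" +
"TOTAL               $23.80\n" +
"Uisa          ";

// MULTILINE makes ^ and $ match at the start and end of every line
Pattern p = Pattern.compile("(?<=^TOTAL)\\s*(\\$\\s*\\d+\\.?\\d*)\\s*$", Pattern.MULTILINE);
Matcher m = p.matcher(in);

// group(1) holds the amount without the surrounding whitespace
if (m.find()) {
    System.out.println(m.group(1));
}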

This should print just the captured value, $23.80.

smac89
  • May I ask why this only works if MULTILINE is enabled? Just to check my understanding, the expression means: look for the letters 'T', 'O', 'T', 'A', 'L' at the beginning of a line, but do not include them in the match. Then look for any amount of whitespace, followed by one or more dollar signs and digits with an optional dot and one or more digits, then any amount of whitespace until the end of the line. I noticed that you have parentheses around "\\$+\\d+\\.?\\d+" to specify a capturing group. Is this optional or necessary for the expression to work? Why? – Davie Overman Feb 11 '18 at 21:12
  • @DavieOverman I just updated it to only search for one dollar sign. In any case, it only works when multiline is enabled because the thing we are searching for is not on the first line. If it were, or if the string did not have those newlines, then we wouldn't need multiline, but we would need to slightly modify the regex – smac89 Feb 11 '18 at 21:18
  • @DavieOverman as for the capture groups, they are not really necessary. However, they are good for getting just what we want to match and ignoring the extra spaces around it – smac89 Feb 11 '18 at 21:19
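
To make the MULTILINE point from these comments concrete, here is a small sketch that runs the same regex with and without the flag. The class name `MultilineSketch`, the `capture` helper, and the shortened `TEXT` sample are assumptions made for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineSketch {
    // Hypothetical sample: the TOTAL line is not the first line of the input.
    static final String TEXT =
            "TAX                 $0.03 \n"
          + "TOTAL               $23.80\n"
          + "Uisa";

    static final String REGEX = "(?<=^TOTAL)\\s*(\\$\\s*\\d+\\.?\\d*)\\s*$";

    // Returns capture group 1, or null when the pattern does not match.
    static String capture(int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(TEXT);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // With MULTILINE, ^ can match right after a line break.
        System.out.println(capture(Pattern.MULTILINE)); // $23.80
        // Without flags, ^ only matches at the very start of the input.
        System.out.println(capture(0)); // null
    }
}
```

Without the flag, `^` can only match at index 0 of the input, and the input does not start with TOTAL, so the lookbehind never succeeds.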

Maybe you could use a negative lookbehind to assert that what is before TOTAL is not GAS and capture your value in group 1.

(?<!GAS )TOTAL\s*(\$\d+\.\d+)
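
A minimal Java sketch of how this pattern might be used. The class name `LookbehindSketch`, the `total` helper, and the shortened `RECEIPT` sample are assumptions for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindSketch {
    // Hypothetical sample: the three price lines from the receipt in the question.
    static final String RECEIPT =
            "GAS TOTAL           $23.00\n"
          + "TAX                 $0.03 \n"
          + "TOTAL               $23.80\n";

    // (?<!GAS ) rejects any TOTAL that is directly preceded by "GAS ".
    static String total(String text) {
        Matcher m = Pattern.compile("(?<!GAS )TOTAL\\s*(\\$\\d+\\.\\d+)").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(total(RECEIPT)); // $23.80
    }
}
```

No flags are needed here because the pattern does not use `^` or `$`. Note that it only excludes "GAS TOTAL"; a line such as "SUBTOTAL" would still match, so the lookbehind may need adjusting for other receipt layouts.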


The fourth bird