I am trying to parse a string and match the nth occurrence of a pattern using regex. The example I am working on currently is to pull out the third dollar value in the string. In general this could be the 2nd, 4th, or nth value in the string, but the example below is specifically the third dollar value.

The string: $4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88
The value I am trying to match: $9,307.29
The regex I have come up with so far: (?<=\$)\S+

The code so far matches every value after a dollar sign, so the question is, how do I grab the third (or nth) value?

  • Match all, then get `matches[2]` if `matches.count > 2`. – Wiktor Stribiżew Nov 20 '19 at 20:06
  • In some languages/regex engines: `^(?:\S+\s+){2}\$\K\S+` and others `(?<=^(?:\S+\s+){2}\$)\S+`. If you indicate your language or regex engine, we can better assist you. – ctwheels Nov 20 '19 at 20:09
  • https://regex101.com/r/CvWcOB/1 – abhilb Nov 20 '19 at 20:12
  • @ctwheels i am using a program that supports JavaScript & PHP/PCRE RegEx. – Robert Schauer Nov 20 '19 at 20:12
  • @RobertSchauer you can use my first regex for PCRE - won't work in Javascript. – ctwheels Nov 20 '19 at 20:13
  • @ctwheels Yes ok looks like your first code works in this regex tester: https://regex101.com/ I now need to test it out in my program. Thank you! – Robert Schauer Nov 20 '19 at 20:16
  • @ctwheels The program I am using apparently uses a PCRE engine, however, your code of ```^(?:\S+\s+){2}\$\K\S+``` isn't working because of an unrecognized escape sequence \K. Any thoughts how to get around that? – Robert Schauer Nov 20 '19 at 20:44
  • @RobertSchauer can you use capture groups? `^(?:\S+\s+){2}\$(\S+)` – ctwheels Nov 20 '19 at 20:44
  • @ctwheels Using that code only pulls back a zero value. It is worth mentioning that there is quite a bit of text before the string (not sure if that is causing the 0 value though) – Robert Schauer Nov 20 '19 at 20:48
  • ```Printed: 7/02/2019 2:07:42PM Number State Number Pressure Base Condition REDTAIL GAS PLANT 06/2019 08/2019 HORSETAIL FEDERAL 06R-0689 0.91924 1DO0763454A CO 80 15.73 W Settlement Summary Operator Name: WHITING OIL & GAS CORPORATION Residue Liquid Gross Fees & Tax Net Operator ID: 1 Value Value Value Adjustments Tax Reimbursement Value Ctr Pty Name: WHITING OIL $4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88``` – Robert Schauer Nov 20 '19 at 20:50
  • @RobertSchauer change the `\S+\s+` sequence to the dollar sign one, and remove the `^` anchor `(?:\$\S+\s+){2}\$(\S+)` – ctwheels Nov 20 '19 at 20:52
  • @ctwheels I found another part of my code that was screwing up the regex portion. Using the code ```(?:\$\S+\s+){2}\$(\S+)``` gives me the result of ```$4,233.65 $5,073.64 $9,307.29``` – Robert Schauer Nov 20 '19 at 21:28
  • @RobertSchauer yes, but the first capture group's results are only the 9K value. – ctwheels Nov 20 '19 at 21:46

1 Answer

From the command line using GNU grep with libpcre:

$ echo '$4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88' \
    |grep -Po '^(?:[^$]*\$){3}\K\S+'
9,307.29

(Explanation at Regex101) This uses \K, which resets the start of the reported match and so behaves like a variable-width positive look-behind, something not all regex engines support (foo\Kbar matches the same text as (?<=foo)bar: the "bar" in "foobar"). The pattern skips two dollar amounts plus the leading $ of the third (hence {3}, since that $ is not part of the desired match) and then matches the following run of non-white-space characters.

You can use the same logic in JavaScript:

let test = "$4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88";
test.match(/^(?:[^$]*\$){3}(\S+)/)[1];  // "9,307.29"

This is basically the same regex (explanation at Regex101), but instead of using \K before the match, I've got the desired portion in the first capture group, which match() saves in array index 1 (index 0 is the whole match, including the leading part since we're not using …\K or (?<=…) to make it zero-width).
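
The capture-group approach also covers the case raised in the comments, where other text precedes the dollar amounts and the pattern therefore cannot be anchored with ^. A minimal sketch, assuming the amounts are consecutive and separated by white space; `nthDollarByCapture` is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical helper: returns the nth (1-based) dollar amount without the "$",
// or null when n is invalid or the text has fewer than n consecutive amounts.
function nthDollarByCapture(text, n) {
  if (!Number.isInteger(n) || n < 1) return null;
  // Skip n-1 "$amount" tokens in a non-capturing group, then capture the nth.
  const pattern = new RegExp(String.raw`(?:\$\S+\s+){${n - 1}}\$(\S+)`);
  const match = text.match(pattern);
  // match[0] is the whole match (all n amounts); match[1] is only the nth.
  return match ? match[1] : null;
}

const line = "Ctr Pty Name: WHITING OIL $4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88";
console.log(nthDollarByCapture(line, 3));  // "9,307.29"
```

This is why the whole match looks like "$4,233.65 $5,073.64 $9,307.29" while the first capture group holds just the third value.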

However, if you're using a programming language like JavaScript, you are better off doing more of the work in code:

let test = "$4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88";
test.match(/\$\S+/g)[2].substring(1);  // "9,307.29"

(Explanation at Regex101) This needs more non-regex code, but it is much cleaner. Here, I'm merely matching every dollar value, grabbing the third one (index 2, since arrays are zero-indexed), and using substring(1) to strip off the leading $ (strings are also zero-indexed).
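
The one-liner throws if the string has fewer than three dollar values, because match() returns null or the index is out of range. A guarded sketch of the same idea for any n, following the "match all, then check the count" suggestion from the comments; `nthDollarValue` is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the nth (1-based) dollar amount without the "$",
// or null when n is invalid or there are fewer than n amounts in the text.
function nthDollarValue(text, n) {
  // With the g flag, match() returns every match, or null when there are none.
  const matches = text.match(/\$\S+/g) || [];
  if (!Number.isInteger(n) || n < 1 || n > matches.length) return null;
  // Arrays are zero-indexed, so the nth match lives at index n - 1.
  return matches[n - 1].substring(1);
}

const amounts = "$4,233.65 $5,073.64 $9,307.29 $9,273.41 $0.00 $0.00 $33.88";
console.log(nthDollarValue(amounts, 3));   // "9,307.29"
console.log(nthDollarValue(amounts, 10));  // null
```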

Note: JavaScript does not support \K. Look-behinds of the form (?<=…) were only added in ES2018, so older JavaScript engines reject them.

Adam Katz