python Regex to match dollar values

Question

I'm trying to write a regex that finds dollar values in Python. I have tried many solutions from SO posts, but none of them quite works.

The regex I came up with is:

[Ss]        # OCR will mess up with dollar signs, so I'm specifically looking for S and s as the starting of what I'm looking for
\d+         # any digits to start off
(,\d{3})*   # include comma for thousand splits, can have multiple commas
(.\d{2})?   # include dot and 2 decimals, but only one occurrence of this part

I have tried this on the following example:

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
r = "[Ss]\d+(,\d{3})*(.\d{2})?"

re.findall(pattern=r, string=t)

And I got:

[(',330', '.00')]

Regex doc says that:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

But it is not even getting the whole number part.

My question is: I really want to find s16,330.00 as a single piece. Is there a solution?

Remove capture groups: `r = "[Ss]\d+(?:,\d{3})*(?:\.\d{2})?"` — anubhava, Nov 27 '18 at 19:30
See all explanations [here](https://stackoverflow.com/a/31915134/3832970). — Wiktor Stribiżew, Nov 27 '18 at 19:50

score 3 · Accepted Answer · answered Nov 27 '18 at 19:32

Turn the capture groups into non-capturing groups, (?:...), so that findall returns the full matched string:

>>> t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
>>> r = r"[Ss]\d+(?:,\d{3})*(?:\.\d{2})?"
>>> re.findall(pattern=r, string=t)
['s16,330.00']

Also note that the dot needs to be escaped in your regex: an unescaped . matches any character, not just a literal decimal point.
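To see why the escaping matters, here is a small sketch comparing the two variants. The sample string is a made-up piece of OCR output, not something from the question, and the names loose and strict are only for illustration.

```python
import re

# Unescaped dot: the optional tail accepts ANY character followed by two digits.
loose = re.compile(r"[Ss]\d+(?:,\d{3})*(?:.\d{2})?")
# Escaped dot: the optional tail accepts only a literal "." followed by two digits.
strict = re.compile(r"[Ss]\d+(?:,\d{3})*(?:\.\d{2})?")

# Hypothetical OCR text (an assumption for this example).
sample = "s16,330x00 and S12.50"

loose_matches = loose.findall(sample)
strict_matches = strict.findall(sample)

print(loose_matches)   # the "x00" tail is swallowed by the unescaped dot
print(strict_matches)  # the first match stops at "s16,330"
```

With the escaped dot, the stray "x00" is no longer treated as a cents part, which is probably what you want for noisy OCR input.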

anubhava

I think the key was I was missing the "?:", is that for disabling subpattern matching? – TYZ Nov 27 '18 at 19:35
`(?:..)` is called non-capture group. Read more about it: https://www.regular-expressions.info/refcapture.html – anubhava Nov 27 '18 at 19:37

Dani Mesejo · Answer 2 · 2018-11-27T19:35:01.597

1

Use finditer:

import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
r = r"[Ss]\d+(,\d{3})*(\.\d{2})?"  # raw string; dot escaped so it matches a literal "."

result = [match.group() for match in re.finditer(pattern=r, string=t)]
print(result)

Output

['s16,330.00']

The function finditer returns an iterator yielding match objects. Calling group() on a match object without arguments returns the whole match, regardless of any capture groups in the pattern.
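As a rough illustration of the difference between the whole match and the captured groups, the sketch below inspects the first match object from finditer. The variable names (whole, parts, start, end) are my own, and the dot is escaped here.

```python
import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
pattern = re.compile(r"[Ss]\d+(,\d{3})*(\.\d{2})?")

# Take the first match object produced by finditer.
m = next(pattern.finditer(t))

whole = m.group()        # same as m.group(0): the entire matched text
parts = m.groups()       # only the capture groups, which is what findall reports
start, end = m.span()    # position of the whole match inside t

print(whole)             # s16,330.00
print(parts)             # (',330', '.00')
print(t[start:end] == whole)
```

The tuple in parts is exactly what findall returned in the question, so the two functions see the same match and only report different pieces of it.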

edited Nov 27 '18 at 19:35

answered Nov 27 '18 at 19:30

Dani Mesejo


Can you explain a little bit on why this will work? I have looked at `finditer` before and didn't know that this will work. – TYZ Nov 27 '18 at 19:32
@YilunZhang Updated the answer! – Dani Mesejo Nov 27 '18 at 19:35
Thank you for your quick answer, but I think anubhava's solution is the more straightforward one :). – TYZ Nov 27 '18 at 19:38

mrzasa · Answer 3 · 2018-11-27T19:40:46.710

Use a capturing group for the whole pattern, and non-capturing for subpatterns:

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"
re.findall(r"([Ss]\d+(?:,\d{3})*(?:\.\d{2})?)", t)
['s16,330.00']

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

https://docs.python.org/2/library/re.html#re.findall
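The quoted rule can be checked with a short sketch that runs three variants of the pattern on the example string: no groups, one group, and two groups. The variable names are only for illustration, and the one-group variant (capturing just the number without the leading s) is my own addition.

```python
import re

t = "sixteen thousand three hundred and thirty dollars (s16,330.00)"

# No capture groups: findall returns the whole matches as strings.
no_groups = re.findall(r"[Ss]\d+(?:,\d{3})*(?:\.\d{2})?", t)

# One capture group: findall returns only that group's text.
one_group = re.findall(r"[Ss](\d+(?:,\d{3})*(?:\.\d{2})?)", t)

# Two capture groups: findall returns a list of tuples, one item per group.
two_groups = re.findall(r"[Ss]\d+(,\d{3})*(\.\d{2})?", t)

print(no_groups)   # ['s16,330.00']
print(one_group)   # ['16,330.00']
print(two_groups)  # [(',330', '.00')]
```

The one-group form may be handy if you want to drop the misread dollar sign and keep only the amount.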
