I want to match money amount with regex for indian currency without commas

Question

I want to match amounts like Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR.

Output should be 2000,2000,20000.00,20000,200.25

The regular expression I have tried is this:

(?:(?:(?:rs)|(?:inr))(?:!-{0,}|\.{1}|\ {0,}|\.{1}\ {0,}))(-?[\d,]+    (?:\.\d+)?)(?:[^/^-^X^x])|(?:(-?[\d,]+(?:\.\d+)?)(?:(?:\ {0,}rs)|(?:\      {0,}rs)|(?:\ {0,}(inr))))

But it does not match numbers that have inr or rs after the amount. I want to match them using the re library in Python.

Try `(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?=Rs\.|INR)` — Wiktor Stribiżew, Jul 13 '16 at 06:03
Use this tool to edit your regex https://www.debuggex.com/r/DRqJhtKxJhpYr3IB — Rahul K P, Jul 13 '16 at 06:04
[Check my answer here](http://stackoverflow.com/questions/37567406/get-number-from-giver-string-using-regex/37571199?s=1|0.3736#37571199). It might assist you. — SamWhan, Jul 13 '16 at 10:34
I don't want to match commas in the amount. I tried `(?:Rs.?|INR)\s*(\d+(?:[.][^,]\d+))|(\d+(?:[.][^,]\d+))\s*(?:Rs.?|INR)` but it is not working. — Vinay Sawant, Jul 15 '16 at 09:25

Wiktor Stribiżew · Accepted Answer · 2016-07-13T13:04:02.310

I suggest using an alternation with a capture group in each branch, so that only the number before or after your constant string values is captured:

(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)

See the regex demo.

Pattern explanation:

(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*) - Branch 1:
- (?:Rs\.?|INR) - matches Rs, Rs., or INR
- \s* - followed by zero or more whitespace characters
- (\d+(?:[.,]\d+)*) - Group 1: one or more digits followed by zero or more sequences of a comma or a dot and one or more digits
| - or
(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR) - Branch 2:
- (\d+(?:[.,]\d+)*) - Group 2: captures the same number format as Group 1 in Branch 1
- \s* - zero or more whitespace characters
- (?:Rs\.?|INR) - followed by Rs, Rs., or INR
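As a rough sketch of how the two branches behave, the same pattern can be written with re.VERBOSE so that each part carries its explanation as a comment. The names `pattern` and `sample` and the input string are illustrative only and are not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
pattern = re.compile(r"""
    (?:Rs\.?|INR)        # Branch 1: Rs, Rs. or INR ...
    \s*                  # ... then optional whitespace ...
    (\d+(?:[.,]\d+)*)    # ... then the number (Group 1)
  |
    (\d+(?:[.,]\d+)*)    # Branch 2: the number first (Group 2) ...
    \s*                  # ... then optional whitespace ...
    (?:Rs\.?|INR)        # ... then Rs, Rs. or INR
""", re.VERBOSE)

sample = "Rs. 2000 and 50,000 INR"  # illustrative input
# For each match, the group of the branch that did not take part is None.
groups = [(m.group(1), m.group(2)) for m in pattern.finditer(sample)]
print(groups)  # [('2000', None), (None, '50,000')]
```

Only one of the two groups participates in any given match, which is why the calling code has to pick whichever one is filled.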

Sample code:

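import re
# Group 1 captures a number after Rs/Rs./INR; Group 2 captures a number before it
p = re.compile(r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)')
s = "Rs. 2000 , Rs.3000 , Rs 40,000.00 ,50,000 INR 600.25 INR"
# findall returns (group1, group2) tuples; keep whichever group matched
print([x if x else y for x,y in p.findall(s)])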

See the IDEONE demo
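The question asks for output without thousands separators (20000.00 rather than 20,000.00). One possible way to get there, assuming it is acceptable to post-process the captured strings, is to remove the commas after matching. The variable name `amounts` is illustrative.

```python
import re

p = re.compile(r'(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR)')
s = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."

# Exactly one of the two groups is non-empty per match; drop the commas from it.
amounts = [(g1 or g2).replace(",", "") for g1, g2 in p.findall(s)]
print(amounts)  # ['2000', '2000', '20000.00', '20000', '200.25']
```

This keeps the regex unchanged and leaves the comma handling to plain string code.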

Alternatively, if you can use the PyPI regex module, you may leverage the branch reset construct (?|...|...), where capture group IDs are reset within each branch:

>>> import regex as re
>>> rx = re.compile(r'(?|(?:Rs\.?|INR)\s*(\d+(?:[.,]\d+)*)|(\d+(?:[.,]\d+)*)\s*(?:Rs\.?|INR))')
>>> teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
>>> prices = [match.group(1) for match in rx.finditer(teststring)]
>>> print(prices)
['2000', '2000', '20,000.00', '20,000', '200.25']

Because the IDs are reset, the capture group in either branch is accessible as ID 1 (see match.group(1)).

I don't want to match commas in the amount. I tried `(?:Rs.?|INR)\s*(\d+(?:[.][^,]\d+))|(\d+(?:[.][^,]\d+))\s*(?:Rs.?|INR)` but it is not working. — Vinay Sawant, Jul 15 '16 at 09:25
If you do not want to match whole values with commas, use `(?:Rs\.?|INR)\s*(\d+(?:[.]\d+)*)\b(?!\,)|(?<!\d\,)\b(\d+(?:[.]\d+)*)\s*(?:Rs\.?|INR)` with the `re` module. See [this demo](https://regex101.com/r/rK7tL8/3). And [this IDEONE demo](https://ideone.com/dxao9h). — Wiktor Stribiżew, Jul 15 '16 at 09:33
@ajinzrathod There are two capturing groups in the `re` pattern. — Wiktor Stribiżew, Feb 25 '22 at 10:06
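A minimal sketch of the comma-rejecting pattern from the comment above, run with the standard `re` module against the question's sample string. The names `no_commas` and `plain` are illustrative.

```python
import re

# Pattern from the comment, split across two string literals for readability.
no_commas = re.compile(
    r'(?:Rs\.?|INR)\s*(\d+(?:[.]\d+)*)\b(?!\,)'
    r'|(?<!\d\,)\b(\d+(?:[.]\d+)*)\s*(?:Rs\.?|INR)'
)
s = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."

# Keep whichever of the two capture groups took part in each match.
plain = [g1 or g2 for g1, g2 in no_commas.findall(s)]
print(plain)  # ['2000', '2000', '200.25']
```

Amounts that contain a comma (20,000.00 and 20,000) are skipped entirely rather than partially matched. In this input, 200.25 is picked up by the first branch because an INR happens to precede it.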

Jan · Answer 2 · 2016-07-13T13:43:13.127

2

Though slightly out of scope, here's a fingerplay with the newer and far superior regex module by Matthew Barnett (which supports subroutines and branch resets):

import regex as re

rx = re.compile(r"""
(?(DEFINE)
    (?<amount>\d[\d.,]+)    # amount, starting with a digit
    (?<currency1>Rs\.?\ ?)  # Rs, Rs. or Rs with space
    (?<currency2>INR)       # just INR
)

(?|
    (?&currency1)
    (?P<money>(?&amount))
|
    (?P<money>(?&amount))
    (?=\ (?&currency2))
)

""", re.VERBOSE)

teststring = "Rs. 2000 , Rs.2000 , Rs 20,000.00 ,20,000 INR 200.25 INR."
prices = [m.group('money') for m in rx.finditer(teststring)]
print(prices)

# ['2000', '2000', '20,000.00', '20,000', '200.25']

This uses subroutines and a branch reset (thanks to @Wiktor!).
See a demo on regex101.com.

edited Jul 13 '16 at 13:43

answered Jul 13 '16 at 09:55


Nice one. Here is another [PCRE regex with conditionals](https://regex101.com/r/kI6iU7/2) I played with. – bobble bubble Jul 13 '16 at 10:47
@bobblebubble: Good one as well! – Jan Jul 13 '16 at 11:30
The only problem is that you get this output: `['Rs. 2000', 'Rs.2000', 'Rs 20,000.00', '20,000 INR', '200.25 INR']` while OP needs only numbers. With the power of `regex` module, I'd rather use *branch reset*. See my updated answer. – Wiktor Stribiżew Jul 13 '16 at 13:04
Actually, in this case, `regex` module is redundant, you can do that with `re` as well. If you used capture groups with the same name (`re` does not allow that), it would make sense. – Wiktor Stribiżew Jul 13 '16 at 13:20
It does not, but it is not necessary here. – Wiktor Stribiżew Jul 13 '16 at 13:23
I don't want to match commas in the amount. I tried `(?:Rs.?|INR)\s*(\d+(?:[.][^,]\d+))|(\d+(?:[.][^,]\d+))\s*(?:Rs.?|INR)` but it is not working. – Vinay Sawant Jul 15 '16 at 07:41
@VinaySawant: Please edit your original question and put your requirements there. – Jan Jul 15 '16 at 07:45
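A small sketch of the point made in the comments about named groups: with the standard `re` module, reusing a group name in two branches is rejected when the pattern is compiled. The pattern and the flag name `reused_name_allowed` are made up for illustration.

```python
import re

try:
    # The name "money" appears in both branches, which re does not accept.
    re.compile(r'(?P<money>\d+)\s*INR|Rs\s*(?P<money>\d+)')
    reused_name_allowed = True
except re.error as exc:
    reused_name_allowed = False
    print("re rejected the pattern:", exc)
```

This is why the `re` version of the accepted answer uses two numbered groups and picks the non-empty one afterwards.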

score 0 · Answer 3 · answered Jul 13 '16 at 07:05

And another:

(([\d+\,]+)(\.\d+)?\s\w{3}|(\w+\.?)\s?[\d+\,]+(\.?\d+))

Michał M


Note that `+` inside a character class is treated as a literal `+`. Also, a comma is not a special character, no need to escape it here. – Wiktor Stribiżew Jul 13 '16 at 07:43
I don't want to match commas in the amount. I tried `(?:Rs.?|INR)\s*(\d+(?:[.][^,]\d+))|(\d+(?:[.][^,]\d+))\s*(?:Rs.?|INR)` but it is not working. – Vinay Sawant Jul 15 '16 at 07:41
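To illustrate the earlier comment about `+` inside a character class, here is a short sketch with the `re` module. The input strings and variable names are invented for the example.

```python
import re

# Inside [...], + is a literal plus sign, so this class accepts "+" too.
with_plus = re.fullmatch(r'[\d+,]+', '1+2,3')
# Without the + in the class, the same input no longer matches.
without_plus = re.fullmatch(r'[\d,]+', '1+2,3')
# An escaped comma behaves exactly like an unescaped one.
same = re.findall(r'[\d\,]+', '20,000') == re.findall(r'[\d,]+', '20,000')
print(bool(with_plus), bool(without_plus), same)  # True False True
```

So `[\d+\,]+` in the answer above can be written as `[\d,]+` if a literal plus sign is not meant to be accepted.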
