0

I have a regex which finds all kind of money denoted in dollars,like $290,USD240,$234.45,234.5$,234.6usd

(\$)[0-9]+\.?([0-9]*)|usd+[0-9]+\.?([0-9]*)|[0-9]+\.?[0-9]*usd|[0-9]+\.?[0-9]*(\$)

This seems to works, but how can i avoid the complexity in my regex?

Liam
  • 27,717
  • 28
  • 128
  • 190
sree vidya
  • 33
  • 8
  • https://stackoverflow.com/questions/13106510/techniques-for-simplifying-a-regular-expression-by-hand https://stackoverflow.com/questions/23566410/simplifying-regular-expression – d4nyll Jul 26 '17 at 09:31
  • There is this -> http://ivanzuzak.info/noam/webapps/regex_simplifier/ – d4nyll Jul 26 '17 at 09:31
  • It does not finds "$.0", for example. – Gangnus Jul 28 '17 at 10:59

3 Answers3

2

It is possible to make the regex a bit shorter by collapsing the currency indicators:
You can say USD OR $ amount instead of USD amount OR $ amount. This results in the following regex:

((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))

Im not sure if you'll find this less complex, but at least it's easier to read because it's shorter

The character set [0-9] can also be replaced by \d -- the character class which matches any digit -- making the regex even shorter.
Doing this, the regex will look as follows:

((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))

Update:

  • According to @Toto this regex would be more performant using non-capturing groups (also removed the not-necessary capture group as pointed out by @Simon MᶜKenzie):

    (?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)
    
  • $.0 like amounts are not matched by the regex as @Gangnus pointed out. I updated the regex to fix this:

    ((\$|usd)((\d+\.?\d*)|(\.\d+)))|(((\d+\.?\d*)|(\.\d+))(\$|usd))
    

    Note that I changed \d+\.?\d* into ((\d+\.?\d*)|(\.\d+)): It now either matches one or more digits, optionally followed by a dot, followed by zero or more digits; OR a dot followed by one or more digits.

    Without unnecessary capturing groups and using non-capturing groups:

    (?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)
    
ScintillatingSpider
  • 338
  • 1
  • 9
  • 14
  • Removing the brackets around your main two alternatives will make it even simpler... – Simon MᶜKenzie Jul 26 '17 at 11:23
  • @Simon MᶜKenzie I personally prefer to keep those brackets since they make it easier to see where the alternatives of the alternation begin and end – ScintillatingSpider Jul 26 '17 at 11:30
  • Fair enough - different tastes! – Simon MᶜKenzie Jul 26 '17 at 12:06
  • 1
    Just for info: Easier to read but less efficient! Also non capture groups are better than capture ones if you don't care to catch them. – Toto Jul 26 '17 at 12:23
  • @Gangnus I do not agree that my answer is incorrect. Granted, it does not match `$.0`, but the question is how to *reduce the complexity* of a regex that works for them, and the question does not have `$.0` as one of it's test cases. – ScintillatingSpider Jul 28 '17 at 11:08
  • @ScintillatingSpider I know. I haven't downvoted your answer. But if a question looks how to improve a piece of code, having errors, we should first remove the error, shouldn't we? BTW, my answer also is not full - it matches -0 and 0001 – Gangnus Jul 28 '17 at 11:55
  • @Gangnus Fair enough. Still, the OP might know `$.0` will not occur during the intended use. (Or they might not, so I did -- and still do -- totally agree with you raising this concern. I also did not think of this case.) I only feel that "Incorrect" is not the right word until we know if `$.0` has to be included but was overlooked by the OP :) – ScintillatingSpider Jul 28 '17 at 12:28
1

Try this

^(?:\$|usd)?(?:\d+\.?\d*)(?:\$|usd)?$

mkHun
  • 5,891
  • 8
  • 38
  • 85
  • That can be fixed by pulling the `\.` out of `[\d\.]` (resulting in `[\d]+\.?`). More importantly, **this regex also matches digits *without* currency indicators**. Fixing this will need lookarounds, making the regex more difficult – ScintillatingSpider Jul 26 '17 at 11:22
  • It also matches digits with two currencies: `$12.USD` – Toto Jul 26 '17 at 12:21
0

Reducing the complexity you are reducing the correctness. The following regex works correctly, but even it doesn't take lowcase. (but that could be managed by a key). All other current answers here simply haven't the correct substring for the decimal number.

^\s*(?:(?:(?:-?(?:usd|\$)|(?:usd|\$)-)(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d)))|(?:-?(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d))(?:usd|\$)))\s*$

Look here at the test results.

Make a correct line and only after that try to shorten it.

Gangnus
  • 24,044
  • 16
  • 90
  • 149