Can you suggest a regular expression that supports both American and European number formats?

E.g.: US - 999,999.99, Europe - 999.999,99

BHARAT ATHOTA

3 Answers

A simple solution would be to match one OR the other:

(?:\d{1,3})(?:\.\d{3})?(?:\.\d{3})?(?:,\d+)?|(?:\d{1,3})(?:,\d{3})?(?:,\d{3})?(?:\.\d+)?

The first alternative matches 1-3 digits, optionally followed by a full stop and three digits (up to two such groups), optionally followed by a comma and one or more digits (based on your example you may want to limit this to two, i.e. ,\d{2}). If that doesn't match, the second alternative tries the same thing with full stops and commas swapped.

Edit: Considering your comment on kbysiec's answer, you may want to check the start and the end:

(?:\s|^)(?:(?:\d{1,3})(?:\.\d{3})?(?:\.\d{3})?(?:,\d+)?|(?:\d{1,3})(?:,\d{3})?(?:,\d{3})?(?:\.\d+)?)(?:\s|$)

This ensures the number is preceded by a space OR the beginning of the line, and terminated by a space or the end of the line.

Regards

Edit 2:

(?:\s|^)(?:(?:\d{1,3})(?:\.\d{3})*(?:,\d+)?|(?:\d{1,3})(?:,\d{3})*(?:\.\d+)?)(?:\s|$)

Shorter and allows numbers to be of any length.
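
As a rough illustration of how the Edit 2 pattern could be applied from code, here is a minimal C++ sketch. It assumes std::regex with its default ECMAScript grammar, and the helper name find_numbers is made up for this example. Instead of the (?:\s|^) and (?:\s|$) wrappers, it splits the input on whitespace and requires each token to match as a whole, which should have the same effect for whitespace-separated input.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Core of the "Edit 2" pattern without the (?:\s|^) and (?:\s|$) wrappers:
// European grouping (1.234,56) or US grouping (1,234.56).
static const std::regex kNumber(
    R"((?:\d{1,3}(?:\.\d{3})*(?:,\d+)?|\d{1,3}(?:,\d{3})*(?:\.\d+)?))");

// Hypothetical helper: returns every whitespace-separated token of `text`
// that is, as a whole, a number in one of the two formats.
std::vector<std::string> find_numbers(const std::string& text) {
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        // regex_match requires the entire token to match, which plays the
        // role of the start/end checks in the original expression.
        if (std::regex_match(token, kNumber)) {
            hits.push_back(token);
        }
    }
    return hits;
}
```

Note that an ungrouped number with more than three integer digits, such as 1234, is not accepted by this pattern, and that a token like 1,234 is accepted without telling you whether it is US-grouped or a European decimal.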

SamWhan

(?<=\s|^)(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d{1,3}(?:\.\d{3})*(?:,\d+)?)(?=\s|$)

This should match everything correctly, and it also rejects incorrectly placed dots and commas.

Throw in |\d+(?:[,.]\d+)? as a third alternative, just before the closing parenthesis that precedes the last lookahead, to also match non-separated numbers.

The idea is that the number has the shape x,xxx.yy: a leading group of 1-3 digits, then any number of repetitions of a separator followed by exactly three digits, then an optional decimal part.

Still, this is no easy task for a regex, and this regex is not very readable. Some other tool would probably be better for this.
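
To make the shape described above concrete without a regex, here is a small hand-written check in C++. It is only a sketch: the function names are invented for this example, and it validates a single token rather than searching inside a longer text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: checks `s` against the shape described above for a
// given pair of separators: 1-3 digits, then zero or more groups of
// "group_sep + exactly three digits", then optionally
// "decimal_sep + one or more digits".
bool is_grouped_number(const std::string& s, char group_sep, char decimal_sep) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    auto digit_at = [&](std::size_t k) {
        return k < n && std::isdigit(static_cast<unsigned char>(s[k])) != 0;
    };

    // Leading group: one to three digits.
    std::size_t lead = 0;
    while (lead < 3 && digit_at(i)) { ++i; ++lead; }
    if (lead == 0) return false;

    // Repeating groups: separator followed by exactly three digits.
    while (i < n && s[i] == group_sep) {
        if (!(digit_at(i + 1) && digit_at(i + 2) && digit_at(i + 3))) return false;
        i += 4;
    }

    // Optional decimal part: separator followed by at least one digit.
    if (i < n && s[i] == decimal_sep) {
        ++i;
        if (!digit_at(i)) return false;
        while (digit_at(i)) ++i;
    }

    // Anything left over (a fourth digit, a stray separator) is an error.
    return i == n;
}

bool is_us_number(const std::string& s) { return is_grouped_number(s, ',', '.'); }
bool is_eu_number(const std::string& s) { return is_grouped_number(s, '.', ','); }
```

Calling both wrappers on the same token also exposes the inherent ambiguity: 1,234 passes as a US number (one thousand two hundred thirty-four) and as a European one (1 with three decimals), so the format has to come from context.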

Andris Leduskrasts

First off, whenever someone asks for a regular expression my first response is to ask if that's the right tool for the job: https://softwareengineering.stackexchange.com/q/223634/98845

Some languages provide a method to read monetary values. This answer uses C++'s get_money.

So an input like this:

Jonathan Mee $1,234.56 987654321 true

Could use get_money right in a stream: cin >> a >> b >> get_money(c) >> d >> e; (with boolalpha set on cin, so that "true" is read as a bool) to assign the values:

  • string a: "Jonathan"
  • string b: "Mee"
  • long double c: 123456
  • int d: 987654321
  • bool e: true

How get_money is handled is based on the stream's locale, and specifically on that locale's moneypunct facet. There may already be compiler/OS support for a locale that treats money this way: https://msdn.microsoft.com/en-us/goglobal/bb896001.aspx

Don't fret if there isn't built-in support. The facets of a locale are C++ features that support extensive customization, and to solve this problem only do_decimal_point and do_thousands_sep need to be overridden. There is an extensive write-up of how to do this here: https://stackoverflow.com/a/31390558/2642059 For the purposes of this answer, though, the punct_facet from that answer will just be wholesale ganked:

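#include <cstring>   // memcpy
#include <iomanip>   // get_money
#include <iostream>  // cin, cout
#include <locale>    // locale, moneypunct, use_facet

using namespace std;

template <typename T>
class punct_facet : public T {
private:
    // Copies the state of an existing facet into this one, skipping the
    // vtable pointer so that the overrides below stay in effect.
    // Note: this relies on implementation details of the object layout.
    void Init(const T* money){
        const auto vTablePtrSize = sizeof(void*);

        memcpy(reinterpret_cast<char*>(this) + vTablePtrSize, reinterpret_cast<const char*>(money) + vTablePtrSize, sizeof(T) - vTablePtrSize);
    }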
protected:
    typename T::char_type do_decimal_point() const {
        return typename T::char_type(',');
    }

    typename T::char_type do_thousands_sep() const {
        return typename T::char_type('.');
    }
public:
    punct_facet(){
        Init(&use_facet<T>(cout.getloc()));
    }

    punct_facet(const T* money){
        Init(money);
    }
};

Such an implementation would allow the use of the locale facet constructor like this:

locale foo("en-US");

cin.imbue(locale(foo, new punct_facet<moneypunct<char>>(&use_facet<moneypunct<char>>(foo))));

Which means an input like this:

Jonathan Mee $1.234,56 987654321 true

Can be read with the original command, cin >> a >> b >> get_money(c) >> d >> e;, to assign the values:

  • string a: "Jonathan"
  • string b: "Mee"
  • long double c: 123456
  • int d: 987654321
  • bool e: true
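
For readers who want to try the idea in isolation, here is a self-contained sketch that does not depend on a named locale being installed. It is not the punct_facet above: separators_facet and parse_money are names invented for this example, the facet derives directly from moneypunct<char> instead of copying an existing facet, and it additionally overrides do_grouping and do_frac_digits on the assumption that the default facet does not group digits or expect two decimals.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: a moneypunct whose separators are chosen at
// construction time, instead of being copied from an existing locale.
class separators_facet : public std::moneypunct<char> {
public:
    separators_facet(char decimal_point, char thousands_sep)
        : decimal_point_(decimal_point), thousands_sep_(thousands_sep) {}

protected:
    char do_decimal_point() const override { return decimal_point_; }
    char do_thousands_sep() const override { return thousands_sep_; }
    std::string do_grouping() const override { return "\3"; }  // groups of three
    int do_frac_digits() const override { return 2; }          // two decimals

private:
    char decimal_point_;
    char thousands_sep_;
};

// Parses `text` with get_money and returns the amount in hundredths,
// or -1 if the stream reports a failure.
long double parse_money(const std::string& text, char decimal_point,
                        char thousands_sep) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(),
                         new separators_facet(decimal_point, thousands_sep)));
    long double amount = 0;
    if (!(in >> std::get_money(amount))) return -1;
    return amount;
}
```

With this, parse_money("1.234,56", ',', '.') and parse_money("1,234.56", '.', ',') should both yield 123456, matching the value of c in the lists above.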

Even an untrained eye can see that the punct_facet class is more code than would be required to set up and use a regex a single time. C++'s moneypunct outshines regexes in code where the parsing is needed in several places, in ways that can't be encapsulated in a single regex function. moneypunct also provides these clear advantages over a regex:

  1. Direct use in a stream
  2. More specific input error handling
  3. Provision for further customization including: Negative format, positive format, currency symbol, and more
  4. Compatibility with put_money for streaming out currency
  5. Where performance considerations are applicable, get_money leverages far more specialized code than a regex is able to
Jonathan Mee