Can you suggest a regular expression that supports both American and European number formats?
E.g. US: 999,999.99, Europe: 999.999,99
A simple solution would be to match one OR the other:
(?:\d{1,3})(?:\.\d{3})?(?:\.\d{3})?(?:,\d+)?|(?:\d{1,3})(?:,\d{3})?(?:,\d{3})?(?:\.\d+)?
This matches 1-3 digits, optionally followed by up to two groups of a full stop and three digits, and then optionally by a comma and any number of digits (based on your example, you may want to limit this to two). If that doesn't match, the second alternative tries the same thing with full stops and commas swapped.
Edit: Considering your comment on kbysiec's answer, you may want to check the start and the end:
(?:\s|^)(?:(?:\d{1,3})(?:\.\d{3})?(?:\.\d{3})?(?:,\d+)?|(?:\d{1,3})(?:,\d{3})?(?:,\d{3})?(?:\.\d+)?)(?:\s|$)
This ensures the number is preceded by a space OR the beginning of the line, and terminated by a space or the end of the line.
Regards
Edit 2:
(?:\s|^)(?:(?:\d{1,3})(?:\.\d{3})*(?:,\d+)?|(?:\d{1,3})(?:,\d{3})*(?:\.\d+)?)(?:\s|$)
Shorter and allows numbers to be of any length.
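As a rough illustration, the core of the Edit 2 pattern can be tried out with C++'s std::regex. This is only a sketch: the function name is_grouped_number is made up for the example, and the (?:\s|^) and (?:\s|$) guards are dropped on the assumption that each candidate string is tested as a whole, which std::regex_match already enforces.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole string is a number with
// European (1.234,56) or US (1,234.56) digit grouping.
bool is_grouped_number(const std::string& s) {
    // Core of the "Edit 2" pattern. The boundary guards are omitted
    // because std::regex_match only succeeds on a full-string match.
    static const std::regex pattern(
        R"((?:\d{1,3})(?:\.\d{3})*(?:,\d+)?|(?:\d{1,3})(?:,\d{3})*(?:\.\d+)?)");
    return std::regex_match(s, pattern);
}
```

For example, is_grouped_number("999.999,99") and is_grouped_number("1,234,567.89") should both hold, while "12,34.56" and "1234.56" are rejected because their groups are not exactly three digits long.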
(?<=\s|^)(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d{1,3}(?:\.\d{3})*(?:,\d+)?)(?=\s|$)
should match everything correctly, and it also rejects incorrectly placed dots and commas.
Throw in |\d*([,.]\d+)? inside the group, just before the final lookahead, to also match non-separated numbers.
The idea is that a number starts with a group of 1-3 digits and is then followed by groups of the form ,xxx (a separator and exactly three digits), which can repeat, and finally by an optional decimal part such as .yy.
Still, this is no easy task for a regex, and this regex is not very readable. Some other tool would probably be better for this.
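For anyone trying this in C++: std::regex uses the ECMAScript grammar, which has no lookbehind, so the pattern above will not compile there as written. A minimal workaround sketch, assuming the candidates are whitespace-separated, is to split the text into tokens and require each token to match in full. The name find_numbers is hypothetical, and the dot in the second alternative is escaped here so that it only matches a literal full stop.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the whitespace-delimited tokens that look
// like numbers. Splitting on whitespace stands in for the lookbehind and
// lookahead, which check for a space or the line boundary.
std::vector<std::string> find_numbers(const std::string& text) {
    static const std::regex number(
        R"(\d{1,3}(?:,\d{3})*(?:\.\d+)?)"   // US style: 1,234.56
        R"(|\d{1,3}(?:\.\d{3})*(?:,\d+)?)"  // European style: 1.234,56
        R"(|\d*(?:[,.]\d+)?)");             // no thousands separators: 1234.56
    std::vector<std::string> found;
    std::istringstream words(text);
    std::string token;
    while (words >> token) {
        // regex_match requires the entire token to match the pattern.
        if (std::regex_match(token, number)) found.push_back(token);
    }
    return found;
}
```

Note that a token with trailing punctuation, such as a number followed directly by a full stop at the end of a sentence, is not matched by this sketch.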
First off, whenever someone asks for a regular expression my first response is to ask if that's the right tool for the job: https://softwareengineering.stackexchange.com/q/223634/98845
Some languages provide a way to read monetary values directly. This answer will use C++'s get_money.
So an input like this:
Jonathan Mee $1,234.56 987654321 true
could use get_money right in a stream:
cin >> a >> b >> get_money(c) >> d >> e;
to assign the values:
string a: "Jonathan"
string b: "Mee"
long double c: 123456
int d: 987654321
bool e: true
How get_money is handled depends on the stream's locale, and specifically on that locale's moneypunct facet. Your compiler/OS may already support a locale that treats money this way: https://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
Don't fret if there isn't built-in support. The facets of a locale are C++ features that support extensive customization, and to solve this problem only do_decimal_point and do_thousands_sep need to be overridden. There is an extensive write-up of how to do this here: https://stackoverflow.com/a/31390558/2642059 For the purposes of this answer, the punct_facet from that answer is simply borrowed wholesale:
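#include <cstring>   // memcpy
#include <iostream>  // cin, cout
#include <locale>    // locale, moneypunct, use_facet
using namespace std;

// Takes over the state of an existing facet of type T, but swaps the
// decimal point and the thousands separator.
template <typename T>
class punct_facet : public T {
private:
    // Non-portable hack: byte-copies everything after the vtable pointer,
    // assuming that pointer is the first member of the object.
    void Init(const T* money){
        const auto vTablePtrSize = sizeof(void*);
        memcpy(reinterpret_cast<char*>(this) + vTablePtrSize,
               reinterpret_cast<const char*>(money) + vTablePtrSize,
               sizeof(T) - vTablePtrSize);
    }
protected:
    // European style: ',' is the decimal point, '.' separates thousands.
    typename T::char_type do_decimal_point() const {
        return typename T::char_type(',');
    }
    typename T::char_type do_thousands_sep() const {
        return typename T::char_type('.');
    }
public:
    punct_facet(){
        Init(&use_facet<T>(cout.getloc()));
    }
    punct_facet(const T* money){
        Init(money);
    }
};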
Such an implementation allows the locale facet constructor to be used like this:
locale foo("en-US");
cin.imbue(locale(foo, new punct_facet<moneypunct<char>>(&use_facet<moneypunct<char>>(foo))));
Which means an input like this:
Jonathan Mee $1.234,56 987654321 true
can be read with the original command, cin >> a >> b >> get_money(c) >> d >> e;, to assign the values:
string a: "Jonathan"
string b: "Mee"
long double c: 123456
int d: 987654321
bool e: true
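If no suitable named locale is installed, a more portable variation is to derive from moneypunct<char> and override the virtuals directly, rather than copying the state of an existing facet. The following is only a sketch under that assumption: euro_style_punct and parse_money are names invented for the example, and the "$" currency symbol is chosen just to match the sample input.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' groups thousands and ',' is the decimal point.
class euro_style_punct : public std::moneypunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }    // groups of three digits
    std::string do_curr_symbol() const override { return "$"; }  // assumed, matches the sample input
    std::string do_negative_sign() const override { return "-"; }
    int do_frac_digits() const override { return 2; }            // two digits after the decimal point
    pattern do_pos_format() const override { return {{symbol, sign, none, value}}; }
    pattern do_neg_format() const override { return {{symbol, sign, none, value}}; }
};

// Reads one monetary value from text. As with get_money in general, the
// result is expressed in the smallest unit (cents).
bool parse_money(const std::string& text, long double& cents) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale::classic(), new euro_style_punct));
    return static_cast<bool>(in >> std::get_money(cents));
}
```

With this facet, parse_money("$1.234,56", c) should succeed and leave 123456 in c, the same value as in the list above. The currency symbol is optional on input unless showbase is set on the stream.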
Even an untrained eye can see that the punct_facet class is more code than would be required to set up and use a regex a single time. C++'s moneypunct outshines regexes in code where it is used multiple times in ways that can't be encapsulated in a single regex function. moneypunct also provides these clear advantages over a regex:
- put_money for streaming out currency
- get_money leverages far more specialized code than a regex is able to
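To illustrate the put_money point, here is a small sketch of the output direction. It assumes a hand-written facet rather than a named locale so that it does not depend on what is installed. The names euro_style_punct and format_money are hypothetical, and the "$" symbol is again just an assumption for the example.

```cpp
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet describing the 1.234,56 layout for output.
class euro_style_punct : public std::moneypunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }    // groups of three digits
    std::string do_curr_symbol() const override { return "$"; }  // assumed symbol
    int do_frac_digits() const override { return 2; }
    pattern do_pos_format() const override { return {{symbol, sign, none, value}}; }
    pattern do_neg_format() const override { return {{symbol, sign, none, value}}; }
};

// Formats an amount given in cents. The currency symbol is only written
// when showbase is set on the stream.
std::string format_money(long double cents, bool with_symbol) {
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new euro_style_punct));
    if (with_symbol) out << std::showbase;
    out << std::put_money(cents);
    return out.str();
}
```

Under these assumptions, format_money(123456, false) should produce 1.234,56 and format_money(123456, true) should produce $1.234,56, with no regex or manual string surgery involved.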