I'm scraping some data. One of the data points is tournament prize pools. There are many different currencies in the data. I'd like to extract the amount and currency from each value, so that I can use Google to convert these to a base currency. However, it's been a while since I've used regular expressions, so I'm rusty to say the least. Possible formats of the data are as follows:
$534
$22,136.20
3,200,000 Ft HUF
12,500 kr DKK
50,000 kr SEK
$3,800 AUD
$10,000 NZD
€4,500 EUR
¥100,000 CNY
₹7,000,000 INR
R$39,000 BRL
Below is the first regular expression I came up with.
[0-9,.]+(.+)[A-Z]{3}
But that obviously doesn't capture the amount and currency, so I changed it.
([0-9,.]+).+([A-Z]{3})
However, there are issues with this regular expression that I can't figure out.
`([0-9,.]+)` by itself works fine to capture just the amount.
When I add `.+` to that expression, for some reason it stops capturing the trailing `4` and `0` in the first and second test cases respectively. Why?
Then when I add `([A-Z]{3})`, it seems to work perfectly for all of the test cases, but obviously selects nothing in the first two.
So I changed it to `([A-Z]{0,3})`, which seems to break everything.
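The observations above can be reproduced with a minimal sketch. It assumes Python's `re` module, since the question does not say which regex engine or tester is in use, and the helper name `groups` is made up for illustration. The comments note what a backtracking engine does in each case.

```python
import re

def groups(pattern, text):
    """Return the captured groups of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.groups() if m else None

samples = ["$534", "$22,136.20", "$3,800 AUD"]

patterns = [
    # Amount only: the greedy class takes every digit, comma and period.
    r"([0-9,.]+)",
    # `.+` needs at least one character, so the amount group gives back
    # its last character ("4" or "0") when nothing else follows it.
    r"([0-9,.]+).+",
    # A mandatory three-letter code: no match at all when the code is absent.
    r"([0-9,.]+).+([A-Z]{3})",
    # {0,3} may match nothing, so the greedy `.+` swallows the code
    # and the second group ends up empty.
    r"([0-9,.]+).+([A-Z]{0,3})",
]

for pattern in patterns:
    for sample in samples:
        print(pattern, repr(sample), groups(pattern, sample))
```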
What's happening? How can I change the expression so that it works?
This is where I'm at: ([0-9,.]+)((?:.+)([A-Z]{3}))?
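One possible direction, sketched in Python under a few assumptions: the names `PRIZE_RE` and `parse_prize` are hypothetical, commas are treated as thousands separators and the period as the decimal point (which holds for the samples listed but may not hold for all scraped data), and the three-letter code is assumed to be the last token when present. The idea is to anchor the pattern at both ends and make the whole "text plus code" tail optional, so the amount never has to give up a digit.

```python
import re
from decimal import Decimal

# Hypothetical pattern: optional leading symbol, the amount, then an
# optional tail that ends in a three-letter uppercase code.
PRIZE_RE = re.compile(
    r"^(?P<symbol>[^\d\s]*)\s*"            # e.g. "$", "R$", "€", or nothing
    r"(?P<amount>\d[\d,]*(?:\.\d+)?)"      # digits with commas, optional decimals
    r"(?:\s+.*?(?P<code>[A-Z]{3}))?\s*$"   # optional " kr DKK", " AUD", ...
)

def parse_prize(text):
    """Return (amount, code, symbol), or None if the text does not fit.

    `code` is None when no three-letter code is present.
    """
    m = PRIZE_RE.match(text.strip())
    if m is None:
        return None
    # Assumption: commas are thousands separators, so they can be dropped.
    amount = Decimal(m.group("amount").replace(",", ""))
    return amount, m.group("code"), m.group("symbol")

if __name__ == "__main__":
    for sample in ["$534", "$22,136.20", "3,200,000 Ft HUF", "R$39,000 BRL"]:
        print(sample, "->", parse_prize(sample))
```

Values without a code, such as the first two samples, come back with `code` set to `None` and only the symbol to go on, so choosing a default currency for a bare `$` is still a separate decision.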