
The initial string can be in many different formats. It may or may not contain a dollar sign; it can contain words like 'bn' (meaning billion), thousand(s), million(s) or billion(s); and it can have decimal points and thousand separators (commas). In addition, the string can contain extra text not directly connected to the value.

I'd like this string to be converted to an integer.

For example, the string US$ 2,864.773 million in 2014 should be converted to the integer 2864773000.
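The arithmetic behind that example is simple: 2,864.773 million is 2,864.773 × 1,000,000 = 2,864,773,000. A minimal sketch in PHP, assuming the multiplier word has already been identified as "million" (extracting it from free text is the harder part discussed below); `round()` is used because a float product is not guaranteed to land exactly on the integer:

```php
<?php
// Remove the thousand separators before treating the text as a number.
$number = (float) str_replace(',', '', '2,864.773');

// "million" means a factor of 10^6. The float product may miss the exact
// integer by a tiny fraction, so round() before casting rather than
// letting (int) truncate it.
$result = (int) round($number * 1000000);

echo $result, "\n"; // 2864773000 (needs 64-bit PHP: exceeds the 32-bit int range)
```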

If it is possible to detect the currency name as well, that would be just perfect!

Are there any off-the-shelf solutions, like a PHP class or something similar?

Vlada Katlinskaya
  • Is there a way to isolate all the cases and write a method for each? Or are there too many? – Mihai Feb 12 '16 at 14:44
  • @Mihai since the text comes from ordinary prose, the number of cases is virtually unlimited. If I don't find a better solution, I will do as you suggest for most cases. – Vlada Katlinskaya Feb 12 '16 at 14:47
  • http://stackoverflow.com/questions/5139793/php-unformat-money – Justin Pavatte Feb 12 '16 at 14:49
  • There's no built-in function to do that... so you need to write your own :-) – Werner Feb 12 '16 at 14:53
  • Hi @VladaKatlinskaya, I think you are looking for a function that accepts a string in **any format**, something as flexible as strtotime()? E.g.: "23k dollar", "$23,000.00 dollar", "$ 23,000 dollar", "USD23000", "USD23,000", "23000USD", "23000 USD", etc.? – Werner Feb 12 '16 at 15:00
  • _“the number of cases is virtually unlimited”_ – so basically you are asking for an AI … – CBroe Feb 12 '16 at 15:07
  • @CBroe Maybe the number of cases is unlimited, but there are definitely some rules, so the AI would not be very complicated. – Vlada Katlinskaya Feb 12 '16 at 15:45
  • @Werner You are right! – Vlada Katlinskaya Feb 12 '16 at 15:47
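Mihai's suggestion of isolating the cases and writing a method for each can be sketched as an ordered list of patterns, each paired with a callback that turns its captured groups into a currency and an integer amount. This is a minimal sketch covering only some of Werner's examples plus the question's own; the names `parse_amount`, `to_int` and `MULTIPLIERS`, the suffix list and the treatment of "dollar" as a currency name are all hypothetical. The patterns are case-sensitive so that ordinary words are not taken for currency codes, although something like "GDP 2014" would still be misread. Arrow functions need PHP 7.4 or later.

```php
<?php
// Suffix words mapped to multipliers (hypothetical list; extend as needed).
const MULTIPLIERS = [
    'k' => 1000, 'thousand' => 1000, 'million' => 1000000,
    'bn' => 1000000000, 'billion' => 1000000000,
];

// Turn "2,864.773" plus an optional suffix such as "millions" into an int.
function to_int(string $digits, string $suffix): int
{
    $suffix = rtrim($suffix, 's');          // "millions" -> "million"
    $factor = MULTIPLIERS[$suffix] ?? 1;    // no suffix -> factor 1
    $number = (float) str_replace(',', '', $digits);
    return (int) round($number * $factor);  // round: float product may be inexact
}

// One pattern per case, tried in order; each callback maps the groups to
// [currency, amount]. Returns null when no case matches.
function parse_amount(string $input): ?array
{
    $num = '([0-9][0-9,.]*)';
    $suffix = '(k|thousands?|millions?|billions?|bn)?';
    $cases = [
        // "USD23000", "USD23,000": three-letter code before the number
        '/\b([A-Z]{3})\s*' . $num . '\s*' . $suffix . '/'
            => fn(array $m) => [$m[1], to_int($m[2], $m[3] ?? '')],
        // "23000USD", "23000 USD": code after the number
        '/' . $num . '\s*' . $suffix . '\s*([A-Z]{3})\b/'
            => fn(array $m) => [$m[3], to_int($m[1], $m[2])],
        // "$23,000.00", "US$ 2,864.773 million": dollar sign before the number
        '/(\w*\$)\s*' . $num . '\s*' . $suffix . '/'
            => fn(array $m) => [$m[1], to_int($m[2], $m[3] ?? '')],
        // "23k dollar": currency written as a word after the number
        '/' . $num . '\s*' . $suffix . '\s*(dollars?)\b/'
            => fn(array $m) => [$m[3], to_int($m[1], $m[2])],
    ];
    foreach ($cases as $pattern => $handler) {
        if (preg_match($pattern, $input, $m)) {
            [$currency, $amount] = $handler($m);
            return ['currency' => $currency, 'amount' => $amount];
        }
    }
    return null; // none of the known cases matched
}
```

Each new format then costs one more pattern/callback pair, which keeps the cases isolated as Mihai suggested; the order of the list matters, because the first pattern that matches wins.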

1 Answer


I would use a regular expression, something like:

(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?

which captures the currency, the value and, when present, the multiplier. PHP:

if (preg_match('/(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?/i', $input, $matches)) {
    $currency = $matches[1];                    // e.g. "US$"
    $value = str_replace(',', '', $matches[2]); // strip thousand separators
    $multiplier = null;
    // preg_match() omits a trailing optional group that did not match
    if (isset($matches[3])) {
        $multiplier = $matches[3];              // e.g. "million"
    }
}
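The snippet above stops at the captured pieces, while the question asks for an integer. A minimal sketch of the remaining step, wrapping the same kind of pattern (with `\s*` before the multiplier, so a number at the end of the input still matches) in a hypothetical `parse_money()` function; the multiplier table is an assumption that covers only the words the pattern can capture:

```php
<?php
// Hypothetical helper: returns the currency and the amount as an integer,
// or null when no dollar amount is found.
function parse_money(string $input): ?array
{
    // Groups: currency, value, optional multiplier word.
    $pattern = '/(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?/i';
    if (!preg_match($pattern, $input, $matches)) {
        return null;
    }

    // Multiplier words the pattern can capture (assumed list).
    $multipliers = [
        'thousand' => 1000,
        'million'  => 1000000,
        'billion'  => 1000000000,
        'bn'       => 1000000000,
    ];
    // Group 3 is missing from $matches when no multiplier word follows.
    $factor = $multipliers[strtolower($matches[3] ?? '')] ?? 1;

    $value = (float) str_replace(',', '', $matches[2]);
    return [
        'currency' => $matches[1],
        // round() first: the float product can fall just short of the integer.
        'value'    => (int) round($value * $factor),
    ];
}
```

For instance, `parse_money('US$ 2,864.773 million in 2014')` should give `['currency' => 'US$', 'value' => 2864773000]` on 64-bit PHP. Amounts without a dollar sign, such as `23000 USD`, are still not matched by this pattern.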

Explaining the regex a bit:

(\w*\$) captures the currency symbol, including any letters in front of it, such as the US in US$

\s* allows for any whitespace between the currency and value

([0-9,.]+) captures the value

(thousand|million|billion|bn)? captures million/billion, etc. and the ? makes it optional.
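Whether the whitespace before the multiplier group is written `\s+` or `\s*` matters when nothing follows the number: with `\s+`, an amount at the very end of the input, or one written as `1.5bn`, does not match at all. A small sketch comparing the two variants; the sample inputs are made up for illustration:

```php
<?php
// Variant 1: at least one whitespace character is required after the number.
$strict = '/(\w*\$)\s*([0-9,.]+)\s+(thousand|million|billion|bn)?/i';
// Variant 2: whitespace after the number is optional.
$relaxed = '/(\w*\$)\s*([0-9,.]+)\s*(thousand|million|billion|bn)?/i';

$inputs = ['US$ 2,864.773 million in 2014', 'Revenue: $500', 'US$ 1.5bn'];
foreach ($inputs as $input) {
    // preg_match() returns 1 on a match and 0 when there is none.
    printf("%-30s strict: %d  relaxed: %d\n",
        $input, preg_match($strict, $input), preg_match($relaxed, $input));
}
```

Only the first input matches the strict variant; the relaxed one matches all three.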

Matt S
  • Thank you for your input! This works fairly well, but some inputs are parsed incorrectly. You can take a look at my case: https://yadi.sk/i/PGW6MWXvojGGK. The second column (with a year) is a separate matter; just skip it. My regular expression skills are very weak, so if you are able to improve yours I would really appreciate it! – Vlada Katlinskaya Feb 12 '16 at 14:59
  • @VladaKatlinskaya I made some changes, and it now also captures the billion / million, etc. – Matt S Feb 12 '16 at 15:03
  • Hm. Look at the result: many inputs are now skipped (https://yadi.sk/i/ynVm-8CxojHtZ). Is there anything that can be corrected to make it work better? Thank you for your help! – Vlada Katlinskaya Feb 12 '16 at 15:06
  • My last edit broke the currency. Please see the change and try again. – Matt S Feb 12 '16 at 15:09
  • Thank you, Matt! Much better! – Vlada Katlinskaya Feb 12 '16 at 15:43