0

I know the title is quite vague, so let me explain what I'm looking for.
I have to analyze a lot of data from various sources, and I'd like to check whether a value can be interpreted as a number. I know regular expressions exist, but the problem is that my numbers are formatted in various ways depending on where they come from:

  • English number: 123,456.123
  • French number: 123 456,123
  • Spanish number: 123.456,123
  • Swiss number: 123'456.123

I'm looking for a library that can analyze those strings and return either the numeric value or the corresponding pattern. It would save me a lot of time!

Thanks

Carvallegro
  • did you want only numbers? – Avinash Raj Jun 23 '14 at 10:32
  • I'd prefer the regex, but if you know a library that can extract the number no matter what the format is, I could manage something – Carvallegro Jun 23 '14 at 10:34
  • Do the numbers always have a fractional or decimal part? That is, will the English numbers always have a period in them? – Scooter Jun 23 '14 at 10:38
  • No, there's no formalism. I can have 3,456 as well as 3,456.7 – Carvallegro Jun 23 '14 at 10:42
  • Do you know in advance which country they come from? If not it cannot be done unambiguously. As an example from http://stackoverflow.com/questions/5888987/determine-if-a-string-is-a-number-and-convert-in-java: Consider something like 12,345. If that came from an American, it means 12K plus change. If it came from a European, it means 12 plus fractions. The same problem is present with ., only the other way around – DavidPostill Jun 23 '14 at 10:44
  • So then, it seems that the comment below is correct, "123,456" would be either a French or English number but the value in either case would be different? – Scooter Jun 23 '14 at 10:45
  • You said `depending on where they come from`. Do you know where each value came from? If so, you can prepare a resource file with predefined formats and check each of them. – Jorge Campos Jun 23 '14 at 10:48
  • @Scooter yes, here in Brazil our number format is 1.456,60, which is 1k 456 plus a fraction of 60. In the American format it would be invalid because the comma comes after the dot. – Jorge Campos Jun 23 '14 at 10:52
  • After verification, it seems that we can determine where the information came from. Based on this, we know which format we have to use. Since there's no library that does what I'm looking for, I'll have to do it myself. Thanks for the help anyway – Carvallegro Jun 23 '14 at 12:03
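Once the source of each value is known, as the last comment concludes, one possible approach is to map each source to its grouping and decimal separators and let `java.text.DecimalFormat` do the parsing. The following is only a minimal sketch: the class name `SourceAwareParser` and the source keys ("english", "french", ...) are hypothetical, and the separators are set explicitly rather than taken from locale data, because the JDK's French locale data may use a no-break space instead of a plain space for grouping.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalDouble;

class SourceAwareParser {
    // Hypothetical source keys; each value is {grouping separator, decimal separator}.
    private static final Map<String, char[]> SEPARATORS = Map.of(
        "english", new char[] {',', '.'},
        "french",  new char[] {' ', ','},
        "spanish", new char[] {'.', ','},
        "swiss",   new char[] {'\'', '.'});

    // Returns the parsed value, or empty if the source is unknown
    // or the text is not entirely a number in that source's format.
    static OptionalDouble parse(String text, String source) {
        char[] sep = SEPARATORS.get(source);
        if (sep == null || text == null) {
            return OptionalDouble.empty();
        }
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(sep[0]);
        symbols.setDecimalSeparator(sep[1]);
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);

        String trimmed = text.trim();
        ParsePosition pos = new ParsePosition(0);
        Number n = format.parse(trimmed, pos);
        // Reject partial matches such as "12abc", where parsing stops early.
        if (n == null || pos.getIndex() != trimmed.length()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(n.doubleValue());
    }

    public static void main(String[] args) {
        System.out.println(parse("123,456.123", "english"));
        System.out.println(parse("123 456,123", "french"));
        System.out.println(parse("123.456,123", "spanish"));
        System.out.println(parse("123'456.123", "swiss"));
    }
}
```

Note that `DecimalFormat` parses leniently by default and does not check that grouping separators sit at three-digit intervals, so a stricter check would need an extra validation step.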

3 Answers

4

Since there is no way to tell whether the string "123,456" is the French notation for 123.456 (one hundred and twenty-three point four five six) or the English notation for 123456 (one hundred and twenty-three thousand, four hundred and fifty-six), there is no reliable way to do what you ask.
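The ambiguity can be made concrete with a small sketch that tries each of the four formats from the question and collects every value the string could stand for. The class and method names (`AmbiguityDemo`, `interpretations`) are made up for illustration, and the separators are set by hand as an assumption rather than taken from the JDK's locale data.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class AmbiguityDemo {
    // {grouping separator, decimal separator} for the four formats in the question.
    private static final char[][] CANDIDATES = {
        {',', '.'},   // English
        {' ', ','},   // French
        {'.', ','},   // Spanish
        {'\'', '.'},  // Swiss
    };

    // Returns every distinct value the text could represent under the candidate formats.
    static Set<Double> interpretations(String text) {
        Set<Double> values = new TreeSet<>();
        for (char[] sep : CANDIDATES) {
            DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
            symbols.setGroupingSeparator(sep[0]);
            symbols.setDecimalSeparator(sep[1]);
            DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
            ParsePosition pos = new ParsePosition(0);
            Number n = format.parse(text, pos);
            // Keep only interpretations that consume the whole string.
            if (n != null && pos.getIndex() == text.length()) {
                values.add(n.doubleValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(interpretations("123,456"));     // more than one value: ambiguous
        System.out.println(interpretations("123'456.123")); // a single value among these formats
    }
}
```

A result with more than one element means the string alone cannot settle the value, so the source of the data has to decide.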

Jamie Cockburn
1

Use a set of regular expressions, one per format. For example, for English numbers you could use something like:

(([0-9]+),[0-9]+){1,}((\.)[0-9]+)?    

This will find numbers like

  • 232,23,34.4
  • 34,3,43

but not :

  • 343
  • 343.343
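A stricter variant is also possible. The sketch below assumes that grouped English numbers use groups of exactly three digits and that ungrouped numbers should be accepted too, since the comments on the question say values like 3,456 and 3,456.7 both occur. The class name `EnglishNumberPattern` and the method `looksEnglish` are hypothetical.

```java
import java.util.regex.Pattern;

class EnglishNumberPattern {
    // Either 1-3 leading digits followed by one or more ",ddd" groups,
    // or plain digits with no grouping; then an optional ".ddd" fraction.
    static final Pattern ENGLISH =
        Pattern.compile("(?:\\d{1,3}(?:,\\d{3})+|\\d+)(?:\\.\\d+)?");

    // True only if the whole string matches, not just a part of it.
    static boolean looksEnglish(String text) {
        return ENGLISH.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksEnglish("123,456.123")); // grouped with fraction
        System.out.println(looksEnglish("343.343"));     // ungrouped with fraction
        System.out.println(looksEnglish("34,3,43"));     // groups are not three digits
    }
}
```

This only checks the shape of the string. A value such as 123,456 still matches even if it came from a source that uses the comma as the decimal separator.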

I hope this helps.

nafas
1

I think the method below will meet your needs:

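    // Strips common thousands separators (apostrophe, comma, space) and parses the rest.
    // Note: a decimal comma is stripped as well, so "123,45" is read as 12345.
    public Double getNumber(String str) {
        String cleaned = str.replace("'", "").replace(",", "").replace(" ", "");

        try {
            return Double.parseDouble(cleaned);
        } catch (NumberFormatException nfe) {
            nfe.printStackTrace();
            return 0.0; // fallback value when the input is not numeric
        }
    }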
chandra
  • Like I said to @nafas, this helps if I know that my numbers use ', a comma, or a space to separate thousands. But if the comma is used as the decimal separator, I'll get the wrong number – Carvallegro Jun 23 '14 at 11:48