Java function to parse all doubles from string

Question

I know this has been asked before¹, but the responses don't seem to cover all corner cases.

I tried implementing the suggestion¹ with the test case

String("Doubles -1.0, 0, 1, 1.12345 and 2.50")

which should return

[-1, 0, 1, 1.12345, 2.50]

Here is my code:

import java.util.Scanner;
import java.util.ArrayList;
import java.util.Locale;
public class Main
{
    public static void main(String[] args) {
        String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
        System.out.println(string);
        ArrayList<Double> doubles = getDoublesFromString(string);
        System.out.println(doubles);
    }
    
    public static ArrayList<Double> getDoublesFromString(String string){
        Scanner parser = new Scanner(string);
        parser.useLocale(Locale.US);
        ArrayList<Double> doubles = new ArrayList<Double>();
        double currentDouble;
        while (parser.hasNext()){
            if(parser.hasNextDouble()){
                currentDouble = parser.nextDouble();
                doubles.add(currentDouble);
            }
            else {
                parser.next();
            }
        }
        parser.close();
        return doubles;
    }
}

Instead, the code above returns [1.12345, 2.5].

Did I implement it wrong? What's the fix so that the negative number and the zero are caught too?

The problem is the commas (`,`) in your string. By default, the scanner splits the string on whitespace, so the first three doubles are read as `-1.0,`, `0,` and `1,`. The trailing comma prevents those tokens from being recognized as doubles by the scanner. – Turamarth, May 31 '22 at 07:20
@Turamarth I didn't know that. Thanks a lot! I used commas in the test case on purpose, since in some languages (such as Portuguese) the comma is the decimal separator, and the proposed solution used `Locale.US`, so I was trying to test that as well. It will be hard to build something "universal" with Scanner, then; I'll stick with the regex solution provided by Tim. – nluizsoliveira, May 31 '22 at 07:28
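To see the tokenization Turamarth describes, here is a minimal sketch that prints the raw tokens a `Scanner` produces with its default whitespace delimiter. The class name `TokenDemo` and the helper `tokens` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TokenDemo {
    // Collects every raw token a Scanner yields with its default (whitespace) delimiter
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [Doubles, -1.0,, 0,, 1,, 1.12345, and, 2.50]
    }
}
```

The commas stay glued to `-1.0`, `0` and `1`, which is why `hasNextDouble()` rejects exactly those three tokens.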

Tim Biegeleisen · Accepted Answer · 2022-05-31T07:27:20.773

5

I would use a regex find all approach here:

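import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
        List<String> nums = new ArrayList<>();

        // optional minus sign, digits, optional fractional part
        String pattern = "-?\\d+(?:\\.\\d+)?";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(string);

        // collect every substring that matches the pattern
        while (m.find()) {
            nums.add(m.group());
        }

        System.out.println(nums);  // [-1.0, 0, 1, 1.12345, 2.50]
    }
}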

By the way, your question makes use of the String constructor, which is seldom used, but is interesting to see, especially for those of us who never use it.

Here is an explanation of the regex pattern:

-?            match an optional leading negative sign
\\d+          match a whole number
(?:\\.\\d+)?  match an optional decimal component
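The snippet above collects the matches as strings. Since the question asks for doubles, here is a sketch that reuses the same pattern and converts each match with `Double.parseDouble`. The class name `RegexDoubles` is hypothetical; the method name is borrowed from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDoubles {
    // Same pattern as above: optional minus, digits, optional ".digits"
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    public static List<Double> getDoublesFromString(String string) {
        List<Double> doubles = new ArrayList<>();
        Matcher m = NUMBER.matcher(string);
        while (m.find()) {
            // every match has the form [-]digits[.digits], which parseDouble accepts
            doubles.add(Double.parseDouble(m.group()));
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

Compiling the `Pattern` once into a static field avoids recompiling it on every call. Because Java's `\d` only matches ASCII digits by default, `parseDouble` should never throw on a match here.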

edited May 31 '22 at 07:27

answered May 31 '22 at 07:18

Tim Biegeleisen


That works! Thanks a lot. I try to avoid regexes since they're hard to understand and test, but as nothing else works, I'll happily try to understand what's going on and use it. – nluizsoliveira May 31 '22 at 07:25

I have added a description of what the regex pattern is doing. This pattern is not so complicated to understand (I hope). I also generally agree with you that complexity should be avoided, but regex just happens to work really well in this case. – Tim Biegeleisen May 31 '22 at 07:27

Thank you very, very much! I'll accept the answer as soon as Stack Overflow allows me. – nluizsoliveira May 31 '22 at 07:27
This is neat and short, but it probably doesn't support many edge cases. Run a debugger inside the `Scanner` class and you'll see how complex its float pattern is; that should tell you something about the actual complexity of matching doubles (I would not have expected it!). I think it's there to support things like NaN, Infinity, scientific notation and so on. So, all in all, I think the best advice is not to reinvent the (complex) wheel and to use the `Scanner` class with delimiters. – Dici May 31 '22 at 07:49
Hey @TimBiegeleisen, I can't suggest edits, but here's your solution as a function returning a `List`: https://onlinegdb.com/tLKr3XfkY – nluizsoliveira May 31 '22 at 08:07
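To make Dici's point about edge cases concrete, here is a small sketch comparing the accepted answer's pattern with `Scanner.nextDouble` on input containing scientific notation and a leading decimal point. The class and method names and the sample input are hypothetical; `Locale.US` is assumed so that `.` is the decimal separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EdgeCases {
    // All matches of the accepted answer's pattern
    static List<String> regexMatches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("-?\\d+(?:\\.\\d+)?").matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Whitespace-separated tokens that Scanner itself accepts as doubles
    static List<Double> scannerDoubles(String input) {
        List<Double> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useLocale(Locale.US);
            while (sc.hasNext()) {
                if (sc.hasNextDouble()) {
                    out.add(sc.nextDouble());
                } else {
                    sc.next(); // skip a non-numeric token
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "values 1e5 and .5";
        System.out.println(regexMatches(input));   // [1, 5, 5]
        System.out.println(scannerDoubles(input)); // [100000.0, 0.5]
    }
}
```

The simple pattern splits `1e5` into `1` and `5` and drops the leading dot of `.5`, while the Scanner's grammar accepts both forms.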

Dici · Answer 2 · 2022-05-31T07:52:11.480

For your specific example, adding this right after constructing the scanner is sufficient: `parser.useDelimiter("\\s|,");`

The problem in your code is that the tokens containing a comma are not recognized as valid doubles. The code above configures the scanner to treat not only whitespace but also commas as token delimiters, so the comma is no longer part of the token, and the token is parsed successfully as a double.

I believe this is the most appropriate solution because matching all doubles is actually complex. Below is the regex that Scanner uses to do that; see how complicated it really is. Compared to splitting the string and then using Double.parseDouble, this is pretty similar, but it involves less custom code and, more importantly, no exception throwing, which is slow.

(([-+]?((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(\Q-\E((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?))|[-+]?0[xX][0-9a-fA-F].[0-9a-fA-F]+([pP][-+]?[0-9]+)?|(([-+]?(NaN|\QNaN\E|Infinity|\Q∞\E))|((NaN|\QNaN\E|Infinity|\Q∞\E))|(\Q-\E(NaN|\QNaN\E|Infinity|\Q∞\E)))
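Dropping Dici's delimiter fix into the question's method gives roughly the following sketch. The class name `ScannerDoubles` is hypothetical. Using `[\\s,]+` instead of `\\s|,` is my own variation: it treats a run of whitespace and commas as one delimiter, so no empty token appears between a comma and the following space (the skip branch would handle such empty tokens anyway).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerDoubles {
    public static List<Double> getDoublesFromString(String string) {
        List<Double> doubles = new ArrayList<>();
        // try-with-resources closes the Scanner automatically
        try (Scanner parser = new Scanner(string)) {
            parser.useLocale(Locale.US);     // '.' as the decimal separator
            parser.useDelimiter("[\\s,]+");  // whitespace and commas both end a token
            while (parser.hasNext()) {
                if (parser.hasNextDouble()) {
                    doubles.add(parser.nextDouble());
                } else {
                    parser.next();           // skip words such as "Doubles" and "and"
                }
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

One trade-off: because the comma is now a delimiter, a US-style grouped number such as `1,234` would be read as two values, 1 and 234.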

score 2 · Answer 3 · answered May 31 '22 at 07:28


First of all: I would use the regex solution, too. It's better; the following is just an alternative using `split` and `replace`/`replaceAll` while catching exceptions:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // input
        String s = "Doubles -1.0, 0, 1, 1.12345 and 2.50";
        // split by whitespace(s) (keep in mind the commas will stay)
        String[] parts = s.split("\\s+");
        // create a collection to store the Doubles
        List<Double> nums = new ArrayList<>();
        // iterate over the parts produced by the split operation
        Arrays.stream(parts).forEach(p -> {
            try {
                // remove all commas and try to parse the value
                nums.add(Double.parseDouble(p.replaceAll(",", "")));
            } catch (NumberFormatException e) {
                // this fails for words like "Doubles", so print an error for those
                System.err.println("Could not parse \"" + p + "\"");
            }
        });
        // finally print all successfully parsed Double values
        nums.forEach(System.out::println);
    }
}

Output:

Could not parse "Doubles"
Could not parse "and"
-1.0
0.0
1.0
1.12345
2.5

answered May 31 '22 at 07:28

deHaar


This might be faster than regex in some cases +1. – Tim Biegeleisen May 31 '22 at 07:29

@TimBiegeleisen Yes, you could even skip the `try`-`catch` then… But this example definitely contains words. – deHaar May 31 '22 at 07:31

Well, this still uses lots of regex so it's not like it's a regex-free solution ^^ All solutions here use some regex. I think configuring delimiters in the scanner is cleaner in this case, to be honest, compared to writing custom code. – Dici May 31 '22 at 07:46

Sure, it uses the `split` method which takes a regex… But it does not explicitly use a complex regex with pattern and matcher. In general, you are right @Dici – deHaar May 31 '22 at 07:49

Yeah, I think your solution works better than the currently accepted one because it uses Java's built-in double parsing, so it will cover more cases (like scientific notation). – Dici May 31 '22 at 07:53

Ah, interesting: I thought `Double.parseDouble` used regex, but it only does so in the hex-string code path; all the others do not. Check the code of the method, it's so complicated! That should deter anyone from trying to do it themselves, haha. – Dici May 31 '22 at 07:58
Maybe, @user16320675, but this code does a `replaceAll` on each individual part, and those parts no longer contain any whitespace. So it removes commas from `String`s like `"-1.0,"` and so on. – deHaar May 31 '22 at 08:33
Not really tested; I just tried the input given by the OP and the result was as I posted. The `split` at the beginning deliberately strips out only the whitespace, and the commas are then removed in the `forEach`, which expects a single comma per part, so I could have used `replace` there as well. – deHaar May 31 '22 at 08:34
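Dici's point that the split-and-parse approach inherits the broader syntax of `Double.parseDouble` can be checked with a sketch like this. The class name `SplitParse`, the method `parseAll`, and the sample input are hypothetical. Unlike deHaar's version, it catches only `NumberFormatException`, skips unparseable words silently, and uses `replace`, as suggested in the last comment.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitParse {
    static List<Double> parseAll(String s) {
        List<Double> nums = new ArrayList<>();
        for (String p : s.split("\\s+")) {
            try {
                // strip commas left over from the whitespace split, then parse
                nums.add(Double.parseDouble(p.replace(",", "")));
            } catch (NumberFormatException e) {
                // not a number (e.g. a word): skip it
            }
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(parseAll("Values 1e5, -2.5E-3 and NaN"));
        // prints [100000.0, -0.0025, NaN]
    }
}
```

Scientific notation and `NaN` come through without any custom pattern, at the cost of one exception per non-numeric word.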
