-1

Note: The goal is to build a stream of numbers from a JSON string, preferably using regex. Please do not treat this question as a duplicate of another regex question. My main objective is to read a stream of numbers; using regex for that is optional.

My objective is to create an array of decimal numbers read from a large JSON string that looks like the example shown below.

Example input:

 {
    "item1": 102.119,
    "item2": "unknown",
    "item3": 200.12,
    "item4": 1.08,
    "item5": 0.04
 }

Expected output:

My Data = [102.119, 200.12, 1.08, 0.04]

What I tried:

import java.util.ArrayList;

public class MyData { 
    public static void main(String[] args) {
        String input =  ("{\"item1\": 102.119," 
                          + "\"item2\": \"unknown\"," 
                          + "\"item3\": 200.12,"
                          + "\"item4\": 1.08,"
                          + "\"item5\": 0.04}"
                        );
        ArrayList<Double> output = new ArrayList<>();
        for(String s1 : input.split(",")) {
            for(String s2: s1.split(":")) {
                try {
                    output.add(Double.parseDouble(s2));
                } catch(Exception e) {
                    System.out.println(e.toString() + " : " + s2 + " is not a number!");
                }               
            }
        }
        System.out.println("\nMy Data = " + output.toString());
    }
}

Actual output:

$ javac MyData.java 

$ java MyData
java.lang.NumberFormatException: For input string: "{"item1"" : {"item1" is not a number!
java.lang.NumberFormatException: For input string: ""item2"" : "item2" is not a number!
java.lang.NumberFormatException: For input string: ""unknown"" :  "unknown" is not a number!
java.lang.NumberFormatException: For input string: ""item3"" : "item3" is not a number!
java.lang.NumberFormatException: For input string: ""item4"" : "item4" is not a number!
java.lang.NumberFormatException: For input string: ""item5"" : "item5" is not a number!
java.lang.NumberFormatException: For input string: "0.04}" :  0.04} is not a number!

My Data = [102.119, 200.12, 1.08]

Problems:

  1. The final number 0.04 is not read into the data array.
  2. The two for loops take a long time to execute on a large input set.

Help required:

Can you please help improve the code to read all the decimal numbers, and in a more efficient way?

Can you let me know if regular expression pattern matching can be used for this scenario?

Gopinath
  • I do not understand why the question was considered as a duplicate of regexp. I wanted to read numbers as a stream from json, only preferably using regexp. I am now not able to find an answer due to this question being closed as a duplicate. Can somebody help get the question reopened, so that I can get the answer? – Gopinath Mar 08 '20 at 21:00

3 Answers

2

With both string splitting and regex there are some nasty edge cases. If you are not in full control of the JSON, I would not recommend trying either.
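To make one such edge case concrete, here is a minimal sketch. The class name `RegexEdgeCase`, the method `naiveExtract`, and the sample input are all made up for illustration. It assumes a simple "colon, space, decimal" pattern and shows it picking up a number that sits inside a string value while missing a negative number:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class RegexEdgeCase {
    // Naive pattern: a colon, one space, then digits '.' digits (group 1).
    private static final Pattern NAIVE = Pattern.compile(": (\\d+\\.\\d+)");

    public static List<Double> naiveExtract(String json) {
        List<Double> numbers = new ArrayList<>();
        Matcher m = NAIVE.matcher(json);
        while (m.find()) {
            numbers.add(Double.parseDouble(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // "note" is a string that merely contains ": 1.5";
        // "value" and "delta" are the only real JSON numbers.
        String input = "{\"note\": \"ratio: 1.5\", \"value\": 2.25, \"delta\": -0.5}";
        // Prints [1.5, 2.25]: 1.5 is a false positive, -0.5 is missed.
        System.out.println(naiveExtract(input));
    }
}
```

The pattern can be patched for each case (signs, exponents, quoted text), but the list of cases keeps growing, which is the reason for preferring a real parser when the input is not under your control.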

Preferably use an existing library such as org.json or Gson to parse the JSON. Both provide an object structure that you can loop over to extract the values, similarly to your current solution.

https://devqa.io/how-to-parse-json-in-java/

Roland Kreuzer
  • Hi Roland, Thank you for sharing the link to parsing json. I will try it. – Gopinath Mar 01 '20 at 16:16
  • 1
    Don’t forget there is also an [official JSON library](https://docs.oracle.com/javaee/7/api/javax/json/package-summary.html) for Java, available on [the JSR 353 download page](https://jcp.org/aboutJava/communityprocess/final/jsr353/index.html). – VGR Mar 01 '20 at 17:05
0

Try this regular expression:

/: (\d+(?:\.\d+))/
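
As a rough sketch of how this pattern might be applied in Java with `java.util.regex` (the class name `RegexNumbers` and the method `extract` are made up for illustration): the surrounding slashes are dropped because Java takes the pattern as a plain string, and every backslash is doubled. It assumes exactly one space after each colon, as in the question's input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class RegexNumbers {
    // A colon, one space, then digits '.' digits; the number is capture group 1.
    private static final Pattern DECIMAL = Pattern.compile(": (\\d+(?:\\.\\d+))");

    public static List<Double> extract(String json) {
        List<Double> numbers = new ArrayList<>();
        Matcher m = DECIMAL.matcher(json);
        // A single pass over the input; each find() resumes after the previous match.
        while (m.find()) {
            numbers.add(Double.parseDouble(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String input = "{\"item1\": 102.119,"
                + "\"item2\": \"unknown\","
                + "\"item3\": 200.12,"
                + "\"item4\": 1.08,"
                + "\"item5\": 0.04}";
        // Prints: My Data = [102.119, 200.12, 1.08, 0.04]
        System.out.println("My Data = " + extract(input));
    }
}
```

Because the capture group stops at the last digit, the trailing `}` that tripped up the split-based attempt no longer matters. Integers, negative numbers, and exponent notation are not matched by this pattern.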
Illya
  • Hi Illya, Thank you. I tried the given regex - /: (\d+(?\.\d+))/ using the https://www.regexpal.com/ website, but the regex is not matching the decimal numbers. The regex pattern of ([\d]+).([\d]+) has matched the numbers. I will try this. – Gopinath Mar 01 '20 at 16:15
  • 1
    Ah, sorry, I forgot the `:`. This `/: (\d+(?:\.\d+))/` will match decimal numbers. – Illya Mar 01 '20 at 16:23
0

Maybe this can help you?

/: (\d+\.\d+)/
chuck