
I need to check whether a partial input (see below) is valid JSON so far, that is, whether it could still be completed into valid JSON. I have referred to this answer for checking whether a complete string is valid JSON.

Example input:

 { 
 "JSON": [{
      "foo":"bar",
      "details": {
           "name":"bar",
           "id":"bar",

What I have tried so far:

/ (?(DEFINE)
         (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
         (?<boolean>   true | false | null )
         (?<string>    " ([^"\n\r\t\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
         (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \]{0,1} )
         (?<pair>      \s* (?&string) \s* : (?&json)  )
         (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \}{0,1} )
         (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
) \A (?&json)\,{0,1} \Z /six

I made the closing brackets of arrays and objects optional (zero or one occurrence). But there are cases where this fails: for example, when you open an object without closing the previous one (shown below), the regex still finds a match.

Invalid, but still matches:

 { 
 "JSON": [{
      "foo":"bar",
      "details": {
           "name":"bar",
           "id":"bar",{

How can I validate partial JSON input?

EDIT:

As mentioned by @nhahtdh in the comments, this regex won't work with java.util.regex, which does not support recursion. So now I need a regex that works without recursion.

kowsikbabu
  • What's your core business use case and requirements that you're trying to solve this for? – entpnerd Feb 21 '20 at 04:58
  • I'm just trying to read and parse multiple JSON objects from an input stream, I cannot afford to read it completely and parse as the stream size would be unknown. – kowsikbabu Feb 21 '20 at 05:03
  • What if the first part contains valid JSON but the part after contains broken JSON? Is that a use case that you care about? – entpnerd Feb 21 '20 at 06:36
  • It is guaranteed that if the first part is valid JSON, then the remainder will consist of valid JSON objects – kowsikbabu Feb 21 '20 at 06:41
  • What language are you writing this in? Does the solution have to be a regex (which, as Booboo correctly notes, is generally not possible unless you know quite a lot about the form of the data you're expecting), or just not require reading the whole string? – Ryan M Feb 21 '20 at 10:37
  • I'll be using PCRE in Java, and I'm open to other approaches that can be used in Java – kowsikbabu Feb 21 '20 at 11:33
  • How do you define what "valid" means in your context? For the part you showed you could at most say: "ok, it could be valid, depending on what comes next". Can you at least ensure that you will always start parsing from the beginning of the JSON document, or is it a continuous stream where you have to determine the start of a new message or document or something? – Jan Held Feb 21 '20 at 14:27
  • Let's just say I'm reading the first 10 lines of a stream, and if it's all valid JSON structure until then, then it will be guaranteed the rest is valid – kowsikbabu Feb 21 '20 at 15:16
  • And yeah it is the start of the message – kowsikbabu Feb 21 '20 at 15:18
  • Take a look at https://jsonlint.com/. They share their source code on GitHub – Barka Feb 22 '20 at 11:03
  • I'm not sure if the Java interface of the PCRE library you use supports this or not, but [pcrepartial](https://www.pcre.org/original/doc/html/pcrepartial.html) in the original PCRE library can be used here. It checks whether the end of the string has been reached (and therefore whether the partial string so far is valid). You can use either hard or soft partial matching; it doesn't matter here since we always assume the data is incomplete. – nhahtdh Feb 24 '20 at 07:21
  • @nhahtdh this seems promising as there is a java alternative [hitEnd()](https://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#hitEnd()), I will definitely try this now! – kowsikbabu Feb 24 '20 at 08:21
  • @kowsikbabu: I hope you are not using java.util.regex, since it doesn't support recursive regex, which is necessary to match parentheses. If you are really using a PCRE engine and you want to use pcrepartial, you should rewrite your regex to make it strictly match JSON (instead of the optional closing brackets in your current regex) – nhahtdh Feb 24 '20 at 08:44
  • @nhahtdh you're right, `java.util.regex` doesn't support recursive regex! I don't know how to proceed from here now – kowsikbabu Feb 25 '20 at 10:01
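
To illustrate the `hitEnd()` idea from the comments: the sketch below classifies a single, non-nested JSON string token with `java.util.regex`. The class name `HitEndDemo`, the `classify` method, the `Result` enum and the simplified string pattern are all illustrative assumptions, not code from the thread. Because `java.util.regex` has no recursion, this only works per token and says nothing about nested objects or arrays.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HitEndDemo {
    // Non-recursive pattern for ONE JSON string token (escape handling simplified:
    // any character may follow a backslash).
    private static final Pattern JSON_STRING =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    enum Result { COMPLETE, INCOMPLETE, INVALID }

    static Result classify(String input) {
        Matcher m = JSON_STRING.matcher(input);
        if (m.matches()) {
            return Result.COMPLETE;
        }
        // matches() failed. hitEnd() reports whether the engine ran out of input
        // while matching, i.e. whether more characters could still lead to a match.
        return m.hitEnd() ? Result.INCOMPLETE : Result.INVALID;
    }

    public static void main(String[] args) {
        System.out.println(classify("\"bar\""));   // a full string token
        System.out.println(classify("\"ba"));      // cut off inside the token
        System.out.println(classify("\"bar\"x"));  // extra character after the token
    }
}
```

This is the same question pcrepartial answers for PCRE: did the match fail because of a wrong character, or only because the input stopped?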

3 Answers


This is not quite an answer to your question; it would have been a comment if the character limit allowed.

JSON is not a regular language and cannot therefore be recognized solely by a regular expression engine (if you are programming in Python, the regex package provides extensions that might make it possible to accomplish your task, but what I said is generally true).

If a parser generator is not available for your preferred language, you might consider creating a simple recursive descent parser. The regular expressions you have already defined will serve you well for creating the tokens that will be the input to that parser. Of course, you will expect that a parsing error will occur -- but it should occur on the input token being the end-of-file token. A parsing error that occurs before the end-of-file token has been scanned suggests you do not have a prefix of valid JSON. If you are working with a bottom-up, shift-reduce parser such as one generated with YACC, then this would be a shift error on something other than the end-of-file token.
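
A minimal sketch of that idea in Java, using only the standard library: a hand-written recursive descent parser in which running out of input is reported separately from a syntax error. The names `PartialJsonValidator`, `isValidPrefix`, `EndOfInput` and `SyntaxError` are made up for illustration, and the sketch assumes the input holds a single top-level JSON value that starts at the beginning of the string.

```java
final class PartialJsonValidator {
    /** The input ended while the parser still expected more: a valid prefix. */
    private static final class EndOfInput extends RuntimeException {}
    /** A character was found that no continuation of the input could repair. */
    private static final class SyntaxError extends RuntimeException {}

    private final String in;
    private int pos = 0;

    private PartialJsonValidator(String in) { this.in = in; }

    /** True if the input is complete JSON or could still be completed into JSON. */
    static boolean isValidPrefix(String input) {
        PartialJsonValidator p = new PartialJsonValidator(input);
        try {
            p.value();
            p.skipWhitespace();
            return p.pos == input.length(); // false: extra text after a complete value
        } catch (EndOfInput e) {
            return true;   // the error occurred on the end-of-input "token"
        } catch (SyntaxError e) {
            return false;  // the error occurred before the end of input
        }
    }

    private char peek() {
        if (pos >= in.length()) throw new EndOfInput();
        return in.charAt(pos);
    }
    private void expect(char c) {
        if (peek() != c) throw new SyntaxError();
        pos++;
    }
    private void skipWhitespace() {
        while (pos < in.length() && " \t\r\n".indexOf(in.charAt(pos)) >= 0) pos++;
    }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private void value() {
        skipWhitespace();
        char c = peek();
        if (c == '{') object();
        else if (c == '[') array();
        else if (c == '"') string();
        else if (c == '-' || isDigit(c)) number();
        else if (c == 't') literal("true");
        else if (c == 'f') literal("false");
        else if (c == 'n') literal("null");
        else throw new SyntaxError();
    }
    private void object() {
        expect('{');
        skipWhitespace();
        if (peek() == '}') { pos++; return; }
        while (true) {
            skipWhitespace();
            string();            // member name
            skipWhitespace();
            expect(':');
            value();
            skipWhitespace();
            if (peek() == '}') { pos++; return; }
            expect(',');
        }
    }
    private void array() {
        expect('[');
        skipWhitespace();
        if (peek() == ']') { pos++; return; }
        while (true) {
            value();
            skipWhitespace();
            if (peek() == ']') { pos++; return; }
            expect(',');
        }
    }
    private void string() {
        expect('"');
        while (true) {
            char c = peek();
            pos++;
            if (c == '"') return;
            if (c == '\\') {
                char e = peek();
                pos++;
                if (e == 'u') {
                    for (int i = 0; i < 4; i++) {
                        if (Character.digit(peek(), 16) < 0) throw new SyntaxError();
                        pos++;
                    }
                } else if ("\"\\/bfnrt".indexOf(e) < 0) throw new SyntaxError();
            } else if (c < 0x20) throw new SyntaxError(); // raw control character
        }
    }
    private void number() {
        if (peek() == '-') pos++;
        if (peek() == '0') pos++;   // a leading zero must stand alone
        else digits();
        if (pos < in.length() && in.charAt(pos) == '.') { pos++; digits(); }
        if (pos < in.length() && (in.charAt(pos) == 'e' || in.charAt(pos) == 'E')) {
            pos++;
            char sign = peek();
            if (sign == '+' || sign == '-') pos++;
            digits();
        }
    }
    private void digits() {
        if (!isDigit(peek())) throw new SyntaxError();
        while (pos < in.length() && isDigit(in.charAt(pos))) pos++;
    }
    private void literal(String word) {
        for (int i = 0; i < word.length(); i++) expect(word.charAt(i));
    }
}
```

With this sketch, the question's first example (ending in `"id":"bar",`) is accepted as a prefix, while the second (ending in `"id":"bar",{`) is rejected, because the `{` appears where a member name is required. For brevity the scanner works character by character instead of reusing the question's regular expressions as a tokenizer.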

Booboo
  • The question has been tagged with PCRE, which has enough power to do parentheses matching. Not that it's a good idea to try to accomplish this with a regex, though. – nhahtdh Feb 24 '20 at 07:03
  • @nhahtdh : PCRE gives a language specification, but there are dozens of flavors... Recursive patterns exist, that's true, but (?R) or balancing groups are not supported by all PCRE flavors. – David Amar Feb 25 '20 at 08:21
  • @DavidAmar Is PCRE a language specification? I find the documents less formal compared to an actual specification, like in JavaScript, which describes in detail how matching works. PCRE is just one regex library (among many) which implements the regex syntax in Perl, though PCRE particularly attempted to be compatible with Perl while other libraries only implement a subset of the regex syntax in Perl. – nhahtdh Feb 26 '20 at 07:05

Why not let a parser like Gson do it for you? You basically deal with the stream at the token level.

import java.io.IOException;
import java.io.StringReader;

import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;

public class Main 
{
    public static void main(String[] args) throws Exception 
    {

        String json = "{'id': 1001,'firstName': 'Lokesh','lastName': 'Gupta','email': null}";

        JsonReader jsonReader = new JsonReader(new StringReader(json));
        jsonReader.setLenient(true);

        try
        {
            while (jsonReader.hasNext()) 
            {
                JsonToken nextToken = jsonReader.peek();

                if (JsonToken.BEGIN_OBJECT.equals(nextToken)) {

                    jsonReader.beginObject();

                } else if (JsonToken.NAME.equals(nextToken)) {

                    String name = jsonReader.nextName();
                    System.out.println("Token KEY >>>> " + name);

                } else if (JsonToken.STRING.equals(nextToken)) {

                    String value = jsonReader.nextString();
                    System.out.println("Token Value >>>> " + value);

                } else if (JsonToken.NUMBER.equals(nextToken)) {

                    long value = jsonReader.nextLong();
                    System.out.println("Token Value >>>> " + value);

                } else if (JsonToken.NULL.equals(nextToken)) {

                    jsonReader.nextNull();
                    System.out.println("Token Value >>>> null");

                } else if (JsonToken.END_OBJECT.equals(nextToken)) {

                    jsonReader.endObject();

                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            jsonReader.close();
        }
    }
}

source: https://howtodoinjava.com/gson/jsonreader-streaming-json-parser/

BenEleven
  • OP's looking for a way to validate partial JSON. Gson validates complete JSON, so it can't be used here. – Abrikot Feb 25 '20 at 14:57
  • It is not just one object, it's a stream of multiple objects, and it is not guaranteed to get the whole object for this validation process – kowsikbabu Feb 25 '20 at 15:01

I know that using a regex to validate strings with nested structures is not easy, if not outright infeasible. You will probably have a better chance using an existing JSON parser.

Use a stack to keep track of the objects and arrays that are still open. Append the required closing curly and square brackets. Then ask the JSON parser whether the completed string is valid JSON.

You will probably have to do some work to handle commas and quotes too, but you get the idea.

With a code sample:

import com.google.gson.JsonParser;
import com.google.gson.JsonSyntaxException;

import java.util.Stack;

public class Main {
    public static void main(String[] args) {
        String valid = "{\n" +
                       "\"JSON\": [{\n" +
                       "    \"foo\":\"bar\",\n" +
                       "    \"details\": {\n" +
                       "         \"name\":\"bar\",\n" +
                       "         \"id\":\"bar\"";
        System.out.println("Is valid?:\n" + valid + "\n" + validate(valid));

        String invalid = "{ \n" +
                         " \"JSON\": [{\n" +
                         "      \"foo\":\"bar\",\n" +
                         "      \"details\": {\n" +
                         "           \"name\":\"bar\",\n" +
                         "           \"id\":\"bar\",{";
        System.out.println("Is valid?:\n" + invalid + "\n" + validate(invalid));
    }

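    public static boolean validate(String input) {
        // Closers still owed for every object or array opened so far.
        Stack<String> closings = new Stack<>();
        boolean inString = false; // brackets inside string literals must not count
        boolean escaped = false;  // previous character was a backslash inside a string

        for (char ch: input.toCharArray()) {
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            switch(ch) {
                case '"':
                    inString = true;
                    break;
                case '{':
                    closings.push("}");
                    break;
                case '[':
                    closings.push("]");
                    break;
                case '}':
                case ']':
                    // A closer with nothing open, or of the wrong kind, can never be
                    // repaired by appending more closers.
                    if (closings.empty() || !closings.pop().equals(String.valueOf(ch))) {
                        return false;
                    }
                    break;
            }
        }

        // Append the owed closers, innermost first, and let Gson judge the result.
        StringBuilder closingBuilder = new StringBuilder();
        while (! closings.empty()) {
            closingBuilder.append(closings.pop());
        }
        String fullInput = input + closingBuilder.toString();
        JsonParser parser = new JsonParser();
        try{
            parser.parse(fullInput);
        }
        catch(JsonSyntaxException jse){
            return false;
        }
        return true;
    }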
}

Which results in:

Is valid?:
{
"JSON": [{
    "foo":"bar",
    "details": {
         "name":"bar",
         "id":"bar"
true
Is valid?:
{ 
 "JSON": [{
      "foo":"bar",
      "details": {
           "name":"bar",
           "id":"bar",{
false

Note that adding a comma after the "bar" line in the valid example makes it invalid (because "bar",}}]} is invalid JSON).
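
The comma and quote handling mentioned above could be sketched as follows, using only the standard library. The class `JsonCloser` and its `close` method are hypothetical names for illustration. The sketch drops one dangling trailing comma, closes an unterminated string, and appends the owed brackets; the result is still meant to be handed to a real JSON parser for the final verdict.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class JsonCloser {
    /**
     * Returns the input with a dangling trailing comma dropped, an unterminated
     * string closed and all open objects/arrays closed, or null if a closing
     * bracket has no matching opener.
     */
    static String close(String partial) {
        Deque<Character> closers = new ArrayDeque<>();
        boolean inString = false;
        boolean escaped = false;
        for (char ch : partial.toCharArray()) {
            if (inString) {                 // brackets inside strings do not count
                if (escaped) escaped = false;
                else if (ch == '\\') escaped = true;
                else if (ch == '"') inString = false;
                continue;
            }
            switch (ch) {
                case '"': inString = true; break;
                case '{': closers.push('}'); break;
                case '[': closers.push(']'); break;
                case '}':
                case ']':
                    if (closers.isEmpty() || closers.pop() != ch) return null;
                    break;
                default: break;
            }
        }
        StringBuilder out = new StringBuilder(partial);
        if (inString) {
            // Drop a lone trailing backslash so it cannot escape the added quote.
            if (escaped) out.setLength(out.length() - 1);
            out.append('"');
        } else {
            // Strip trailing whitespace, then at most one dangling comma.
            int end = out.length();
            while (end > 0 && Character.isWhitespace(out.charAt(end - 1))) end--;
            if (end > 0 && out.charAt(end - 1) == ',') out.setLength(end - 1);
        }
        while (!closers.isEmpty()) out.append(closers.pop()); // innermost first
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(close("{\"a\":[1,2,"));
        System.out.println(close("{\"a\":\"b"));
    }
}
```

This has known gaps: input cut off inside a member name, right after a colon, or in the middle of a `\u` escape still produces a string the parser will reject, even though the prefix itself was fine.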

AlexisBRENON