
Here's a weird one. I'm given an ill-conceived input string that is a list of simple strings OR JSON blobs, separated by commas. e.g.:

string input = "{<some JSON object>},Normal Text,Some-Other-String-Without-Commas,{JSON_3},...,{JSON_n}"

And I have to break this into two lists - a list of JSON strings, and a list of non-JSON strings.

The nice thing is that the non-JSON strings are known to contain no special characters (no commas, and no curly braces which might be mistaken as JSON). The not-nice thing is that the JSON blobs (all of which will start with { and end with }), will obviously contain plenty of commas.


The "obvious" solution (using String.Split):

List<string> split = input.Split(',').ToList();

would of course also split on the commas inside the JSON objects ({}) themselves.


I was considering a manual approach - walking the string character-by-character and only splitting out a new element if the count of { is equal to the count of }. Something like:

List<string> blobs = new List<string>();
int start = 0, nestingLevel = 0;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '{') nestingLevel++;
    else if (input[i] == '}') nestingLevel--;
    else if (input[i] == ',' && nestingLevel == 0)
    {
        // Top-level comma: everything since the last split is one element
        blobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
// The final element has no trailing comma, so add it after the loop
blobs.Add(input.Substring(start));
// Trivial TODO: split blobs into JSON and non-JSON by checking if the first character is '{'

(Note: above definitely contains bugs)

This approach probably fails to handle a myriad of things that might appear in JSON. For example, the characters { and } may 'benignly' appear in JSON if they sit inside a string (between quotation marks) - but if I start counting quotation marks, I might encounter escaped quotation marks (\"), which should not be counted. And if I check for escape characters, I had better make sure they themselves are not escaped (\\) - what a nightmare. I would prefer not to end up writing a full-fledged JSON parser myself.
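
To make the bookkeeping above concrete, here is a minimal sketch of a scanner that tracks three pieces of state: brace depth, whether it is inside a JSON string, and whether the previous character was a backslash. The names BlobSplitter and SplitTopLevel are hypothetical, and the sketch assumes every JSON blob is well-formed and that the plain strings contain no commas, braces, or quotation marks. It does not validate the JSON; it only finds the top-level commas.

```csharp
using System;
using System.Collections.Generic;

public static class BlobSplitter
{
    // Hypothetical helper: splits on commas that lie outside any {...} block,
    // ignoring braces and commas that appear inside JSON string literals.
    public static List<string> SplitTopLevel(string input)
    {
        var parts = new List<string>();
        int start = 0, depth = 0;
        bool inString = false, escaped = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                if (escaped) escaped = false;        // this character was escaped; skip it
                else if (c == '\\') escaped = true;  // the next character is escaped
                else if (c == '"') inString = false; // an unescaped quote closes the string
            }
            else if (depth > 0 && c == '"') inString = true; // strings only matter inside JSON
            else if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == ',' && depth == 0)
            {
                parts.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }
        parts.Add(input.Substring(start)); // the final element has no trailing comma
        return parts;
    }
}
```

Because a backslash only sets a one-character "escaped" flag, the sequence \\ consumes itself and a following quotation mark still closes the string, which is the case that makes naive quote counting go wrong. The resulting elements could then be sorted into JSON and non-JSON lists by checking whether each one starts with '{'.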


I had also considered wrapping the string in JSON array brackets ([ and ]) and letting a JSON serializer deserialize it as a JSON array, then re-serializing each of the array elements one at a time:

List<string> JsonBlobs = Newtonsoft.Json.Linq.JArray.Parse("[" + input + "]").Select(t => t.ToString()).ToList();

The only problem with this is that no JSON deserializer I've encountered will handle random non-JSON strings within a list of objects.

My guess is that the ideal solution will need to be a hybrid between the above two solutions. The following monstrosity comes to mind:

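// Requires: using Newtonsoft.Json; using Newtonsoft.Json.Linq;
List<string> blobs = new List<string>();
int start = 0;
bool in_json_land = false;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '{') in_json_land = true;
    else if (input[i] == '}' && in_json_land)
    {
        try
        {
            // Include the closing brace at position i in the candidate
            JToken blob = JToken.Parse(input.Substring(start, i - start + 1));
            blobs.Add(blob.ToString());
            in_json_land = false;
            start = i + 1;
        }
        catch (JsonException) { /* Must not have encountered the end of the JSON yet... */ }
    }
    else if (input[i] == ',' && !in_json_land)
    {
        // Skip the empty segment between a JSON blob and the comma right after it
        if (i > start) blobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
// Add the final plain string, if the input does not end with a JSON blob
if (start < input.Length) blobs.Add(input.Substring(start));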

Any better suggestions?

Alain
  • You've already asked a similar [question](https://stackoverflow.com/questions/60287376/split-a-list-of-json-blobs-delimited-by-commas-ignoring-commas-inside-a-json-bl) – Pavel Anikhouski Feb 18 '20 at 20:39
  • Build a simple parser? Walk the string, whenever you get a `[` or a `{` go into JSON mode and count how deep you are in the bracket nesting. When you get a `]` or `}` in JSON mode, decrement the count. When it gets to 0, you are back in CSV mode. – Flydog57 Feb 18 '20 at 20:43
  • regex match + replace the json, then split. https://stackoverflow.com/questions/32155133/regex-to-match-a-json-string – Jeff Feb 18 '20 at 20:45
  • @PavelAnikhouski Indeed, and according to meta.stackoverflow.com - it's preferable to ask a new question if your original question was insufficient to meet your needs, rather than edit it or request additional things after the fact. – Alain Feb 19 '20 at 02:49
  • @Flydog57 Your suggestion is a part of my question - including the flaws in it. – Alain Feb 19 '20 at 02:56

0 Answers