Here's a weird one. I'm given an ill-conceived input string that is a list of simple strings OR JSON blobs, separated by commas. For example:
string input = "{<some JSON object>},Normal Text,Some-Other-String-Without-Commas,{JSON_3},...,{JSON_n}"
And I have to break this into two lists: a list of JSON strings and a list of non-JSON strings.
The nice thing is that the non-JSON strings are known to contain no special characters (no commas, and no curly braces that might be mistaken for JSON). The not-nice thing is that the JSON blobs (all of which start with { and end with }) will obviously contain plenty of commas.
The "obvious" solution (using String.Split):
List<string> split = input.Split(',').ToList();
would of course also split on the commas inside the JSON objects ({}) themselves.
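To make the failure concrete, here is a tiny illustration with a made-up input (the class name NaiveSplitDemo is just for this example):

```csharp
using System;

static class NaiveSplitDemo
{
    // Every comma is treated as a separator, including the ones inside JSON objects.
    // For the made-up input {"a":1,"b":2},Normal Text this returns three pieces:
    //   {"a":1   |   "b":2}   |   Normal Text
    public static string[] Split(string input) => input.Split(',');
}
```

The single JSON object ends up cut in half, which is exactly the problem.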
I was considering a manual approach: walking the string character by character and only splitting out a new element when the count of { seen so far equals the count of }. Something like:
List<string> blobs = new List<string>();
int start = 0, nestingLevel = 0;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '{') nestingLevel++;
    else if (input[i] == '}') nestingLevel--;
    else if (input[i] == ',' && nestingLevel == 0)
    {
        blobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
blobs.Add(input.Substring(start)); // the last element has no trailing comma
// Trivial TODO: split blobs into JSON and non-JSON by checking if the first character is '{'
(Note: the above is only a rough sketch and almost certainly still has bugs.)
This approach probably fails to handle a myriad of things that might appear in JSON. For example, the characters { and } may 'benignly' appear in JSON if they are enclosed in a string (between quotation marks). But if I start counting quotation marks, I might encounter escaped quotation marks (\"), which should not be counted. And if I check for escape characters, I had better make sure they themselves are not escaped (\\). What a nightmare. I would prefer not to end up writing a full-fledged JSON parser myself.
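For what it's worth, the escape handling may be less of a nightmare than it sounds: inside a JSON string, a backslash always escapes exactly one following character, so a single "previous character was a backslash" flag covers both \" and \\ (and \uXXXX is harmless, since the hex digits are ordinary characters). Below is a minimal sketch of the character walk with that extra state. It assumes, as stated above, that plain-text items never contain commas or braces, and it only tracks string literals while inside braces, so a stray quote in a plain-text item does no harm. The class and method names are hypothetical, and it does not validate the JSON; malformed blobs are passed through as-is.

```csharp
using System.Collections.Generic;

static class MixedListSplitter
{
    // Splits a comma-separated list whose items are either plain text (no commas/braces)
    // or JSON objects. A comma only separates items at brace depth 0 and outside strings.
    public static (List<string> Json, List<string> Text) Split(string input)
    {
        var json = new List<string>();
        var text = new List<string>();
        int depth = 0;          // current {...} nesting level
        bool inString = false;  // inside a "..." literal within a JSON object?
        bool escaped = false;   // previous char was an unescaped backslash inside a string?
        int start = 0;

        for (int i = 0; i <= input.Length; i++)
        {
            if (i < input.Length)
            {
                char c = input[i];
                if (inString)
                {
                    if (escaped) escaped = false;        // this char is escaped: ignore it
                    else if (c == '\\') escaped = true;  // the next char is escaped
                    else if (c == '"') inString = false; // closing quote
                    continue;
                }
                if (c == '"' && depth > 0) { inString = true; continue; }
                if (c == '{') { depth++; continue; }
                if (c == '}') { depth--; continue; }
                if (c != ',' || depth != 0) continue;
            }
            // Top-level comma or end of input: emit the item (empty items are skipped).
            string item = input.Substring(start, i - start);
            if (item.Length > 0)
            {
                if (item[0] == '{') json.Add(item); else text.Add(item);
            }
            start = i + 1;
        }
        return (json, text);
    }
}
```

This also folds in the "trivial TODO" of sorting items by their first character.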
I had also considered wrapping the whole string in JSON array brackets ([]) and letting a JSON serializer deserialize it as a JSON array, then re-serializing each array element one at a time:
List<string> JsonBlobs = Newtonsoft.Json.Linq.JArray.Parse("[" + input + "]").Select(t => t.ToString()).ToList();
The only problem is that no JSON deserializer I've encountered will accept bare, unquoted strings inside an array of objects, since a value like Normal Text is not valid JSON.
My guess is that the ideal solution will need to be a hybrid of the two solutions above. The following monstrosity comes to mind:
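using Newtonsoft.Json;       // JsonReaderException
using Newtonsoft.Json.Linq;  // JToken

List<string> blobs = new List<string>();
int start = 0;
bool inJsonLand = false;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '{' && !inJsonLand) inJsonLand = true;
    else if (input[i] == '}' && inJsonLand)
    {
        try
        {
            // Try to parse from the opening '{' up to and including this '}'
            JToken blob = JToken.Parse(input.Substring(start, i - start + 1));
            blobs.Add(blob.ToString());
            inJsonLand = false;
            i++;            // skip the comma that follows the JSON blob
            start = i + 1;
        }
        catch (JsonReaderException) { /* Must not have encountered the end of the JSON yet... */ }
    }
    else if (input[i] == ',' && !inJsonLand)
    {
        blobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
if (start < input.Length) blobs.Add(input.Substring(start)); // trailing non-JSON element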
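One variant of this hybrid idea that avoids the repeated try/catch is to let a streaming JSON reader report where each object ends. The sketch below swaps in System.Text.Json's Utf8JsonReader (built into .NET Core 3.0 and later) instead of Newtonsoft: positioned at a '{', Read() followed by Skip() advances to the matching '}', and BytesConsumed gives the end offset. It works on the UTF-8 bytes so the offsets stay consistent; the ASCII bytes for ',' and '{' never occur inside multi-byte UTF-8 sequences. The class and method names are hypothetical, and malformed JSON surfaces as a JsonException.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

static class Utf8MixedListSplitter
{
    public static (List<string> Json, List<string> Text) Split(string input)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(input);
        var json = new List<string>();
        var text = new List<string>();
        int pos = 0;
        while (pos < bytes.Length)
        {
            int end;
            if (bytes[pos] == (byte)'{')
            {
                // Let a real JSON reader find where this object ends.
                var reader = new Utf8JsonReader(bytes.AsSpan(pos));
                reader.Read();  // StartObject
                reader.Skip();  // advance to the matching EndObject (throws JsonException if malformed)
                end = pos + (int)reader.BytesConsumed;
                json.Add(Encoding.UTF8.GetString(bytes, pos, end - pos));
            }
            else
            {
                // Plain text runs to the next comma (it never contains commas itself).
                end = Array.IndexOf(bytes, (byte)',', pos);
                if (end < 0) end = bytes.Length;
                text.Add(Encoding.UTF8.GetString(bytes, pos, end - pos));
            }
            pos = end + 1; // skip the separating comma
        }
        return (json, text);
    }
}
```

Unlike the JToken approach, this keeps each blob's original text rather than re-serializing it.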
Any better suggestions?