Unfortunately, as you've discovered, invalid JSON is invalid, and thus not handled by common JSON (de)serializers such as Json.NET.
Converters and strategy settings for the deserializer will not work either, as they're meant to handle things like empty-objects-returned-as-arrays or name conversion/case handling, not malformed tokens.
One naive solution would be to do a simple string replace, like:

    string json = invalidJson.Replace("False", "false");

This, however, has some problems:
- You need to read the entire invalid json into memory, as well as create a fixed copy of it, which means you will have two entire copies of the data in memory, one bad and one better.
- It would replace False inside strings as well. This may not be a problem with your data, but it wouldn't be easy to handle with the above approach.
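To illustrate the second problem, here is a minimal sketch. The input text and the names `NaiveReplaceDemo` and `NaiveFix` are made up for the example; only `string.Replace` comes from the approach above.

```csharp
using System;

public static class NaiveReplaceDemo
{
    // Applies the naive fix: every occurrence of "False" is replaced,
    // regardless of whether it is a bare literal or part of a string value.
    public static string NaiveFix(string invalidJson) =>
        invalidJson.Replace("False", "false");

    public static void Run()
    {
        // Hypothetical input: one bad literal and one string that legitimately contains "False".
        string invalidJson = "{ \"flag\": False, \"label\": \"False alarm\" }";

        // Prints: { "flag": false, "label": "false alarm" }
        Console.WriteLine(NaiveFix(invalidJson));
    }
}
```

The literal is fixed, but the string value "False alarm" has silently become "false alarm" as well.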
A different approach would be to write a basic tokenizer that understands rudimentary JSON syntax, such as strings, numbers and identifiers, and go through the file token by token, replacing the bad identifiers. This would fix the second problem, but depending on the solution it might need a more complex implementation to also fix the first problem, the memory use.
Below is a simple attempt at a TextReader that fixes identifiers as they're found and otherwise understands rudimentary JSON tokens.
Note the following:
- It is not really performant. It allocates temporary buffers all the time. You might want to look into "buffer renting" to handle this somewhat better, or even just stream directly to the buffer.
- It doesn't handle numbers, because I stopped writing code at that point. I left this as an exercise. Basic number handling can be written quite easily because you're not really validating that the file contains valid JSON, so anything that grabs enough characters to constitute a number will do.
- I did not test this with really big files, only with the small example file and with a replicated List<Test> totalling 9.5MB of text, which worked.
- I did not test all JSON syntax. There may be characters that should be handled but aren't. If you end up using this, create LOTS of tests!
What it does, however, is fix the invalid JSON according to the identifier(s) you've posted, and it does so in a streaming manner. This should thus be usable no matter how big a JSON file you have.
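As for the "buffer renting" mentioned in the notes above: in .NET this presumably means something like `ArrayPool<char>` from `System.Buffers`. The following is only a sketch of the rent/return pattern on its own, not a rewrite of the reader; `RentedBufferDemo` and `CopyAll` are names invented for the example.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class RentedBufferDemo
{
    // Copies all text from reader to writer using a pooled char[]
    // instead of allocating a fresh temporary buffer.
    public static void CopyAll(TextReader reader, TextWriter writer)
    {
        // Rent may hand back a larger array than requested.
        char[] buffer = ArrayPool<char>.Shared.Rent(4096);
        try
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                writer.Write(buffer, 0, read);
        }
        finally
        {
            // Always give the array back to the pool, even on failure.
            ArrayPool<char>.Shared.Return(buffer);
        }
    }
}
```

The same pattern could replace the per-token StringBuilder instances in the reader, at the cost of handling tokens that outgrow the rented array.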
Anyway, here's the code, again note the exception regarding numbers:

    // LINQPad script; needs the System.IO and Newtonsoft.Json namespaces.
    void Main()
    {
        using (var file = File.OpenText(@"d:\temp\test.json"))
        using (var fix = new MyFalseFixingTextReader(file))
        {
            var reader = new JsonTextReader(fix);
            var serializer = new JsonSerializer();
            serializer.Deserialize<Test>(reader).Dump();
        }
    }

    // Needs the System, System.IO and System.Text namespaces.
    public class MyFalseFixingTextReader : TextReader
    {
        private readonly TextReader _Reader;
        private readonly StringBuilder _Buffer = new StringBuilder(32768);

        public MyFalseFixingTextReader(TextReader reader) => _Reader = reader;

        public override void Close()
        {
            _Reader.Close();
            base.Close();
        }

        // Serves the caller from the internal buffer of already-fixed text.
        public override int Read(char[] buffer, int index, int count)
        {
            TryFillBuffer(count);
            int amountToCopy = Math.Min(_Buffer.Length, count);
            _Buffer.CopyTo(0, buffer, index, amountToCopy);
            _Buffer.Remove(0, amountToCopy);
            return amountToCopy;
        }

        private (bool more, char c) TryReadChar()
        {
            int i = _Reader.Read();
            if (i < 0)
                return (false, default);
            return (true, (char)i);
        }

        private (bool more, char c) TryPeekChar()
        {
            int i = _Reader.Peek();
            if (i < 0)
                return (false, default);
            return (true, (char)i);
        }

        // Reads whole tokens from the underlying reader until at least
        // 'count' characters are buffered or the input is exhausted.
        private void TryFillBuffer(int count)
        {
            if (_Buffer.Length >= count)
                return;

            while (_Buffer.Length < count)
            {
                var (more, c) = TryPeekChar();
                if (!more)
                    break;

                switch (c)
                {
                    // Structural characters and whitespace pass through unchanged.
                    case '{':
                    case '}':
                    case '[':
                    case ']':
                    case '\r':
                    case '\n':
                    case ' ':
                    case '\t':
                    case ':':
                    case ',':
                        _Reader.Read();
                        _Buffer.Append(c);
                        break;

                    // Strings are copied verbatim, so "False" inside them is left alone.
                    case '"':
                        _Buffer.Append(GrabString());
                        break;

                    case char letter when char.IsLetter(letter):
                        var identifier = GrabIdentifier();
                        _Buffer.Append(ReplaceFaultyIdentifiers(identifier));
                        break;

                    case char startOfNumber when startOfNumber == '-' || (startOfNumber >= '0' && startOfNumber <= '9'):
                        _Buffer.Append(GrabNumber());
                        break;

                    default:
                        throw new InvalidOperationException($"Unable to cope with character '{c}' (0x{((int)c).ToString("x2")})");
                }
            }
        }

        private string ReplaceFaultyIdentifiers(string identifier)
        {
            switch (identifier)
            {
                case "False":
                    return "false";
                case "True":
                    return "true";
                case "Null":
                    return "null";
                default:
                    return identifier;
            }
        }

        private string GrabNumber()
        {
            throw new NotImplementedException("Left as an exercise");
            // See https://www.json.org/ for the syntax
        }

        private string GrabIdentifier()
        {
            var result = new StringBuilder();
            while (true)
            {
                int i = _Reader.Peek();
                if (i < 0)
                    break;

                char c = (char)i;
                if (char.IsLetter(c))
                {
                    _Reader.Read();
                    result.Append(c);
                }
                else
                    break;
            }
            return result.ToString();
        }

        private string GrabString()
        {
            _Reader.Read(); // consume the opening quote
            var result = new StringBuilder();
            result.Append('"');

            while (true)
            {
                var (more, c) = TryReadChar();
                if (!more)
                    return result.ToString();

                switch (c)
                {
                    case '"':
                        result.Append(c);
                        return result.ToString();

                    // Escape sequence: copy the backslash and whatever follows it.
                    case '\\':
                        result.Append(c);
                        (more, c) = TryReadChar();
                        if (!more)
                            return result.ToString();

                        switch (c)
                        {
                            case 'u':
                                // \uXXXX: copy the four hex digits as well.
                                result.Append(c);
                                for (int index = 1; index <= 4; index++)
                                {
                                    (more, c) = TryReadChar();
                                    if (!more)
                                        return result.ToString();
                                    result.Append(c);
                                }
                                break;

                            default:
                                result.Append(c);
                                break;
                        }
                        break;

                    default:
                        result.Append(c);
                        break;
                }
            }
        }
    }

    public class Test
    {
        public bool False1 { get; set; }
        public bool False2 { get; set; }
        public bool False3 { get; set; }
    }

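The number handling left out above could be sketched along the following lines. This is not part of the tested code: `NumberGrabber` is a name made up for the example, and the method takes the reader as a parameter only so the snippet stands on its own. In line with the note above, it does not validate anything; it simply grabs characters that can occur in a JSON number.

```csharp
using System;
using System.IO;
using System.Text;

public static class NumberGrabber
{
    // Consumes characters that may appear in a JSON number (digits, sign,
    // decimal point, exponent marker) and stops at the first one that cannot.
    // Malformed input such as "1.2.3" is passed along unchanged.
    public static string GrabNumber(TextReader reader)
    {
        var result = new StringBuilder();
        while (true)
        {
            int i = reader.Peek();
            if (i < 0)
                break;

            char c = (char)i;
            bool isNumberChar = (c >= '0' && c <= '9')
                || c == '-' || c == '+' || c == '.' || c == 'e' || c == 'E';
            if (!isNumberChar)
                break;

            reader.Read();
            result.Append(c);
        }
        return result.ToString();
    }
}
```

Inside MyFalseFixingTextReader the body of GrabNumber could then be reduced to `return NumberGrabber.GrabNumber(_Reader);`, leaving it to the JSON parser to reject numbers that are actually malformed.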
Example file:

    {
        "false1": false,
        "false2": "false",
        "false3": False
    }

Output (from LINQPad):
