I receive JSON content from an API that does not really conform to the JSON standard: the boolean values are capitalized (False) instead of lowercase (false).

{ "Bool": False }

Initially, I thought this would be easy to solve with a custom JsonConverter, as shown in "how to get newtonsoft to deserialize yes and no to boolean".

But it looks like the JsonConverter.ReadJson method is never called. I think the reason is that the value False is not in quotes, so JsonTextReader throws an exception before the converter is ever called.

What would be the best way to handle that scenario?

public class BoolTests
{
    public class A
    {
        [JsonConverter(typeof(CaseIgnoringBooleanConverter))]
        public bool Bool { get; set; }
    }


    [Theory]
    [InlineData(false, "{'Bool': false}")] //ok
    [InlineData(false, "{'Bool': 'False'}")] // ok
    [InlineData(false, "{'Bool': False}")] // fails
    public void CasingMatters(bool expected, string json)
    {
        var actual = JsonConvert.DeserializeObject<A>(json);
        Assert.Equal(expected, actual.Bool);
    }
}

// taken from https://gist.github.com/randyburden/5924981
public class CaseIgnoringBooleanConverter : JsonConverter
{
    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        switch (reader.Value.ToString().ToUpperInvariant().Trim())
        {
            case "TRUE":
                return true;
            case "FALSE":
                return false;
        }

        // If we reach here, we're pretty much going to throw an error, so let Json.NET throw its prettified error message.
        return new JsonSerializer().Deserialize(reader, objectType);
    }

    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(bool);
    }

    public override bool CanWrite => false;

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}
delixfe
  • You could read the file, pre-process it (changing `False` to `"False"`) and then feed valid JSON to the `Deserialize()` method. – Sinatr Aug 23 '19 at 09:52
  • `False` or `True` as literals are not valid in JSON. Basically, what you have is invalid JSON. You can use custom converters and name handling to fit *valid json* to your type hierarchy, but if the input isn't really valid JSON to begin with you're out of luck with most deserializers. You must do something to the input first to make it valid before you can parse it. Essentially you will have to transform the `False` literals into a string or to `false`. Additionally, complain to the origin of that data. Invalid json should be fixed at the source, not circumvented at the receiving end. – Lasse V. Karlsen Aug 23 '19 at 10:03
  • @LasseVågsætherKarlsen As stated in the first line of the question, I am aware that the JSON is not valid. Maybe you have an idea on how to better formulate that ... – delixfe Aug 26 '19 at 08:38
  • I did understand that but I thought that I should clarify that "invalid json" is not in-scope for a json (de)serializer to handle. You have to make it valid before it can be parsed, find a less strict parser, or implement something yourself. I don't think writing a reformatter that doesn't deserialize, just understands the basic building blocks of json, that would fix this, would be too hard. – Lasse V. Karlsen Aug 26 '19 at 08:50
  • ... however, if you need a streaming deserializer to be able to handle this you need to process this during the streaming, that's going to be a bit harder. – Lasse V. Karlsen Aug 26 '19 at 08:56
  • Here's a (probably horribly inefficient) attempt at fixing it during streaming - https://gist.github.com/lassevk/1623378bd81944fc1a338485a83c584b - note the exception in the numeric portion, I did not bother with this as the attempt is either not performing well enough or might not be what you're after. The Json file it reads is this: `{ "false1": false, "false2": "false", "false3": False }` – Lasse V. Karlsen Aug 26 '19 at 09:32
  • @LasseVågsætherKarlsen Would you summarize your comments and your TextReader in an answer? I would happily accept that. – delixfe Aug 26 '19 at 09:50
  • OK, I've done that, please read the notes carefully! – Lasse V. Karlsen Aug 26 '19 at 10:13

2 Answers

2

As Lasse said:

Invalid json should be fixed at the source.

If you really need to parse it as it is, you could replace False with "False" (as suggested by @Sinatr) if you want it as a string, or with false if you want it as a bool.

// If you want a string (Replace returns a new string, so keep the result)
json = json.Replace("False", "\"False\"");

// If you want a bool
json = json.Replace("False", "false");

One problem: this also changes any key or string value that happens to contain the text "False".
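
One way to sidestep that problem, while still working on the whole string in memory, is to let a regular expression match complete string literals and leave them untouched. The following is only a sketch under assumptions: JsonLiteralFixer and Fix are hypothetical names (not part of Json.NET), and it assumes that all strings in the payload are double-quoted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonLiteralFixer
{
    // First alternative: a complete double-quoted string literal, including
    // escape sequences such as \" (matched so that it can be skipped).
    // Second alternative: a bare True/False/Null outside of any string.
    private static readonly Regex Pattern = new Regex(
        @"""(?:\\.|[^""\\])*""|\b(?<literal>True|False|Null)\b",
        RegexOptions.Compiled);

    // Hypothetical helper: lowercases bare literals, returns strings unchanged.
    public static string Fix(string json) =>
        Pattern.Replace(json, m =>
            m.Groups["literal"].Success
                ? m.Groups["literal"].Value.ToLowerInvariant()
                : m.Value);
}
```

Calling JsonLiteralFixer.Fix(json) before JsonConvert.DeserializeObject would then leave a value such as "False alarm" alone. It still needs the whole payload in memory, so it does not help with very large inputs.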

  • Sometimes, you have no control over the source. It is an old problem and was worse with XML and SOAP. I had hoped that there was a cleaner workaround than preprocessing the source. Please be aware that your solution will cause problems if you have bigger JSON payloads. I will not yet accept the answer. Maybe someone already implemented a stream based preprocessor or has a better solution. – delixfe Aug 26 '19 at 08:43
  • Please explain exactly what you mean by problems with bigger JSON payloads. I can see two challenges with this approach: 1) it requires the data to be loaded into memory all at once, and 2) it may change "False" inside strings which may not be what you want (but also might not be a problem depending on your data). Also, see my attempt at a streaming fix in the comments on the question. – Lasse V. Karlsen Aug 26 '19 at 09:34
  • I was more concerned with loading the data in the memory and then needing the same memory again for the replace operation. See https://stackoverflow.com/questions/16539584/string-replace-vs-stringbuilder-replace-for-memory – delixfe Aug 26 '19 at 09:55
2

Unfortunately, as you've discovered, invalid JSON is invalid, and thus not handled by common JSON (de)serializers such as Json.NET.

Using converters and strategy settings for the deserializers will not work either as they're meant to handle things like empty-objects-returned-as-arrays or name conversion/case handling.

One naive solution would be to do a simple string replace, like

string json = invalidJson.Replace("False", "false");

This, however, has some problems:

  1. You need to read the entire invalid json into memory, as well as create a fixed copy of it, which means you will have two entire copies of the data in memory, one bad and one better.
  2. It would replace False inside strings as well. This may not be a problem with your data but wouldn't be easy to handle with the above approach.

A different approach would be to write a basic tokenizer that understands rudimentary JSON syntax, such as strings and numbers and identifiers, and go through the file token by token, replacing the bad identifiers. This would fix problem 2, but depending on the solution might need a more complex implementation to fix problem 1 with the memory.

Below is a simple attempt at a TextReader that fixes identifiers as it encounters them and otherwise understands only rudimentary JSON tokens.

Note the following:

  1. It is not really performant. It allocates temporary buffers all the time. You might want to look into "buffer renting" to handle this somewhat better, or even stream directly into the destination buffer.
  2. It doesn't handle numbers, because I stopped writing code at that point. I left this as an exercise. Basic number handling is easy to write because you're not validating that the file contains valid JSON, so anything that grabs enough characters to constitute a number will do.
  3. I did not test this with really big files. Beyond the small example file, I only replicated a List<Test> into 9.5MB of text, and it works for that.
  4. I did not test all JSON syntax. There may be characters that should be handled but aren't. If you end up using this, create LOTS of tests!

What it does, however, is fix the invalid JSON according to the identifier(s) you've posted, and it does so in a streaming manner. This should thus be usable no matter how big a JSON file you have.

Anyway, here's the code, again note the exception regarding numbers:

void Main()
{
    using (var file = File.OpenText(@"d:\temp\test.json"))
    using (var fix = new MyFalseFixingTextReader(file))
    {
        var reader = new JsonTextReader(fix);
        var serializer = new JsonSerializer();
        serializer.Deserialize<Test>(reader).Dump();
    }
}

public class MyFalseFixingTextReader : TextReader
{
    private readonly TextReader _Reader;
    private readonly StringBuilder _Buffer = new StringBuilder(32768);

    public MyFalseFixingTextReader(TextReader reader) => _Reader = reader;

    public override void Close()
    {
        _Reader.Close();
        base.Close();
    }

    public override int Read(char[] buffer, int index, int count)
    {
        TryFillBuffer(count);

        int amountToCopy = Math.Min(_Buffer.Length, count);
        _Buffer.CopyTo(0, buffer, index, amountToCopy);
        _Buffer.Remove(0, amountToCopy);
        return amountToCopy;
    }

    private (bool more, char c) TryReadChar()
    {
        int i = _Reader.Read();
        if (i < 0)
            return (false, default);
        return (true, (char)i);
    }

    private (bool more, char c) TryPeekChar()
    {
        int i = _Reader.Peek();
        if (i < 0)
            return (false, default);
        return (true, (char)i);
    }

    private void TryFillBuffer(int count)
    {
        if (_Buffer.Length >= count)
            return;

        while (_Buffer.Length < count)
        {
            var (more, c) = TryPeekChar();
            if (!more)
                break;
            switch (c)
            {
                case '{':
                case '}':
                case '[':
                case ']':
                case '\r':
                case '\n':
                case ' ':
                case '\t':
                case ':':
                case ',':
                    _Reader.Read();
                    _Buffer.Append(c);
                    break;

                case '"':
                    _Buffer.Append(GrabString());
                    break;

                case char letter when char.IsLetter(letter):
                    var identifier = GrabIdentifier();
                    _Buffer.Append(ReplaceFaultyIdentifiers(identifier));
                    break;

                case char startOfNumber when startOfNumber == '-' || (startOfNumber >= '0' && startOfNumber <= '9'):
                    _Buffer.Append(GrabNumber());
                    break;

                default:
                    throw new InvalidOperationException($"Unable to cope with character '{c}' (0x{((int)c).ToString("x2")})");
            }
        }
    }

    private string ReplaceFaultyIdentifiers(string identifier)
    {
        switch (identifier)
        {
            case "False":
                return "false";

            case "True":
                return "true";

            case "Null":
                return "null";

            default:
                return identifier;
        }
    }

    private string GrabNumber()
    {
        throw new NotImplementedException("Left as an exercise");
        // See https://www.json.org/ for the syntax
    }

    private string GrabIdentifier()
    {
        var result = new StringBuilder();
        while (true)
        {
            int i = _Reader.Peek();
            if (i < 0)
                break;

            char c = (char)i;
            if (char.IsLetter(c))
            {
                _Reader.Read();
                result.Append(c);
            }
            else
                break;
        }
        return result.ToString();
    }

    private string GrabString()
    {
        _Reader.Read();

        var result = new StringBuilder();
        result.Append('"');

        while (true)
        {
            var (more, c) = TryReadChar();
            if (!more)
                return result.ToString();

            switch (c)
            {
                case '"':
                    result.Append(c);
                    return result.ToString();

                case '\\':
                    result.Append(c);
                    (more, c) = TryReadChar();
                    if (!more)
                        return result.ToString();

                    switch (c)
                    {
                        case 'u':
                            result.Append(c);
                            for (int index = 1; index <= 4; index++)
                            {
                                (more, c) = TryReadChar();
                                if (!more)
                                    return result.ToString();
                                result.Append(c);
                            }
                            break;

                        default:
                            result.Append(c);
                            break;
                    }
                    break;

                default:
                    result.Append(c);
                    break;
            }
        }
    }
}

public class Test
{
    public bool False1 { get; set; }
    public bool False2 { get; set; }
    public bool False3 { get; set; }
}

Example file:

{
    "false1": false,
    "false2": "false",
    "false3": False
}

Output (from LINQPad):

Sample LINQPad output
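
For the number handling that the code above leaves as an exercise, a minimal sketch could look like the following. It follows the note that no validation is needed: it simply grabs every character that can appear in a JSON number and passes it through unchanged. NumberGrabber is a hypothetical standalone helper written for illustration; it is not part of the original code or of Json.NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class NumberGrabber
{
    // Hypothetical stand-in for GrabNumber(): consumes the characters that can
    // occur in a JSON number (digits, sign, decimal point, exponent marker)
    // and returns them as-is. It does not check that they form a valid number;
    // that is left to the JSON parser that reads the fixed stream.
    public static string GrabNumber(TextReader reader)
    {
        var result = new StringBuilder();
        while (true)
        {
            int i = reader.Peek();
            if (i < 0)
                break;

            char c = (char)i;
            bool partOfNumber = (c >= '0' && c <= '9')
                || c == '-' || c == '+' || c == '.' || c == 'e' || c == 'E';
            if (!partOfNumber)
                break;

            reader.Read();
            result.Append(c);
        }
        return result.ToString();
    }
}
```

Inside MyFalseFixingTextReader the same loop would read from _Reader instead of taking the reader as a parameter. As with the rest of the reader, this should be covered by tests before relying on it.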

Lasse V. Karlsen
  • Thanks a million. I think this answer will help the next person running into a similar issue a lot! – delixfe Aug 26 '19 at 11:15