9

I have a DataTable having two columns. ShipmentDate(DateTime) and Count(Int). after I deserialize the string I noticed the type of ShipmentDate becomes string if the first itemarray value is null.

Check the below example. both the json string have same data except the first array item.

string jsonTable1 = "[{\"ShipmentDate\":null,\"Count\":3},{\"ShipmentDate\":\"2015-05-13T00:00:00\",\"Count\":13},{\"ShipmentDate\":\"2015-05-19T00:00:00\",\"Count\":1},{\"ShipmentDate\":\"2015-05-26T00:00:00\",\"Count\":1},{\"ShipmentDate\":\"2015-05-28T00:00:00\",\"Count\":2}]";
string jsonTable2 = "[{\"ShipmentDate\":\"2015-05-13T00:00:00\",\"Count\":13},{\"ShipmentDate\":null,\"Count\":3},{\"ShipmentDate\":\"2015-05-19T00:00:00\",\"Count\":1},{\"ShipmentDate\":\"2015-05-26T00:00:00\",\"Count\":1},{\"ShipmentDate\":\"2015-05-28T00:00:00\",\"Count\":2}]";

DataTable tbl1 = Newtonsoft.Json.JsonConvert.DeserializeObject<DataTable>(jsonTable1);
DataTable tbl2 = Newtonsoft.Json.JsonConvert.DeserializeObject<DataTable>(jsonTable2);

Console.WriteLine(tbl1.Columns["ShipmentDate"].DataType);
Console.WriteLine(tbl2.Columns["ShipmentDate"].DataType);

In my scenario ShipmentDate of first item array can be null and it creates problem by converting it to string type.

I have a situation where schema of datatable is dynamic. i can't create strongly typed class.

Rashmin Javiya
  • 5,173
  • 3
  • 27
  • 49
  • Can you try using a strongly typed object to deserialize, instead of generic DataTable? – Krishna Chaithanya Muthyala May 09 '16 at 06:43
  • You could create first a class with specific types and deserialize the json string to these objects and afterwards converting to datatable. `Newtonsoft.Json.JsonConvert.DeserializeObject(jsonTable1);` and later convert to datatable like this example: http://stackoverflow.com/questions/17088779/fill-datatable-from-linq-query – Sebastian Siemens May 09 '16 at 06:50
  • @KrishnaChaithanyaMuthyala I forget to mentioned I have a situation where schema of datatable is dynamic. i can't create strongly typed class – Rashmin Javiya May 09 '16 at 07:03
  • @RashminJaviya, if it is dynamic, how can you say ShipmentDate is a datetime field? Probably you can say because you know, but how would runtime know? And when you say dynamic, you are not even aware of all the properties all the time? Or you know that there are 100 properties and out of those, only certain number of properties can be available at any point of time? – Krishna Chaithanya Muthyala May 09 '16 at 08:09
  • @KrishnaChaithanyaMuthyala These column have certain behaviour on UI depending upon the column type. Yes i am unaware of columntype but my point is it should not change. There must be something in Newton Json – Rashmin Javiya May 09 '16 at 08:34
  • @RashminJaviya, am doubtful. So, what you want the deserializer to do is, to try one round of parsing, then if it figures out that some instances of some property is null and some instances of same property is not null and of type datetime, do the final round of parsing and deserialize assuming the type as nullable datetime. Is this your expectation? – Krishna Chaithanya Muthyala May 09 '16 at 08:43
  • @RashminJaviya, Newtonsoft JSON JsonSerializer has a property called NullValueHandling which can be set to Ignore. At the time of serialization/deserialization, those properties will be ignored. This can stop the exception you are getting but will it help in further execution of your code? – Krishna Chaithanya Muthyala May 09 '16 at 09:04
  • @KrishnaChaithanyaMuthyala I already tried that. – Rashmin Javiya May 09 '16 at 09:07
  • @RashminJaviya, what's the problem using NullValueHandling? Is it still throwing exception or something else is failing in the code? Can you share what you are trying to do with this dataTables after deserialization? – Krishna Chaithanya Muthyala May 09 '16 at 09:27

2 Answers2

16

The basic problem here is that Json.NET's DataTableConverter infers each DataColumn.DataType by looking at token values present in the first row only. It works this way because it streams the JSON for the table in rather than loading the entirety into an intermediate JToken hierarchy. While streaming gives better performance with reduced memory use, it means that null values in the first row may result in incorrectly typed columns.

This is a problem that comes up from time to time on stackoverflow, for instance in the question deserialize a datatable with a missing first column. In that case, the questioner knew in advance that the column type should be double. In your case, you have stated schema of datatable is dynamic, so that answer cannot be used. However, as with that question, since Json.NET is open source under the MIT License, one can create a modified version of its DataTableConverter with the necessary logic.

As it turns out, it's possible to correctly set the column type while retaining the streaming behavior by remembering columns with ambiguous data types, then replacing those columns with correctly typed columns when a proper type can be determined:

/// <summary>
/// Converts a <see cref="DataTable"/> to and from JSON.
/// </summary>
public class TypeInferringDataTableConverter : Newtonsoft.Json.Converters.DataTableConverter
{
    // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Converters/DataTableConverter.cs
    // Original license: https://github.com/JamesNK/Newtonsoft.Json/blob/master/LICENSE.md

    /// <summary>
    /// Reads the JSON representation of the object.
    /// </summary>
    /// <param name="reader">The <see cref="JsonReader"/> to read from.</param>
    /// <param name="objectType">Type of the object.</param>
    /// <param name="existingValue">The existing value of object being read.</param>
    /// <param name="serializer">The calling serializer.</param>
    /// <returns>The object value.</returns>
    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
        {
            return null;
        }

        DataTable dt = existingValue as DataTable;

        if (dt == null)
        {
            // handle typed datasets
            dt = (objectType == typeof(DataTable))
                ? new DataTable()
                : (DataTable)Activator.CreateInstance(objectType);
        }

        // DataTable is inside a DataSet
        // populate the name from the property name
        if (reader.TokenType == JsonToken.PropertyName)
        {
            dt.TableName = (string)reader.Value;

            reader.ReadAndAssert();

            if (reader.TokenType == JsonToken.Null)
            {
                return dt;
            }
        }

        if (reader.TokenType != JsonToken.StartArray)
        {
            throw JsonSerializationExceptionHelper.Create(reader, "Unexpected JSON token when reading DataTable. Expected StartArray, got {0}.".FormatWith(CultureInfo.InvariantCulture, reader.TokenType));
        }

        reader.ReadAndAssert();

        var ambiguousColumnTypes = new HashSet<string>();

        while (reader.TokenType != JsonToken.EndArray)
        {
            CreateRow(reader, dt, serializer, ambiguousColumnTypes);

            reader.ReadAndAssert();
        }

        return dt;
    }

    private static void CreateRow(JsonReader reader, DataTable dt, JsonSerializer serializer, HashSet<string> ambiguousColumnTypes)
    {
        DataRow dr = dt.NewRow();
        reader.ReadAndAssert();

        while (reader.TokenType == JsonToken.PropertyName)
        {
            string columnName = (string)reader.Value;

            reader.ReadAndAssert();

            DataColumn column = dt.Columns[columnName];
            if (column == null)
            {
                bool isAmbiguousType;

                Type columnType = GetColumnDataType(reader, out isAmbiguousType);
                column = new DataColumn(columnName, columnType);
                dt.Columns.Add(column);

                if (isAmbiguousType)
                    ambiguousColumnTypes.Add(columnName);
            }
            else if (ambiguousColumnTypes.Contains(columnName))
            {
                bool isAmbiguousType;
                Type newColumnType = GetColumnDataType(reader, out isAmbiguousType);
                if (!isAmbiguousType)
                    ambiguousColumnTypes.Remove(columnName);
                if (newColumnType != column.DataType)
                {
                    column = ReplaceColumn(dt, column, newColumnType, serializer);
                }
            }

            if (column.DataType == typeof(DataTable))
            {
                if (reader.TokenType == JsonToken.StartArray)
                {
                    reader.ReadAndAssert();
                }

                DataTable nestedDt = new DataTable();

                var nestedUnknownColumnTypes = new HashSet<string>();

                while (reader.TokenType != JsonToken.EndArray)
                {
                    CreateRow(reader, nestedDt, serializer, nestedUnknownColumnTypes);

                    reader.ReadAndAssert();
                }

                dr[columnName] = nestedDt;
            }
            else if (column.DataType.IsArray && column.DataType != typeof(byte[]))
            {
                if (reader.TokenType == JsonToken.StartArray)
                {
                    reader.ReadAndAssert();
                }

                List<object> o = new List<object>();

                while (reader.TokenType != JsonToken.EndArray)
                {
                    o.Add(reader.Value);
                    reader.ReadAndAssert();
                }

                Array destinationArray = Array.CreateInstance(column.DataType.GetElementType(), o.Count);
                Array.Copy(o.ToArray(), destinationArray, o.Count);

                dr[columnName] = destinationArray;
            }
            else
            {
                object columnValue = (reader.Value != null)
                    ? serializer.Deserialize(reader, column.DataType) ?? DBNull.Value
                    : DBNull.Value;

                dr[columnName] = columnValue;
            }

            reader.ReadAndAssert();
        }

        dr.EndEdit();
        dt.Rows.Add(dr);
    }

    static object RemapValue(object oldValue, Type newType, JsonSerializer serializer)
    {
        if (oldValue == null)
            return null;
        if (oldValue == DBNull.Value)
            return oldValue;
        return JToken.FromObject(oldValue, serializer).ToObject(newType, serializer);
    }

    private static DataColumn ReplaceColumn(DataTable dt, DataColumn column, Type newColumnType, JsonSerializer serializer)
    {
        var newValues = Enumerable.Range(0, dt.Rows.Count).Select(i => dt.Rows[i]).Select(r => RemapValue(r[column], newColumnType, serializer)).ToList();

        var ordinal = column.Ordinal;
        var name = column.ColumnName;
        var @namespace = column.Namespace;

        var newColumn = new DataColumn(name, newColumnType);
        newColumn.Namespace = @namespace;
        dt.Columns.Remove(column);
        dt.Columns.Add(newColumn);
        newColumn.SetOrdinal(ordinal);

        for (int i = 0; i < dt.Rows.Count; i++)
            dt.Rows[i][newColumn] = newValues[i];

        return newColumn;
    }

    private static Type GetColumnDataType(JsonReader reader, out bool isAmbiguous)
    {
        JsonToken tokenType = reader.TokenType;

        switch (tokenType)
        {
            case JsonToken.Integer:
            case JsonToken.Boolean:
            case JsonToken.Float:
            case JsonToken.String:
            case JsonToken.Date:
            case JsonToken.Bytes:
                isAmbiguous = false;
                return reader.ValueType;
            case JsonToken.Null:
            case JsonToken.Undefined:
                isAmbiguous = true;
                return typeof(string);
            case JsonToken.StartArray:
                reader.ReadAndAssert();
                if (reader.TokenType == JsonToken.StartObject)
                {
                    isAmbiguous = false;
                    return typeof(DataTable); // nested datatable
                }
                else
                {
                    isAmbiguous = false;
                    bool innerAmbiguous;
                    // Handling ambiguity in array entries is not yet implemented because the first non-ambiguous entry in the array
                    // might occur anywhere in the sequence, requiring us to scan the entire array to determine the type, 
                    // e.g., given: [null, null, null, 314, null]
                    // we would need to scan until the 314 value, and do:
                    // return typeof(Nullable<>).MakeGenericType(new[] { reader.ValueType }).MakeArrayType();
                    Type arrayType = GetColumnDataType(reader, out innerAmbiguous);
                    return arrayType.MakeArrayType();
                }
            default:
                throw JsonSerializationExceptionHelper.Create(reader, "Unexpected JSON token when reading DataTable: {0}".FormatWith(CultureInfo.InvariantCulture, tokenType));
        }
    }
}

internal static class JsonSerializationExceptionHelper
{
    public static JsonSerializationException Create(this JsonReader reader, string format, params object[] args)
    {
        // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/JsonPosition.cs

        var lineInfo = reader as IJsonLineInfo;
        var path = (reader == null ? null : reader.Path);
        var message = string.Format(CultureInfo.InvariantCulture, format, args);
        if (!message.EndsWith(Environment.NewLine, StringComparison.Ordinal))
        {
            message = message.Trim();
            if (!message.EndsWith(".", StringComparison.Ordinal))
                message += ".";
            message += " ";
        }
        message += string.Format(CultureInfo.InvariantCulture, "Path '{0}'", path);
        if (lineInfo != null && lineInfo.HasLineInfo())
            message += string.Format(CultureInfo.InvariantCulture, ", line {0}, position {1}", lineInfo.LineNumber, lineInfo.LinePosition);
        message += ".";

        return new JsonSerializationException(message);
    }
}

internal static class StringUtils
{
    // Adapted from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Utilities/StringUtils.cs
    public static string FormatWith(this string format, IFormatProvider provider, object arg0)
    {
        return format.FormatWith(provider, new[] { arg0 });
    }

    private static string FormatWith(this string format, IFormatProvider provider, params object[] args)
    {
        return string.Format(provider, format, args);
    }
}

internal static class JsonReaderExtensions
{
    public static void ReadAndAssert(this JsonReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");
        if (!reader.Read())
        {
            throw JsonSerializationExceptionHelper.Create(reader, "Unexpected end when reading JSON.");
        }
    }
}

Then use it like:

var settings = new JsonSerializerSettings { Converters = new[] { new TypeInferringDataTableConverter() } };

DataTable tbl1 = Newtonsoft.Json.JsonConvert.DeserializeObject<DataTable>(jsonTable1, settings);
DataTable tbl2 = Newtonsoft.Json.JsonConvert.DeserializeObject<DataTable>(jsonTable2, settings);

Do not set NullValueHandling = NullValueHandling.Ignore since nulls are now handled correctly.

Prototype fiddle

Note that, while this class handles re-typing of columns with null values, it doesn't handle re-typing of columns containing array values where the first array item is null. For instance, if the first row of some column has the value

[null, null, null, 314, null]

Then the column type inferred would ideally be typeof( long? [] ), however that is not being implemented here. Likely it would be necessary to completely load the JSON into a JToken hierarchy to make that determination.

dbc
  • 104,963
  • 20
  • 228
  • 340
0

My question 64647406 links to here.

I found that I can use a strongly-typed DataTable derivative, and pre-add any columns which I know the DataType of. These will be untouched by [FromBody] and simply used as-is. As long as the Column Name is correct, they can be added starting at index 0 - there is no need to add all columns; just the ones you need absolute control over.

This means that NULL values in the first row won't create string columns for those values.

radders
  • 923
  • 8
  • 29