If I have two JSON files/expressions, how do I determine if the data-content of the JSONs is functionally equivalent?

I.e. the comparison should ignore syntactic differences such as:

  • All whitespace/newlines (outside of strings).
  • The ordering of object members.
  • Equivalent Unicode escapes (e.g. `\u0041` = `A`).

I recognize that some kind of JSON parser is needed, but I don't know my options. My data is on SQL Server, so I have the SQL JSON functions and SSIS script components (C#/.NET) directly available.

Alternatively: Is there a way to compute hash values for JSON data content, and are there any standards for minimizing/canonicalizing the JSON expression before calculating the hash?

Edit: The JSON schema is unknown and not fixed.

Martin Thøgersen
  • Deserialize and compare? – Ňɏssa Pøngjǣrdenlarp Oct 01 '17 at 21:50
  • Also consider doing it on SQL side. – Michał Żołnieruk Oct 01 '17 at 21:58
  • @Plutonix Could you provide a minimal example? Does it require a class definition? – Martin Thøgersen Oct 01 '17 at 22:13
  • @MichałŻołnieruk How to do it on SQL? (I'm a SQL expert, but don't have an idea of this.) – Martin Thøgersen Oct 01 '17 at 22:14
  • There are thousands of examples here on deserializing JSON; it seems like nary a day goes by without one. Yes, you will need to write a class and can provide the comparison code (probably via IEquatable). – Ňɏssa Pøngjǣrdenlarp Oct 01 '17 at 22:16
  • Use `JTokenEqualityComparer` from Json.NET. See [How can I create a unique hashcode for a JObject?](https://stackoverflow.com/q/39507095/3744182). In fact this may just be a duplicate. Agree? – dbc Oct 01 '17 at 22:27
  • Are you sure you want to ignore *The ordering of ... **array-elements***? Array element order is significant; object property order is not. – dbc Oct 01 '17 at 22:31
  • @dbc: While you're right in that an array is inherently ordered, element order may or may not be relevant to the particular application. It'd be good to have an option to ignore it if that's what's needed. – Ben Thul Oct 01 '17 at 22:35
  • @BenThul - in that case one might need to create one's own `JToken` comparer from scratch. The comparer from [Compare two arbitrary JToken-s of the same structure](https://stackoverflow.com/q/33022993) could be a place to start. It would be necessary to change `JArrayComparer` to something that ignores order. – dbc Oct 01 '17 at 22:42
  • Deserialization will not work, as I can't specify a general class that covers all the json in question. The json schema can change/expand in ways I do not control. In fact, the purpose of comparing json expressions is to understand if they have the same structure. – Martin Thøgersen Oct 01 '17 at 22:46
  • @dbc you are correct, according to [json.org](http://json.org) "An *object* is an *unordered* set of name/value pairs." "An *array* is an *ordered* collection of values." I will edit the question. – Martin Thøgersen Oct 01 '17 at 23:00
  • @MartinThøgersen - in that case I think Json.NET and `JTokenEqualityComparer` should meet your needs. – dbc Oct 01 '17 at 23:04
  • A duplicate is a question with the [same potential answers](https://meta.stackexchange.com/a/10844) as another question. As such duplicates are sometimes more useful than the original if they're easier to find via search. Having your question closed as a duplicate isn't intended to imply anything negative about your question. It only means that the answers in the linked question answered your question also, so you don't need more help. – dbc Oct 02 '17 at 00:00

1 Answer

Based on the feedback in the comments, here is my final working example based on `JTokenEqualityComparer`. The method will be run against SQL Server from an SSIS script task or a SQL CLR.

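```csharp
using System;
using Newtonsoft.Json.Linq;

namespace JSON_Comparison_Test
{
    class Program
    {
        static void Main(string[] args)
        {
            // Same content, but with different member order, whitespace and a
            // JSON Unicode escape. Note the doubled backslash in "\\u0041": with
            // a single backslash the C# compiler would turn the escape into 'A'
            // before the JSON parser ever saw it.
            string jsonString1 = "{\"key1\":\"ABC\",\"key2\":\"DEF\"}";
            string jsonString2 = "{ \"key2\":\"DEF\" , \r\n \t  \"key1\" : \"\\u0041BC\" }";

            // Parse into a schema-less token tree; no class definition is needed.
            var obj1 = JToken.Parse(jsonString1);
            var obj2 = JToken.Parse(jsonString2);

            var comparer = new JTokenEqualityComparer();
            var hashCode1 = comparer.GetHashCode(obj1);
            var hashCode2 = comparer.GetHashCode(obj2);

            Console.WriteLine(hashCode1.ToString()); // -323033486
            Console.WriteLine(hashCode2.ToString()); // -323033486

            Console.WriteLine(comparer.Equals(obj1, obj2)); // True
        }
    }
}
```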
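
For calling this from an SSIS script component or a SQL CLR function, the comparison could be wrapped in a small helper along the following lines. This is only a sketch: `JsonContentComparer`, `AreEquivalent` and `ContentHash` are names made up for illustration and are not part of Json.NET; only `JToken.Parse` and `JTokenEqualityComparer` come from the library.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Hypothetical helper (illustrative names, not a Json.NET API).
public static class JsonContentComparer
{
    private static readonly JTokenEqualityComparer Comparer = new JTokenEqualityComparer();

    // True when both strings parse to the same JSON content.
    // Throws a JsonReaderException if either string is not valid JSON.
    public static bool AreEquivalent(string json1, string json2)
    {
        return Comparer.Equals(JToken.Parse(json1), JToken.Parse(json2));
    }

    // In-memory hash code of the parsed content, usable as a quick pre-check.
    public static int ContentHash(string json)
    {
        return Comparer.GetHashCode(JToken.Parse(json));
    }
}

public static class Demo
{
    public static void Main()
    {
        string a = "{\"key1\":\"ABC\",\"key2\":[1,2]}";
        string b = "{ \"key2\" : [1,2],\r\n \"key1\" : \"\\u0041BC\" }";

        // Member order, whitespace and Unicode escapes are ignored.
        Console.WriteLine(JsonContentComparer.AreEquivalent(a, b)); // True
        Console.WriteLine(JsonContentComparer.ContentHash(a) == JsonContentComparer.ContentHash(b)); // True

        // Array element order is still significant.
        Console.WriteLine(JsonContentComparer.AreEquivalent("[1,2]", "[2,1]")); // False
    }
}
```

Two caveats, as far as I can tell: array element order stays significant (as discussed in the comments, ignoring it would need a custom comparer), and the hash is an ordinary 32-bit .NET hash code, so it is probably best treated as a fast in-process pre-check before calling `Equals` rather than as a value to persist in a table.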
Martin Thøgersen