9

I am converting a nested JSON object with more than 10 levels to CSV file in C# .NET.

I have been using JavaScriptSerializer().Deserialize<ObjectA>(json) or XmlNode xml = (XmlDocument)JsonConvert.DeserializeXmlNode(json) to break down the object. With the objects I can further write into CSV file. However now the JSON object further expand. Most data that is not really in use so I would prefer a raw data dump.

Is that easier way I can just dump the data into csv format without declaring the structure?

Sample JSON

{
"F1":1,
"F2":2,
"F3":[
    {
        "E1":3,
        "E2":4
    },
    {
        "E1":5,
        "E2":6
    },  
    {
        "E1":7,
        "E2":8,
        "E3":[
            {
                "D1":9,
                "D2":10
            }
        ]
    },      
]
}   

And my expected CSV output is

F1,F2,E1,E2,D1,D2
1,2
1,2,3,4
1,2,5,6
1,2,7,8,9,10
ydoow
  • 2,969
  • 4
  • 24
  • 40
  • can you not convert that data into a datatable.. then from there convert to CSV – MethodMan Oct 06 '15 at 03:08
  • @MethodMan that's exactly what I did to handle the first few levels. After all I found it's too time consuming to put them into structure so I am checking if there's any automation. – ydoow Oct 06 '15 at 03:20
  • where is the source of data coming from.. is it stored in a db ..? – MethodMan Oct 06 '15 at 03:25
  • @MethodMan It's from HttpContext, I've further made it into string. – ydoow Oct 06 '15 at 03:50
  • You dont want `F3` in output ? – Gaurav Gandhi Oct 06 '15 at 04:16
  • @ErrHunter no, the child element would be enough to represent F3 – ydoow Oct 06 '15 at 04:21
  • @ErrHunter i think he dont need ..he included F3 children – Rakin Oct 06 '15 at 04:22
  • 1
    I have somewhat tried, Check here : https://dotnetfiddle.net/7MBHUc Its not working, but i am able to traverse. Will update as next when possible. – Gaurav Gandhi Oct 06 '15 at 05:34
  • @ErrHunter Looks like a closer solution. 1) Need to get rid of the node with child elements; and 2) more data type checks – ydoow Oct 06 '15 at 05:58
  • @ydoow, Yes i know. But i am bit out, so can update after 8-10 hrs. – Gaurav Gandhi Oct 06 '15 at 06:38
  • There is some inconsistency in your requirement. You write a row for the root object, which has children; you do not write a row for the `"F3[2]"` object, which has children. Is there a simple rule to tell when to write a row for an object that has children? – dbc Oct 06 '15 at 10:05

3 Answers3

13

There's an inconsistency in your request: you want a row to be generated for the root object, which has children, but you don't want a row to be generated for the "F3[2]" object, which also has children. So it sounds like your rule is, "print a row for an object with at least one primitive-value property, as long as that object is either the root object or does not have descendant objects with at a least one primitive-value property". That's a little tricky, but can be done with LINQ to JSON

        var obj = JObject.Parse(json);

        // Collect column titles: all property names whose values are of type JValue, distinct, in order of encountering them.
        var values = obj.DescendantsAndSelf()
            .OfType<JProperty>()
            .Where(p => p.Value is JValue)
            .GroupBy(p => p.Name)
            .ToList();

        var columns = values.Select(g => g.Key).ToArray();

        // Filter JObjects that have child objects that have values.
        var parentsWithChildren = values.SelectMany(g => g).SelectMany(v => v.AncestorsAndSelf().OfType<JObject>().Skip(1)).ToHashSet();

        // Collect all data rows: for every object, go through the column titles and get the value of that property in the closest ancestor or self that has a value of that name.
        var rows = obj
            .DescendantsAndSelf()
            .OfType<JObject>()
            .Where(o => o.PropertyValues().OfType<JValue>().Any())
            .Where(o => o == obj || !parentsWithChildren.Contains(o)) // Show a row for the root object + objects that have no children.
            .Select(o => columns.Select(c => o.AncestorsAndSelf()
                .OfType<JObject>()
                .Select(parent => parent[c])
                .Where(v => v is JValue)
                .Select(v => (string)v)
                .FirstOrDefault())
                .Reverse() // Trim trailing nulls
                .SkipWhile(s => s == null)
                .Reverse());

        // Convert to CSV
        var csvRows = new[] { columns }.Concat(rows).Select(r => string.Join(",", r));
        var csv = string.Join("\n", csvRows);

        Console.WriteLine(csv);

Using

public static class EnumerableExtensions
{
    // http://stackoverflow.com/questions/3471899/how-to-convert-linq-results-to-hashset-or-hashedset
    public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source)
    {
        return new HashSet<T>(source);
    }
}

Which outputs:

F1,F2,E1,E2,D1,D2
1,2
1,2,3,4
1,2,5,6
1,2,7,8,9,10
dbc
  • 104,963
  • 20
  • 228
  • 340
1

I wrote this and it is working for me Here we save all breadcrumps of object tree in headers with format prop_prop And save jarray property objects in headers in format prop1

    public Dictionary<string, string> ComplexJsonToDictionary(JObject jObject, Dictionary<string, string> result, string field)
    {
        foreach (var property in jObject.Properties())
        {
            var endField = field + (string.IsNullOrEmpty(field) ? "" : "_") + property.Name;

            var innerDictionary = new Dictionary<string, string>();
            try
            {
                var innerValue = JObject.Parse(Convert.ToString(property.Value));


                result.AddOrOverride(ComplexJsonToDictionary(innerValue, innerDictionary, endField));
            }
            catch (Exception)
            {
                try
                {
                    var innerValues = JArray.Parse(Convert.ToString(property.Value));
                    try
                    {
                        var i = 0;
                        foreach (var token in innerValues)
                        {
                            var innerValue = JObject.Parse(Convert.ToString(token));

                            result.AddOrOverride(ComplexJsonToDictionary(innerValue, innerDictionary, endField+i++));
                        }
                    }
                    catch (Exception)
                    {
                        result.Add(endField, string.Join(",", innerValues.Values<string>()));
                    }
                }
                catch (Exception)
                {
                    result.Add(endField, property.Value.ToString());
                }
            }
        }
        return result;
    }

Thanks for atantion and please write review if appropriate.

-2

pass the json parsed jobject to static extension, then it will return the array of jobjects with flatten, then convert to csv. some of the source code is taken from the other stack overflow and other resources, I don't have reference.

 public static IEnumerable<string> JsonToCsvRowsWithHierarchyHeaders(string jsonData)
    {
        if (jsonData.Trim().StartsWith("[") && jsonData.Trim().EndsWith("]"))
        {
            var startString = "{\"appendRoot\":";
            var endString = "}";
            jsonData = $"{startString}{jsonData}{endString}";
        }

        var jObject = JsonConvert.DeserializeObject<JObject>(jsonData);
        var flattenJObjects = JsonParserExtensions.FlattenJsonGetJObjects(jObject).ToList();
  
        var csvRows = new List<string>();
        if (flattenJObjects.Any())
        {
            var firstRow = flattenJObjects.First().Children<JProperty>().Select(x => x.Name).ToList();
            var header = string.Join(delimeter, firstRow).Replace("appendRoot_", "").ToLower();
            csvRows.Add(header);
            foreach (var flattenJObject in flattenJObjects)
            {
                var tokens = flattenJObject.Children<JProperty>();

                if (firstRow.Count() != tokens.Count())
                {
                    var missMatchPropertyValues = firstRow.Select(cell => tokens.FirstOrDefault(x => x.Name == cell))
                        .Select(value => value == null
                            ? string.Empty
                            : CheckAndUpdateRowCellValueTextQualifier(JsonConvert.DeserializeObject<string>(value.Value.ToString(Newtonsoft.Json.Formatting.None))))
                        .ToList();
                    var rowString = string.Join(delimeter, missMatchPropertyValues);
                    csvRows.Add(rowString);
                }
                else
                {
                    var rowValue = tokens.Select(token =>
                        CheckAndUpdateRowCellValueTextQualifier(
                            JsonConvert.DeserializeObject<string>(
                                token.Value.ToString(Newtonsoft.Json.Formatting.None))));
                    var rowString = string.Join(delimeter, rowValue);
                    csvRows.Add(rowString);
                }
            }
        }
        return csvRows;
    }

 
    private static string CheckAndUpdateRowCellValueTextQualifier(string value)
    {
        const string q = @"""";
        if (!string.IsNullOrEmpty(value) && value.Contains('\"'))
        {
            return value;
        }

        if (!string.IsNullOrEmpty(value) && (value.Contains(',') ||
                                         value.Contains('"') ||
                                         value.Contains('\n') || value.Contains('\r')))
        {
            return $"{q}{value}{q}";
        }
        
        return value;
    }
}

public static class JsonParserExtensions
{
    public static IEnumerable<JObject> FlattenJsonGetJObjects(JObject jObject, string parentName = null)
    {
        if (!(parentName is null))
            jObject = RenamePropertiesByHierarchyName(jObject, parentName);

        var fields = jObject.Properties().Where(p => p.Value.GetType().Name == "JValue").ToList();
        var objects = jObject.Properties().Where(p => p.Value.GetType().Name == "JObject").ToList();
        var arrays = jObject.Properties().Where(p => p.Value.GetType().Name == "JArray").ToList();
        var objectsArray = arrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JObject")).ToList();
        var valuesArray = arrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JValue")).ToList();

        var nestedObjects = ProcessNestedObjects(objects);

        var joinedInnerObjects = nestedObjects.Any()
            ? nestedObjects.Select(innerObject => JoinJObject(new JObject(fields), innerObject))
            : new List<JObject> { new JObject(fields) };

        var arraysObjectList = GetJObjectsFromArrayOfJProperties(objectsArray);
        var arraysValueList = GetJObjectsFromArrayOfValues(valuesArray);
        var joinedAll = joinedInnerObjects.SelectMany(inner => JoinMultipleJObjects(arraysObjectList, arraysValueList, inner));
        return joinedAll;
    }

    public static List<JObject> ProcessNestedObjects(List<JProperty> jObjects)
    {
        var processNestedObjects = new List<JObject>();
        var renamedJObjects = jObjects?.Select(obj => RenamePropertiesByHierarchyName(obj.Value.ToObject<JObject>(), obj.Name)).ToList();

        if (!(renamedJObjects?.Count > 0)) return processNestedObjects;
        var renamed = renamedJObjects.Aggregate((acc, next) => JoinJObject(acc, next));

        var nestedObjects = renamed.Properties().Where(p => p.Value.GetType().Name == "JObject").ToList();
        var nestedArrays = renamed.Properties().Where(p => p.Value.GetType().Name == "JArray").ToList();
        var nestedObjectsArray = nestedArrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JObject")).ToList();
        var nestedValuesArray = nestedArrays.Where(array => array.Value.All(elements => elements.GetType().Name == "JValue")).ToList();

        nestedArrays.ForEach(p => renamed.Remove(p.Name));
        nestedObjects.ForEach(p => renamed.Remove(p.Name));

        var nestedObjectList = new List<JObject>();
        var nestedMultipleObjectList = new List<JObject>();
        foreach (var listJObjects in nestedObjects.Select(innerObject => FlattenJsonGetJObjects(innerObject.Value.ToObject<JObject>(), innerObject.Name)).ToList())
        {
            if (listJObjects.Count() > 1)
                nestedMultipleObjectList.AddRange(listJObjects);
            else
                nestedObjectList.Add(listJObjects.First());
        }

        var jObjectsFromArrayOfJProperties = GetJObjectsFromArrayOfJProperties(nestedObjectsArray);
        var jObjectsFromArrayOfValues = GetJObjectsFromArrayOfValues(nestedValuesArray);

        var aggregate = nestedObjectList.Aggregate(renamed, (acc, next) => JoinJObject(acc, next));
        var groupedNestedObjects = (nestedMultipleObjectList.Any()) ? nestedMultipleObjectList.Select(nested => JoinJObject(aggregate, nested))
            : new List<JObject> { aggregate };
        var groupedNestedObjectsList = groupedNestedObjects.Select(groupedNested => JoinMultipleJObjects(jObjectsFromArrayOfJProperties, jObjectsFromArrayOfValues, groupedNested));
        processNestedObjects.AddRange(groupedNestedObjectsList.SelectMany(e => e));
        return processNestedObjects;
    }

    public static List<JObject> JoinMultipleJObjects(List<JObject> nestedArraysObjectList, List<JObject> nestedArraysValueList, JObject groupedNestedObjects)
    {
        var result = new List<JObject>();
        var joined = new List<JObject>();
        if (!nestedArraysObjectList.Any())
            joined.Add(groupedNestedObjects);
        else
            nestedArraysObjectList.ForEach(e => joined.Add(JoinJObject(groupedNestedObjects, e)));

        result.AddRange(nestedArraysValueList.Any()
            ? nestedArraysValueList
                .SelectMany(value => joined, (value, joinedItem) => JoinJObject(joinedItem, value)).ToList()
            : joined);
        return result;
    }

    public static List<JObject> GetJObjectsFromArrayOfJProperties(List<JProperty> nestedJProperties)
    {
        var fromArrayOfJProperties = new List<JObject>();
        foreach (var jProperty in nestedJProperties)
        {
            var nestedArraysObjectList = new List<JObject>();
            var name = jProperty.Name;
            var jPropertyValue = jProperty.Value;
            var renamedObjects = jPropertyValue?.Select(obj => RenamePropertiesByHierarchyName(obj.ToObject<JObject>(), name)).ToList();
            foreach (var jObjects in renamedObjects.Select(innerObject => FlattenJsonGetJObjects(innerObject.ToObject<JObject>())))
            {
                nestedArraysObjectList.AddRange(jObjects);
            }

            if (fromArrayOfJProperties.Any() && nestedArraysObjectList.Any())
                fromArrayOfJProperties = nestedArraysObjectList
                    .SelectMany(nested => fromArrayOfJProperties, (current, joined) => JoinJObject(joined, current)).ToList();

            if (!fromArrayOfJProperties.Any())
                fromArrayOfJProperties.AddRange(nestedArraysObjectList);
        }
        return fromArrayOfJProperties;
    }

    public static List<JObject> GetJObjectsFromArrayOfValues(List<JProperty> nestedValuesArray)
    {
        return (from innerArray in nestedValuesArray let name = innerArray.Name let values = innerArray.Value from innerValue in values select new JObject(new JProperty(name, innerValue.ToObject<JValue>()))).ToList();
    }

    public static JObject RenamePropertiesByHierarchyName(JObject jObject, string hierarchyName)
    {
        var properties = jObject.Properties().ToList().Select(p => new JProperty($"{hierarchyName}_{p.Name}", p.Value));
        return new JObject(properties);
    }

    public static JObject JoinJObject(JObject parentJObject, JObject innerObject)
    {
        var joinJObject = new JObject
            {
                parentJObject.Properties(),
                innerObject.Properties()
            };
        return joinJObject;
    }
}
Bharath
  • 1
  • 1
  • If you have a new question, please ask it by clicking the [Ask Question](https://stackoverflow.com/questions/ask) button. Include a link to this question if it helps provide context. - [From Review](/review/late-answers/31027121) – cboden Feb 13 '22 at 12:34