9

I have a list of objects, which in turn contain nested lists of further objects. I would like to flatten the object graph into a DataTable.

I found code which takes a collection of objects and maps them to a DataTable (referenced below), but it assumes that properties are simple types that can reliably be converted to a string value.

I am thinking this is only possible through recursion, but perhaps there are better ways to do this.

Data Model

Imagine that we have a List of Customer objects:

public class Item
{
    public string SKU { get; set; }
    public string Description { get; set; }
    public double Price { get; set; }
}

public class Order
{
    public string ID { get; set; }
    public List<Item> Items { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
    public List<Order> Orders { get; set; }
}

I would like to fully flatten the collection into a single DataTable with the following DataColumns:

  • Customer.Name
  • Customer.Email
  • Customer.Order.ID
  • Customer.Order.Item.SKU
  • Customer.Order.Item.Description
  • Customer.Order.Item.Price

Sample Implementation

The following is a sample implementation found elsewhere on Stack Overflow, but this will only work if the object exclusively contains simple properties, such as primitives or strings, and not other nested objects. I added a comment in the function where I think we can apply recursion, but I'm not entirely sure it will work.

public static DataTable CreateDataTableFromAnyCollection<T>(IEnumerable<T> list)
{
    Type type = typeof(T);
    var properties = type.GetProperties();

    DataTable dataTable = new DataTable();
    foreach (PropertyInfo info in properties)
    {
        dataTable.Columns.Add(new DataColumn(info.Name, Nullable.GetUnderlyingType(info.PropertyType) ?? info.PropertyType));
    }

    foreach (T entity in list)
    {
        object[] values = new object[properties.Length];
        for (int i = 0; i < properties.Length; i++)
        {
            values[i] = properties[i].GetValue(entity,null); // if not primitive pass a recursive call
        }

        dataTable.Rows.Add(values);
    }

    return dataTable;
}
Jeremy Caney
  • 7,102
  • 69
  • 48
  • 77
Joey
  • 261
  • 4
  • 14
  • 1
    Youre going to need a bit more detail to explain what you mean by "linearize into a DataTable". Also I don't know if this is helpful to you, but .Net's XML serialization handles this kind of stuff already – Nigel Apr 28 '22 at 19:19
  • What's the motivation for using a DataTable? – Caius Jard Apr 28 '22 at 19:23
  • Please see check edited answer. – Joey Apr 28 '22 at 20:23
  • 1
    *but not entirely sure it will work* - have a look at how Newtonsoft does it? Serializing an object graph is o a single string is notionally equivalent to your task.. – Caius Jard Apr 29 '22 at 05:11
  • 1
    do you have full control of the classes? Can you make them implement an interface, add attributes, etc.? – david-ao May 01 '22 at 08:19
  • 1
    By "linearize" do you mean "flatten"? What data table rows and columns do you expect to generate for the data model above? And what data table would you expect to get if `Customer` had two collection properties of different types, e.g. if `List Items` belonged directly to `Customer`? – dbc May 01 '22 at 13:18
  • @Joey: I've edited your answer to clarify the term "primitive" to avoid any ambiguity. C# uses a very narrow definition of what types are considered primitive ([source](https://learn.microsoft.com/en-us/dotnet/api/system.type.isprimitive?view=net-6.0)), covering `Boolean`, `S`/`Byte`, `U`/`Int16`, `U`/`Int32`, `U`/`Int64`, `U`/`IntPtr`, `Char`, `Double`, and `Single`. This notably excludes `String`, `decimal`, `DateTime`, `Uri`, `Guid`, and a number of others, which this will still write. (Technically, it'll write _any_ object, but the value won't be _meaningful_ for complex types.) – Jeremy Caney May 07 '22 at 22:59
  • 1
    Thanks so much for the updates, i was away on vacation and didn't get a chance to see your comments. Funny thing I was working along the same lines but you beat me to it lol. I think this warrants creating a GitHub, it's pretty neat and valuable and i can see it have a lot of applications. I am a little swamped at the moment but please feel free to create one! Thanks again – Joey May 26 '22 at 15:53

2 Answers2

8

If you're only working with one type of model object (in this case, Customer), then I recommend @Huntbook's answer, as that dramatically simplifies this problem.

That said, if you truly need for this to be a generic method because e.g., you'll be handling a variety of different model objects (i.e., not exclusively Customer), then you can certainly expand your proposed CreateDataTableFromAnyCollection<T>() method to support recursion, though it isn't a trivial task.

Approach

The recursion process isn't quite as straight forward as you might expect since you're looping through a collection of objects, yet only need to determine the definition of the DataTable once.

As a result, it makes more sense to separate out your recursive functionality into two separate methods: One for establishing the schema, the other for populating the DataTable. I propose:

  1. EstablishDataTableFromType(), which dynamically establishes the schema of the DataTable based on a given Type (along with any nested types), and
  2. GetValuesFromObject(), which, for each (nested) object in your source list, adds the values from each property to a list of values, which can subsequently be added to a DataTable.

Challenges

The above approach glosses over a number of challenges introduced when working with complex objects and collections. These include:

  1. How do we determine if a property is a collection—and, thus, subject to recursion? We will be able to use the Type.GetInterfaces() and Type.GetGenericTypeDefinition() methods to identify if a type implements ICollection<>. I implement this in the private IsList() method below.

  2. If it is a collection, how do we determine what type the collection contains (e.g., Order, Item)? We will be able to use the Type.GetGenericArguments() to determine what the generic type argument for the ICollection<> is. I implement this in the private GetListType() method below.

  3. How do we ensure all data is represented, given that each nested item necessitates an additional row? We will need to establish a new record for every permutation in the object graph.

  4. What happens if you have two collections on an object, as per @DBC's question in the comments? My code assumes you'll want a permutation for each. So if you added Addresses to Customer, this might yield something like:

    Customer.Name Customer.Orders.ID Customer.Orders.Items.SKU Customer.Addresses.PostalCode
    Bill Gates 0 001 98052
    Bill Gates 0 002 98052
    Bill Gates 0 001 98039
    Bill Gates 0 002 98039
  5. What happens if an object has two collections of the same Type? Your proposal infers that the DataColumn names should be delineated by the Type, but that would introduce a naming conflict. To address that, I assume the property Name should be used as the delineator, not the property Type. E.g., in your sample model, the DataColumn will be Customer.Orders.Items.SKU, not Customer.Order.Item.SKU.

  6. How do you differentiate between complex objects and "primitive" objects? Or, more accurately, objects that can be reliably serialized to a meaningful value? Your question assumes that collection properties will contain complex objects and other properties won't, but that's not necessarily true. E.g., a property that points to a complex object, or, conversely, a collection that contains simple objects:

    public class Order
    {
        public List<string> CouponCodes { get; set; } = new();
        public Address ShipTo { get; set; }
    }
    

    To address this, I rely on @julealgon's answer to How do I tell if a type is a "simple" type? i.e. holds a single value. I implement this in the private IsSimple() method below.

Solution

The solution to this is considerably more complicated than the sample code you referenced. I'll provide a brief summary of each method below. In addition, I've included XML Docs and some comments within the code. If you have questions about any specific functionality, however, please ask and I'll provide further clarification.

EstablishDataTableFromType(): This method will establish a DataTable definition based on a given Type. Instead of simply looping through values, however, this method will recurse over any complex types discovered—including those contained within collections.

/// <summary>
///   Populates a <paramref name="dataTable"/> with <see cref="DataColumn"/>
///   definitions based on a given <paramref name="type"/>. Optionally prefixes
///   the <see cref="DataColumn"/> name with a <paramref name="prefix"/> to
///   handle nested types.
/// </summary>
/// <param name="type">
///   The <see cref="Type"/> to derive the <see cref="DataColumn"/> definitions
///   from, based on properties.
/// </param>
/// <param name="dataTable">
///   The <see cref="DataTable"/> to add the <see cref="DataColumn"/>s to.
/// </param>
/// <param name="prefix">
///   The prefix to prepend to the <see cref="DataColumn"/> name.
/// </param>

private static void EstablishDataTableFromType(Type type, DataTable dataTable, string prefix = "") {
    var properties = type.GetProperties();
    foreach (System.Reflection.PropertyInfo property in properties)
    {

        // Handle properties that can be meaningfully converted to a string
        if (IsSimple(property.PropertyType))
        {
            dataTable.Columns.Add(
                new DataColumn(
                    prefix + property.Name,
                    Nullable.GetUnderlyingType(property.PropertyType)?? property.PropertyType
                )
            );
        }

        // Handle collections
        else if (IsList(property.PropertyType))
        {
            // If the property is a generic list, detect the generic type used
            // for that list
            var listType = GetListType(property.PropertyType);
            // Recursively call this method in order to define columns for
            // nested types
            EstablishDataTableFromType(listType, dataTable, prefix + property.Name + ".");
        }

        // Handle complex properties
        else {
            EstablishDataTableFromType(property.PropertyType, dataTable, prefix + property.Name + ".");
        }
    }
}

GetValuesFromObject(): This method will take a source Object and, for every property, add the value of the property to an object[]. If an Object contains an ICollection<> property, it will recurse over that property, establishing an object[] for every permutation.

/// <summary>
///   Populates a <paramref name="target"/> list with an array of <see cref="
///   object"/> instances representing the values of each property on a <paramref
///   name="source"/>.
/// </summary>
/// <remarks>
///   If the <paramref name="source"/> contains a nested <see cref="ICollection{T}"/>,
///   then this method will be called recursively, resulting in a new record for
///   every nested <paramref name="source"/> in that <see cref="ICollection{T}"/>.
/// </remarks>
/// <param name="type">
///   The expected <see cref="Type"/> of the <paramref name="source"/> object.
/// </param>
/// <param name="source">
///   The source <see cref="Object"/> from which to pull the property values.
/// </param>
/// <param name="target">
///   A <see cref="List{T}"/> to store the <paramref name="source"/> values in.
/// </param>
/// <param name="columnIndex">
///   The index associated with the property of the <paramref name="source"/>
///   object.
/// </param>

private static void GetValuesFromObject(Type type, Object? source, List<object?[]> target, ref int columnIndex)
{

    var properties          = type.GetProperties();

    // For each property, either write the value or recurse over the object values
    for (int i = 0; i < properties.Length; i++)
    {

        var property        = properties[i];
        var value           = source is null? null : property.GetValue(source, null);
        var baseIndex       = columnIndex;

        // If the property is a simple type, write its value to every instance of
        // the target object. If there are multiple objects, the value should be
        // written to every permutation
        if (IsSimple(property.PropertyType))
        {
            foreach (var row in target)
            {
                row[columnIndex] = value;
            }
            columnIndex++;
        }

        // If the property is a generic list, recurse over each instance of that
        // object. As part of this, establish copies of the objects in the target
        // storage to ensure that every a new permutation is created for every
        // nested object.
        else if (IsList(property.PropertyType))
        {
            var list        = value as ICollection;
            var collated    = new List<Object?[]>();

            // If the list is null or empty, rely on the type definition to insert 
            // null values into each DataColumn.
            if (list is null || list.Count == 0) {
                GetValuesFromObject(GetListType(property.PropertyType), null, collated, ref columnIndex);
                continue;
            }

            // Otherwise, for each item in the list, create a new row in the target 
            // list for its values.
            foreach (var item in list)
            {
                columnIndex = baseIndex;
                var values  = new List<Object?[]>();
                foreach (var baseItem in target)
                {
                    values.Add((object?[])baseItem.Clone());
                }
                GetValuesFromObject(item.GetType(), item, values, ref columnIndex);
                collated.AddRange(values);
            }

            // Finally, write each permutation of values to the target collection
            target.Clear();
            target.AddRange(collated);

        }

        // If the property is a complex type, recurse over it so that each of its
        // properties are written to the datatable.
        else
        {
            GetValuesFromObject(property.PropertyType, value, target, ref columnIndex);
        }

    }
}

CreateDataTableFromAnyCollection: The original method you provided obviously needs to be updated to call the EstablishDataTableFromType() and GetValuesFromObject() methods, thus supporting recursion, instead of simply looping over a flat list of properties. This is easy to do, though it does require a bit of scaffolding given how I've written the GetValuesFromObject() signature.

/// <summary>
///   Given a <paramref name="list"/> of <typeparamref name="T"/> objects, will
///   return a <see cref="DataTable"/> with a <see cref="DataRow"/> representing
///   each instance of <typeparamref name="T"/>.
/// </summary>
/// <remarks>
///   If <typeparamref name="T"/> contains any nested <see cref="ICollection{T}"/>, the
///   schema will be flattened. As such, each instances of <typeparamref name=
///   "T"/> will have one record for every nested item in each <see cref=
///   "ICollection{T}"/>.
/// </remarks>
/// <typeparam name="T">
///   The <see cref="Type"/> that the source <paramref name="list"/> contains a
///   list of.
/// </typeparam>
/// <param name="list">
///   A list of <typeparamref name="T"/> instances to be added to the <see cref=
///   "DataTable"/>.
/// </param>
/// <returns>
///   A <see cref="DataTable"/> containing (at least) one <see cref="DataRow"/>
///   for each item in <paramref name="list"/>.
/// </returns>

public static DataTable CreateDataTableFromAnyCollection<T>(IEnumerable<T> list)
{

    var dataTable           = new DataTable();

    EstablishDataTableFromType(typeof(T), dataTable, typeof(T).Name + ".");

    foreach (T source in list)
    {
        var values          = new List<Object?[]>();
        var currentIndex    = 0;

        // Establish an initial array to store the values of the source object
        values.Add(new object[dataTable.Columns.Count]);

        // Assuming the source isn't null, retrieve its values and add them to the 
        // DataTable.
        if (source is not null)
        {
            GetValuesFromObject(source.GetType(), source, values, ref currentIndex);
        }

        // If the source object contains nested lists, then multiple permutations
        // of the source object will be returned.
        foreach (var value in values)
        {
            dataTable.Rows.Add(value);
        }

    }

    return dataTable;

}

IsSimple(): A helper method to determine if a property type can be reliably serialized to a meaningful string value. If it can't, then the above functions will recurse over it, setting each of its property values to a DataColumn. This is based on @julealgon 's answer to How do I tell if a type is a "simple" type? i.e. holds a single value.

/// <summary>
///   Determine if a given <see cref="Type"/> can be reliably converted to a single
///   <see cref="String"/> value in the <see cref="DataTable"/>.
/// </summary>
/// <param name="type">
///   The <see cref="Type"/> to determine if it is a simple type.
/// </param>
/// <returns>
///   Returns <c>true</c> if the <paramref name="type"/> can be reliably converted
///   to a meaningful <see cref="String"/> value.
/// </returns>

private static bool IsSimple(Type type) =>
  TypeDescriptor.GetConverter(type).CanConvertFrom(typeof(string));

IsList(): Here, I've added a simple helper method for determining if the Type of a given property is a generic ICollection<> or not. It is used by both EstablishDataTableFromType() as well as GetValuesFromObject(). This rely's on the Type type's IsGenericType and GetGenericTypeDefinition(). I used ICollection<> not IEnumerable<> since e.g. String implements IEnumerable<> (you don't want a new column for every character in a string!)

/// <summary>
///   Simple helper function to determine if a given <paramref name="type"/> is a
///   generic <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="type">
///   The <see cref="Type"/> to determine if it is an <see cref="ICollection{T}"/>.
/// </param>
/// <returns>
///   Returns <c>true</c> if the <paramref name="type"/> is a generic <see cref=
///   "ICollection{T}"/>.
/// </returns>

private static bool IsList(Type type) => type
    .GetInterfaces()
    .Any(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(ICollection<>));

GetListType(): Finally, I've added another simple helper method for determining the generic Type of a given generic ICollection<>. It is used by both EstablishDataTableFromType() as well as GetValuesFromObject(). This is very similar to the IsList() method above, except that it returns the specific Type, instead of just confirming that the property type implements the ICollection<> interface.

/// <summary>
///   Simple helper function to determine the generic <paramref name="type"/> of
///   an <see cref="ICollection{T}"/>.
/// </summary>
/// <param name="type">
///   The <see cref="Type"/> implementing <see cref="ICollection{T}"/>.
/// </param>
/// <returns>
///   Returns the generic <see cref="Type"/> associated with the <see cref=
///   "ICollection{T}"/> implemented for the <paramref name="type"/>.
/// </returns>

private static Type GetListType(Type type) => type
    .GetInterfaces()
    .Where(i => i.IsGenericType && typeof(ICollection<>) == i.GetGenericTypeDefinition())
    .FirstOrDefault()
    .GetGenericArguments()
    .Last();

Validation

Here's a very simple test (written for XUnit) to validate the basic functionality. This only confirms that the number of DataRow instances in the DataTable match the anticipated number of permutations; it doesn't validate the actual data in each record—though I've separately validated that the data is correct:

[Fact]
public void CreateDataTableFromAnyCollection() 
{
    
    // ARRANGE

    var customers           = new List<Customer>();

    // Create an object graph of Customer, Order, and Item instances, three per
    // collection 
    for (var i = 0; i < 3; i++) 
    {
        var customer        = new Customer() {
            Email           = "Customer" + i + "@domain.tld",
            Name            = "Customer " + i
        };
        for (var j = 0; j < 3; j++) 
        {
            var order = new Order() 
            {
                ID = i + "." + j
            };
            for (var k = 0; k < 3; k++) 
            {
                order.Items.Add(
                    new Item() 
                    {
                        Description = "Item " + k,
                        SKU = "0123-" + k,
                        Price = i + (k * .1)
                    }
                );
            }
            customer.Orders.Add(order);
        }
        customers.Add(customer);
    }

    // ACT
    var dataTable = ParentClass.CreateDataTableFromAnyCollection<Customer>(customers);

    // ASSERT
    Assert.Equal(27, dataTable.Rows.Count);

    // CLEANUP VALUES
    dataTable.Dispose();

}

Note: This assumes that your CreateDataTableFromAnyCollection() method is placed in a class called ParentClass; obviously, you'll need to adjust that based on your application's structure.

Conclusion

This should give you a good idea of how to dynamically map an object graph into a flattened DataTable, while also addressing common scenarios you're likely going to encounter, such as properties referencing complex objects (e.g., the ShipTo example above) and null or empty collections. Obviously, your specific data model may introduce additional challenges unforeseen in my implementation; in that case, this should provide a solid foundation for you to build off of.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Jeremy Caney
  • 7,102
  • 69
  • 48
  • 77
  • 1
    @Joey: FYI: I've updated this implementation to support `ICollection<>` instead of just `List<>`. This allows your models to include e.g., `Collection<>`, `KeyedCollection<>`, and `Dictionary<>`, among others. For `IDictionary<>` implementations, you'll end up with a separate column for the `Key` and the `Value` properties. I've also fixed a bug which failed to correctly detect types that were _derived_ from generic collections, but aren't themselves generic (such as implementations of `KeyedCollection<>`, as well as other specialized collections). – Jeremy Caney May 06 '22 at 22:28
  • 1
    @Joey: I've further updated this to support collections that are null or empty. That's not a likely issue in your sample model, but is a likely issue you would have run up against in other cases. To support this, I now pass the list `Type` to `GetValuesFromObject()`, so it can rely on the metadata to populate each of the missing columns with a null value. (Previously, `GetValuesFromObject` relied on the `source` object to determine the type, which isn't possible if the `source` object is `null`.) – Jeremy Caney May 07 '22 at 20:11
  • 1
    @Joey: I've made what I expect to be a final update to the code. It now detects complex types ([reference](https://stackoverflow.com/a/65079923/3025856)) in order to handle scenarios where properties reference complex types outside of a collection—or, conversely, where collections contain basic values (such as strings). While this goes well beyond the scope of your original question, it addresses the bulk of the limitations identified in my original answer, and should support a pretty broad range of data models. – Jeremy Caney May 07 '22 at 22:47
1

In that case, you will have a row for each Item in every Order in the Orders list of every Customer object.

If you already know what columns you want to have in the DataTable, you don't need to generalize that much. The CreateDataTableFromAnyCollection<T>() method you found, is very general (and generic); creating a DataTable from any one-dimensional structure generic type. In fact, it's so general it requires using Reflection!

A recursive call is meant to break down complex functions, this contradicts the point of a generic method. So, such a method would either have to consist of non-generic sub-functions, that would in turn be called recursively, or make use of System.Reflection.MethodInfo.Invoke to recurse. Read more on this here.

Since you already know what you need, just do:

    public static DataTable CreateDataTableFromCustomerList(List<Customer> list) {
        DataTable dataTable = new DataTable();
        dataTable.Columns.AddRange(new DataColumn[] {
            new DataColumn("CustomerName"),
            new DataColumn("CustomerEmail"),
            new DataColumn("OrderID"),
            new DataColumn("OrderItemSKU"),
            new DataColumn("OrderItemDescription"),
            new DataColumn("OrderItemPrice"),
        });

        foreach (Customer customer in list) {
            object[] values = new object[6];
            // { customer.Name, customer.Email };
            values[0] = customer.Name;
            values[1] = customer.Email;
            foreach (Order order in customer.Orders) {
                values[2] = order.ID;
                foreach (Item item in order.Items) {
                    values[3] = item.SKU;
                    values[4] = item.Desc;
                    values[5] = item.Price;
                    dataTable.Rows.Add(values);
                }
            }
        }
        return dataTable;
    }
Huntbook
  • 114
  • 9
  • Thanks for your answer, but i actually don't know before hand the structure of the classes. This was just an example, the term "generic" was incorrectly used, i should use "general". The function you gave is very specific to that example. I was looking for something more general that will work for similar examples so i don't have to rewrite a different function – Joey May 03 '22 at 06:18
  • 1
    Do you at least know that the type of collection is a `List`? Or can it be any `IEnumerable`? – Huntbook May 03 '22 at 06:50
  • @Huntbook: Presumably, it can’t be _any_ `IEnumerable` since they treat e.g., `Name` and `Email` each as a single column. (`String` is an `IEnumerable`). Given that, I opted to use an `ICollection<>`, as that covers e.g., `List<>`, `Collection<>`, `KeyedCollection<>`, `Dictionary<>`, &c., but doesn’t cover e.g., `String`. – Jeremy Caney May 08 '22 at 19:37