1

I have a heavily nested XML document that I need to load into my db for additional processing. For various reasons beyond the scope of this discussion I need to 'flatten' that structure down, then load it into a DataTables and then I can SQLBulkCopy it into the db where it will get processed. So assume my original XML looks something like this (mine is even more heavily nested, but this is the basic idea):

<data>
    <report id="1234" name="XYZ">
        <department id="234" name="Accounting">
            <item id="ABCD" name="some item">
                <detail id="detail1" value="1"/>                    
                <detail id="detail2" value="2"/>                    
                <detail id="detail3" value="3"/>                    
            </item>
        </department>      
    </report>
 </data>   

and I want to flatten that down into a single (albeit redundant) table structure where each attribute becomes a column (i.e. ReportId, ReportName, DepartmentId, DepartmentName, ItemId, ItemName, Detail1, Detail2, Detail3).

So my question is simply 'is it possible to accomplish this with a simple Linq query'? In the past I would just write some XSLT and be done with it but I'm curious if the Linq library can accomplish the same thing?

thanks!

snappymcsnap
  • 2,050
  • 2
  • 29
  • 53
  • 1
    It would really help if you'd show what your sample input would look like after transformation. It's not clear just from your description. – Jon Skeet Jan 17 '12 at 18:59

1 Answers1

3

Is this what you're looking for?

var doc = XDocument.Load(fileName);
var details =
    from report in doc.Root.Elements("report")
    from department in report.Elements("department")
    from item in department.Elements("item")
    from detail in item.Elements("detail")
    select new
    {
        ReportId = (int)report.Attribute("id"),
        ReportName = (string)report.Attribute("name"),
        DepartmentId = (int)department.Attribute("id"),
        DepartmentName = (string)department.Attribute("name"),
        ItemId = (string)item.Attribute("id"),
        ItemName = (string)item.Attribute("name"),
        DetailId = (string)detail.Attribute("id"),
        DetailValue = (int)detail.Attribute("value"),
    };

If you want it as a DataTable, you can use the following extension method:

public static DataTable ToDataTable<T>(this IEnumerable<T> source)
{
    PropertyInfo[] properties = typeof(T).GetProperties()
                                         .Where(p => p.CanRead && !p.GetIndexParameters().Any())
                                         .ToArray();

    DataTable table = new DataTable();
    foreach (var p in properties)
    {
        Type type = p.PropertyType;
        bool allowNull = !type.IsValueType;
        if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(Nullable<>))
        {
            allowNull = true;
            type = Nullable.GetUnderlyingType(type);
        }
        DataColumn column = table.Columns.Add(p.Name, type);
        column.AllowDBNull = allowNull;
        column.ReadOnly = !p.CanWrite;
    }

    foreach (var item in source)
    {
        DataRow row = table.NewRow();
        foreach (var p in properties)
        {
            object value = p.GetValue(item, null) ?? DBNull.Value;
            row[p.Name] = value;
        }
        table.Rows.Add(row);
    }

    return table;
}

Use it like this:

var table = details.CopyToDataTable();
Thomas Levesque
  • 286,951
  • 70
  • 623
  • 758
  • +1 Was just about to post something similar, although your answer uses fewer intermediate steps than my approach. – Yuck Jan 17 '12 at 19:12
  • That's so easy and elegant I'm slapping myself for not thinking of it, thank you for your help. One small change is required to the code you posted though...since there are 1-to-many nodes and I need to map each based on its attribute 'id' field (e.g. needs to map to a column called Detail1. I got it to work using another query like this: (from a in detail.Elements("detail") where (string)a.Attribute("id") == "detail1" select a.Attribute("value").Value).Single() but there's probably an easier way – snappymcsnap Jan 17 '12 at 21:06