3

Using .NET 3.5, I have the need to group a DataTable by multiple columns, where the column names are contained in an IEnumerable.

// column source
IEnumerable<string> columns;
DataTable table;

IEnumerable<IGrouping<object, DataRow>> groupings = table
    .AsEnumerable()
    .GroupBy(row => ???);

Typically ??? would be an anonymous type as described here, but I need to use columns as the column source. Is this possible?

Community
  • 1
  • 1
Omer Bokhari
  • 57,458
  • 12
  • 44
  • 58

1 Answers1

2

The simplest way to do this is to create a function which selects the required columns and creates a hash for comparison. I'd do something like this:

Func<DataRow, IEnumerable<string>, string> f = (row, cols) => 
    String.Join("|", cols.Select(col => row[col]));

This is a function taking a DataRow and an IEnumerable<string>. It projects the IEnumerable<string> (the column names) into the corresponding column values (cols.Select(col => row[col]))), and then joins those values with a | character. I chose this character because it's an unlikely candidate to be included in your fields, but you might want to swap for another delimiter.

Then simply

IEnumerable<IGrouping<object, DataRow>> groupings = table
    .AsEnumerable()
    .GroupBy(row => f(row, columns));

Ideally we would select into a better type - not a string tied together with a arbitrary delimiter. But I expect that selecting into an object will cause problems due to the comparison of reference types, as two objects aren't equal even if they have identical properties.

Kirk Broadhurst
  • 27,836
  • 16
  • 104
  • 169
  • I had to add a cast in the projection to make it work: `Func, string> f = (row, cols) => String.Join("|", cols.Select(col => (string)row[col]).ToArray());` - but I wonder if you could explain how this line actually works ? `.GroupBy(row => f(row, columns))` - the function returns a string, but if you replaced it with a string it wouldn't work! How does the grouping know to look at the column values once it constructed the string in func? I'd be happy if you can point me to any resources explaining this. – Joanna Derks May 01 '12 at 20:00
  • 1
    @Joanna glad it helped. The reason it works is because the string that the function returns isn't just 'any string', it's a string containing the column values. If the column values match for any two rows, then the strings will match, and they'll be 'grouped'. – Kirk Broadhurst May 02 '12 at 00:07
  • so it's doing `(row => f(row, columns))`, ends up with values like `row => 1|10` or `row => 2|987` and these become Keys for the groupings, so that if any of them is repeated it will be grouped with the previous occurence. Ok, that makes perfect sense. Thanks! – Joanna Derks May 02 '12 at 06:54