22

I have the following code that fills dataTable1 and dataTable2 with two simple SQL queries, dataTableSqlJoined is filled from the same tables but joined together.

I'm trying to write a LINQ query that can create the dataTableLinqJoined as if it had been created using SQL. In my example below, it only returns the values from dataTable1.

The problem I have is what to put in the SELECT of the linq query. How can I create a new DataRow containing all the Columns from both DataRows. I will not know the exact column names / schema of the queries until runtime.

sqlCommand = new SqlCommand("SELECT ID, A, B FROM Table1", sqlConnection, sqlTransaction);
sqlAdapter = new SqlDataAdapter(sqlCommand);
DataTable dataTable1 = new DataTable();
sqlAdapter.Fill(dataTable1);

sqlCommand = new SqlCommand("SELECT ID, C, D FROM Table2", sqlConnection, sqlTransaction);
sqlAdapter = new SqlDataAdapter(sqlCommand);
DataTable dataTable2 = new DataTable();
sqlAdapter.Fill(dataTable2);

sqlCommand = new SqlCommand("SELECT Table1.ID, A, B, Table2.ID, C, D FROM Table1 INNER JOIN Table2 ON Table1.ID = Table2.ID", sqlConnection, sqlTransaction);
sqlAdapter = new SqlDataAdapter(sqlCommand);
DataTable dataTableSqlJoined = new DataTable();
sqlAdapter.Fill(dataTableSqlJoined);

var dataRows =
    from
        dataRows1 in dataTable1.AsEnumerable()
    join
        dataRows2 in dataTable2.AsEnumerable()
    on
        dataRows1.Field<int>("ID") equals dataRows2.Field<int>("ID")
    select
        dataRows1; // + dataRows2;

DataTable dataTableLinqJoined = dataRows.CopyToDataTable();

For a bit more background, the combined query is very DB intensive and is causing performance issues. The data returned by the first query is fairly static and can be heavily cached. The data returned by the second query changes constantly but is fast to run and therefore doesn't need to be cached. There is also a lot of code reliant upon the passing of the combined DataTable and therefore there are not many feasible options available in passing the data in a different format.

Robin Day
  • 100,552
  • 23
  • 116
  • 167

4 Answers4

20

Have you looked at this page yet?

HOW TO: Implement a DataSet JOIN helper class in Visual C# .NET

If that approach isn't LINQy enough for you, you could break out the row data into object arrays:

DataTable targetTable = dataTable1.Clone();
var dt2Columns = dataTable2.Columns.OfType<DataColumn>().Select(dc => 
    new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));
targetTable.Columns.AddRange(dt2Columns.ToArray());
var rowData =
    from row1 in dataTable1.AsEnumerable()
    join row2 in dataTable2.AsEnumerable()
        on row1.Field<int>("ID") equals row2.Field<int>("ID")
    select row1.ItemArray.Concat(row2.ItemArray).ToArray();
foreach (object[] values in rowData)
    targetTable.Rows.Add(values);

I think that's about as terse as you're going to be able to make it and I'll explain why: it's the schema.

A DataRow is not an independent object; it depends on its owning DataTable and cannot live without it. There is no supported way to create a "disconnected" DataRow; the CopyToDataTable() extension method works on rows that already exist in one DataTable and simply copy the schema from the source (remember, every DataRow has a reference to its parent Table) before copying the rows themselves (most likely using ImportRow, though I haven't actually opened up Reflector to check).

In this case you have a new schema that you need to create. Before you can create any (new) rows, you need to create the table to hold them first, and that means writing at least the 3 lines of code at the top of the method above.

Then you can finally create the rows - but only one at a time, since the DataTable and its associated DataRowCollection don't expose any methods to add multiple rows at a time. You could, of course, add your own extension method for the DataRowCollection to make this "look" nicer:

public static void AddRange(this DataRowCollection rc,
    IEnumerable<object[]> tuples)
{
    foreach (object[] data in tuples)
        rc.Add(tuples);
}

Then you could get rid of the foreach in the first method and replace it with:

targetTable.Rows.AddRange(rowData);

Although that's really just moving the verbosity, not eliminating it.

Bottom line, as long as you're working with the legacy DataSet class hierarchy, there's always going to be a little cruft. The Linq to DataSet extensions are nice, but they are only extensions and can't alter the limitations above.

Robin Day
  • 100,552
  • 23
  • 116
  • 167
Aaronaught
  • 120,909
  • 25
  • 266
  • 342
  • I had looked at that page but didn't like the fact it wasn't LINQy. Your example on building the targetTable is great. Thanks. – Robin Day Mar 04 '10 at 15:21
  • Made a minor edit to your code, the itemarray needs to be a Concat rather than a Union. Otherwise it strips out any columns with identical values (nulls). Other than that though. It's working perfectly!! Thankyou! – Robin Day Mar 04 '10 at 16:17
  • targetTable.Columns.AddRange(dt2Columns.ToArray()); gives error as its adding column with same name again. – Cannon Mar 30 '12 at 14:25
6

Aaronaught that was great. But would like add a few enhancements to your LINQy code. While adding columns from dataTable2 to Target table, there would be chance that few column would be already existing in Target table(on which we are joining). So here we go.

DataTable targetTable = dataTable1.Clone();
var dt2Columns = dataTable2.Columns.OfType<DataColumn>().Select(dc => 
new DataColumn(dc.ColumnName, dc.DataType, dc.Expression, dc.ColumnMapping));
var dt2FinalColumns=from dc in dt2Columns.AsEnumerable()
                    where targetTable.Columns.Contains(dc.ColumnName) == false
                    select dc;
targetTable.Columns.AddRange(dt2FinalColumns.ToArray());
var rowData =from row1 in dataTable1.AsEnumerable()
             join row2 in dataTable2.AsEnumerable()
             on row1.Field<int>("ID") equals row2.Field<int>("ID")
             select row1.ItemArray.Concat(row2.ItemArray.Where(r2=> row1.ItemArray.Contains(r2)==false)).ToArray();
foreach (object[] values in rowData)
targetTable.Rows.Add(values);

Hope this would be helpful for the guys like me.

suryakiran
  • 1,976
  • 25
  • 41
  • Change dtcomments to dt2Columns. One question : How do I join tables if I have more than tow tables to join ? – Cannon Mar 30 '12 at 15:12
  • `var rowData =from row1 in dataTable1.AsEnumerable() join row2 in dataTable2.AsEnumerable() on row1.Field("ID") equals row2.Field("ID") join row3 in .AsEnumerable() on ` – suryakiran May 21 '12 at 12:56
  • **NOTE**: the column names are truncated to **64** chars. If your col names are identical until _after_ the **64th** char you won't have that column in your `targetTable`. – Scotty.NET Nov 27 '12 at 17:37
  • Problem with ItemArray.Contains is that it just removes actual items and doesn't deal with the duplicate column issue. In this example, it would remove ID, where the ID was the same, but it would also remove values of other columns that happen to match values in the other table. – Peter Henry Nov 24 '21 at 14:17
1

Pardon me if I sound like an idiot.

I think, you should have the final table ready (with all the fields of table A & table B).
And, instead of using LINQ, do a join & then do a ForEach on the result & insert the value into final datatable.

Pseudocode:

dt1.Join(dt2).Where(...).ForEach(row => code to read the content of anonymous object & add it to finalTable.Rows)

shahkalpesh
  • 33,172
  • 3
  • 63
  • 88
  • I did consider something like this. e.g. running the combined query but instead of performing the joins just putting nulls into the columns from the second query. Then just running the second query to populate them. +1 for the answer, but I'm still hoping for something more "elegant". – Robin Day Mar 04 '10 at 14:31
0
select new {
    ID = dataRows1.ID,  // no need to select dataRows2.ID, because of JOIN.
    A = dataRows1.A,
    B = dataRows1.B,
    C = dataRows2.C,
    D = dataRows2.D 
};
Roger Lipscombe
  • 89,048
  • 55
  • 235
  • 380
  • I do not know the column names / schema until runtime. There is also no "simple" way to convert this type into a DataTable. – Robin Day Mar 04 '10 at 14:13
  • Didn't spot that in the question. Sorry. For converting unknown types into a DataTable, you can use reflection (or do crazy-mad stuff with Expression: http://www.infoq.com/articles/expression-compiler) – Roger Lipscombe Mar 04 '10 at 14:43