I am writing a C# application that connects separate database systems. These systems could be flat-file databases, Oracle, SQL, Excel files, etc. The job of the C# application is to make all of these sources available in one spot. So basically, the application accepts a list of queries and connection settings for the respective database systems and collects a bunch of results.

The goal is to output a single DataTable with the results of all these queries joined or unioned together (depending on settings). Does C# provide an easy way to perform join/union operations on a list of DataTables?

For example:

Table1:
__________________________________________________________
|tb1_pk_id|   tb1_name    |   tb1_data1   |   tb1_data2   |
|---------|---------------|---------------|---------------|
|    1    | tb1name_blah1 | tb1dat1_blah1 | tb1dat2blah1  |
|    2    | tb1name_blah2 | tb1dat1_blah2 | tb1dat2blah2  |
|    3    | tb1name_blah3 | tb1dat1_blah3 | tb1dat2blah3  |
----------------------------------------------------------- 

Table2:
__________________________________________________________
|tb2_pk_id|   tb2_name    |   tb2_data1   |   tb2_data2   |
|---------|---------------|---------------|---------------|
|    1    | tb2name_blah1 | tb2dat1_blah1 | tb2dat2blah1  |
|    2    | tb2name_blah2 | tb2dat1_blah2 | tb2dat2blah2  |
|    3    | tb2name_blah3 | tb2dat1_blah3 | tb2dat2blah3  |
----------------------------------------------------------- 

Join Results:
___________________________________________________________________________________________________________
|tb1_pk_id|   tb1_name    |   tb1_data1   |   tb1_data2   |   tb2_name    |   tb2_data1   |   tb2_data2   |
|---------|---------------|---------------|---------------|---------------|---------------|---------------|
|    1    | tb1name_blah1 | tb1dat1_blah1 | tb1dat2blah1  | tb2name_blah1 | tb2dat1_blah1 | tb2dat2blah1  |
|    2    | tb1name_blah2 | tb1dat1_blah2 | tb1dat2blah2  | tb2name_blah2 | tb2dat1_blah2 | tb2dat2blah2  |
|    3    | tb1name_blah3 | tb1dat1_blah3 | tb1dat2blah3  | tb2name_blah3 | tb2dat1_blah3 | tb2dat2blah3  |
-----------------------------------------------------------------------------------------------------------   

So far I have found the following code online to do a merge on all the data:

// Requires: using System; using System.Collections.Generic; using System.Data; using System.Linq;
// (AsEnumerable lives in System.Data.DataSetExtensions on .NET Framework)
private DataTable MergeAll(IList<DataTable> tables, String primaryKeyColumn)
{
    if (!tables.Any())
        throw new ArgumentException("Tables must not be empty", "tables");
    if (primaryKeyColumn != null)
        foreach (DataTable t in tables)
            if (!t.Columns.Contains(primaryKeyColumn))
                throw new ArgumentException("All tables must have the specified primarykey column " + primaryKeyColumn, "primaryKeyColumn");

    if (tables.Count == 1)
        return tables[0];

    DataTable table = new DataTable("TblUnion");
    table.BeginLoadData(); // Turns off notifications, index maintenance, and constraints while loading data
    foreach (DataTable t in tables)
    {
        table.Merge(t); // same as table.Merge(t, false, MissingSchemaAction.Add);
    }
    table.EndLoadData();

    if (primaryKeyColumn != null)
    {
        // since we might have no real primary keys defined, the rows now might have repeating fields
        // so now we're going to "join" these rows ...
        var pkGroups = table.AsEnumerable()
            .GroupBy(r => r[primaryKeyColumn]);
        var dupGroups = pkGroups.Where(g => g.Count() > 1);
        foreach (var grpDup in dupGroups)
        {
            // use first row and fill its null columns from the other rows with the same key
            DataRow firstRow = grpDup.First();
            foreach (DataColumn c in table.Columns)
            {
                if (firstRow.IsNull(c))
                {
                    DataRow firstNotNullRow = grpDup.Skip(1).FirstOrDefault(r => !r.IsNull(c));
                    if (firstNotNullRow != null)
                        firstRow[c] = firstNotNullRow[c];
                }
            }
            // remove all but first row
            var rowsToRemove = grpDup.Skip(1);
            foreach (DataRow rowToRemove in rowsToRemove)
                table.Rows.Remove(rowToRemove);
        }
    }

    return table;
}

This works fine for doing a union, but I don't know whether .NET already provides an easier way to do ANY kind of join or union on a group of separate DataTables (not just the union in the code above). Or do I have to custom-code each type of join/union?

Hooplator15

1 Answer

No, there is no simple built-in .NET way of doing this.

LINQ can come close: you can join tables in LINQ, but the join clause performs an inner join. A left join is a bit more complicated and requires a group join (join ... into in query syntax, or the GroupJoin method) combined with DefaultIfEmpty. https://msdn.microsoft.com/en-us/library/bb386969(v=vs.110).aspx
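To make the LINQ route concrete, here is a minimal sketch of an inner join and a left join over two DataTables using LINQ to DataSet. The class name DataTableJoins and its helper methods are hypothetical names made up for this illustration. It assumes the two tables have no column names in common apart from the key (as in the question's example) and that the key columns hold values of the same .NET type; keys of different types (say int vs. decimal) will not compare equal, and rows whose keys are both DBNull will match each other.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper class, not part of .NET.
public static class DataTableJoins
{
    // Result schema: every left column, then every right column except the right key.
    // Assumes column names do not collide between the two tables.
    private static DataTable CreateResultSchema(DataTable left, DataTable right, string rightKey)
    {
        var result = new DataTable("JoinResult");
        foreach (DataColumn c in left.Columns)
            result.Columns.Add(c.ColumnName, c.DataType);
        foreach (DataColumn c in right.Columns)
            if (c.ColumnName != rightKey)
                result.Columns.Add(c.ColumnName, c.DataType);
        return result;
    }

    // Copies one joined pair into the result; a null right row leaves its columns as DBNull.
    private static void AddRow(DataTable result, DataRow l, DataRow r, string rightKey)
    {
        DataRow row = result.NewRow();
        foreach (DataColumn c in l.Table.Columns)
            row[c.ColumnName] = l[c];
        if (r != null)
            foreach (DataColumn c in r.Table.Columns)
                if (c.ColumnName != rightKey)
                    row[c.ColumnName] = r[c];
        result.Rows.Add(row);
    }

    public static DataTable InnerJoin(DataTable left, DataTable right, string leftKey, string rightKey)
    {
        DataTable result = CreateResultSchema(left, right, rightKey);
        // The join clause is an inner equijoin: unmatched rows on either side are dropped.
        var pairs = from l in left.AsEnumerable()
                    join r in right.AsEnumerable() on l[leftKey] equals r[rightKey]
                    select new { Left = l, Right = r };
        foreach (var p in pairs)
            AddRow(result, p.Left, p.Right, rightKey);
        return result;
    }

    public static DataTable LeftJoin(DataTable left, DataTable right, string leftKey, string rightKey)
    {
        DataTable result = CreateResultSchema(left, right, rightKey);
        // Group join plus DefaultIfEmpty keeps left rows that have no match (Right is null).
        var pairs = from l in left.AsEnumerable()
                    join r in right.AsEnumerable() on l[leftKey] equals r[rightKey] into matches
                    from m in matches.DefaultIfEmpty()
                    select new { Left = l, Right = m };
        foreach (var p in pairs)
            AddRow(result, p.Left, p.Right, rightKey);
        return result;
    }
}
```

With the tables from the question, DataTableJoins.InnerJoin(table1, table2, "tb1_pk_id", "tb2_pk_id") would produce the "Join Results" layout. To fold a whole list of tables into one, you could apply the same method repeatedly, joining the running result with the next table.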

If you'd like to do it yourself with ADO.NET DataRelations, you might take a look at this old VB.NET article:

http://www.emmet-gray.com/Articles/DataRelations.html
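For a rough idea of what the DataRelation approach looks like in C#, here is a minimal sketch. It is not taken from the linked article; the class and method names (RelationJoin, JoinViaRelation) are hypothetical. It works on copies so the caller's tables are left alone (a DataTable can belong to only one DataSet), and it assumes the two key columns have the same data type (a DataRelation requires that) and that non-key column names do not collide.

```csharp
using System;
using System.Data;

// Hypothetical helper class, not part of .NET.
public static class RelationJoin
{
    // Left-join style result: every parent row appears at least once.
    public static DataTable JoinViaRelation(DataTable parent, DataTable child,
                                            string parentKey, string childKey)
    {
        // Work on copies; give them distinct names so they can share a DataSet.
        DataTable p = parent.Copy();
        DataTable c = child.Copy();
        p.TableName = "Parent";
        c.TableName = "Child";

        var ds = new DataSet();
        ds.Tables.Add(p);
        ds.Tables.Add(c);

        // createConstraints: false, so no unique or foreign-key constraint is enforced.
        DataRelation rel = ds.Relations.Add("ParentChild",
            p.Columns[parentKey], c.Columns[childKey], false);

        // Result schema: all parent columns, then child columns except the child key.
        var result = new DataTable("JoinResult");
        foreach (DataColumn col in p.Columns)
            result.Columns.Add(col.ColumnName, col.DataType);
        foreach (DataColumn col in c.Columns)
            if (col.ColumnName != childKey)
                result.Columns.Add(col.ColumnName, col.DataType);

        foreach (DataRow parentRow in p.Rows)
        {
            DataRow[] kids = parentRow.GetChildRows(rel);
            if (kids.Length == 0)
            {
                // No match: keep the parent row, child columns stay DBNull.
                AddRow(result, parentRow, null, childKey);
                continue;
            }
            foreach (DataRow kid in kids)
                AddRow(result, parentRow, kid, childKey);
        }
        return result;
    }

    private static void AddRow(DataTable result, DataRow parentRow, DataRow childRow, string childKey)
    {
        DataRow row = result.NewRow();
        foreach (DataColumn col in parentRow.Table.Columns)
            row[col.ColumnName] = parentRow[col];
        if (childRow != null)
            foreach (DataColumn col in childRow.Table.Columns)
                if (col.ColumnName != childKey)
                    row[col.ColumnName] = childRow[col];
        result.Rows.Add(row);
    }
}
```

Skipping the parent rows that have no child rows, instead of adding them, would turn this into an inner join.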

egray
  • I had a feeling this was the case, I just wanted to make sure I wasn't missing something. I'll look into using the LINQ methodology. Thank you! – Hooplator15 Jul 27 '16 at 20:39