20

All, there are some question on this, but I can't seem to extract enough information to solve the problem for my case. I extract an unknown number of tables into SQL Server 'Tab1', 'Tab2', 'Tab3', ... , 'TabN'. The columns in these tables are different, but the row definitions are the same. I need to pull all the data in from the Server into N DataTables and then combine these to form a single DataTable. What I do currently is

int nTmpVolTabIdx = 1;
strSqlTmp = String.Empty;
using (DataTable dataTableALL = new DataTable())
{
    while (true)
    {
        string strTmpVolName = String.Format("Tab{0}", nTmpVolTabIdx);
        strSqlTmp = String.Format("SELECT * FROM [{0}];", strTmpVolName);

        // Pull the data from 'VolX' into a local DataTable.
        using (DataTable dataTable = UtilsDB.DTFromDB(conn, strTmpVolName, strSqlTmp, false))
        {
            if (dataTable == null)
                break;
            else
                dataTableALL.Merge(dataTable);
        }
        nTmpVolTabIdx++;
    }
    ...
}

This merges the DataTables but they are miss-aligned (padding blank cells onto the appended data set). I could append the columns of the new DataTable via a loop; but is there an easier/nicer way to do this (perhaps using LINQ)?

Thanks for your time.

Edit. To provide the example data sets.

What I required is

Full DataTable

The individual tables are

Tabs

After the first Merge operation I have the following

After Merge

Thanks again.

MoonKnight
  • 23,214
  • 40
  • 145
  • 277
  • 2
    What do you mean by *columns in these tables are different, but the row definitions are the same* ? – V4Vendetta Sep 05 '12 at 10:10
  • "The columns in these tables are different, but the row definitions are the same". Can you provide an example? – Hamlet Hakobyan Sep 05 '12 at 10:11
  • Maybe you're looking for a [`DataTable.Union`](http://weblogs.sqlteam.com/davidm/archive/2004/01/15/724.aspx) – Tim Schmelter Sep 05 '12 at 10:14
  • @TimSchmelter the Dispose method is avalible on DataTable. After the using statement finishes, the Dispose method on the DataTable is (of course) called. This method is implemented on the base class for DataTable: MarshalValueByComponent. When Dispose is called, native resources from MarshalValueByComponent (a base class of DataTable) are released. Therefore, it is possible that the using statement could alleviate some resource usage problems when used with DataTables. In addition, the using gives scope to this section of code... – MoonKnight Sep 05 '12 at 10:51
  • @Killercam: The `Dispose` is available for inheritance reasons but DataTable(or DataSets) [suppress finalization](http://msdn.microsoft.com/en-us/library/system.gc.suppressfinalize.aspx) in their constructors. So `Dispose` does nothing. But anyway, it doesn't hurt much. – Tim Schmelter Sep 05 '12 at 11:25
  • @TimSchmelter No disrespect intended here Tim, I am learning. If the finalizer is not called how are the resources released by the GC? – MoonKnight Sep 05 '12 at 11:29
  • 1
    The resources are released as every other (managed) object: when the garbage collector decides to do so because he **needs to** reclaim RAM, marking all of the currently-reachable pointers and removing all the pointers that aren't reachable and therefore are out of scope. – Tim Schmelter Sep 05 '12 at 11:44

1 Answers1

42

The table has repeating primary keys after the Merge because no primary-key was defined. So either specify the PK or try this method here which i've written from scratch(so it's not really tested):

public static DataTable MergeAll(this IList<DataTable> tables, String primaryKeyColumn)
{
    if (!tables.Any())
        throw new ArgumentException("Tables must not be empty", "tables");
    if(primaryKeyColumn != null)
        foreach(DataTable t in tables)
            if(!t.Columns.Contains(primaryKeyColumn))
                throw new ArgumentException("All tables must have the specified primarykey column " + primaryKeyColumn, "primaryKeyColumn");

    if(tables.Count == 1)
        return tables[0];

    DataTable table = new DataTable("TblUnion");
    table.BeginLoadData(); // Turns off notifications, index maintenance, and constraints while loading data
    foreach (DataTable t in tables)
    {
        table.Merge(t); // same as table.Merge(t, false, MissingSchemaAction.Add);
    }
    table.EndLoadData();

    if (primaryKeyColumn != null)
    {
        // since we might have no real primary keys defined, the rows now might have repeating fields
        // so now we're going to "join" these rows ...
        var pkGroups = table.AsEnumerable()
            .GroupBy(r => r[primaryKeyColumn]);
        var dupGroups = pkGroups.Where(g => g.Count() > 1);
        foreach (var grpDup in dupGroups)
        { 
            // use first row and modify it
            DataRow firstRow = grpDup.First();
            foreach (DataColumn c in table.Columns)
            {
                if (firstRow.IsNull(c))
                {
                    DataRow firstNotNullRow = grpDup.Skip(1).FirstOrDefault(r => !r.IsNull(c));
                    if (firstNotNullRow != null)
                        firstRow[c] = firstNotNullRow[c];
                }
            }
            // remove all but first row
            var rowsToRemove = grpDup.Skip(1);
            foreach(DataRow rowToRemove in rowsToRemove)
                table.Rows.Remove(rowToRemove);
        }
    }

    return table;
}

You can call it in this way:

var tables = new[] { tblA, tblB, tblC };
DataTable TblUnion = tables.MergeAll("c1");

Used this sample data:

var tblA = new DataTable();
tblA.Columns.Add("c1", typeof(int));
tblA.Columns.Add("c2", typeof(int));
tblA.Columns.Add("c3", typeof(string));
tblA.Columns.Add("c4", typeof(char));

var tblB = new DataTable();
tblB.Columns.Add("c1", typeof(int));
tblB.Columns.Add("c5", typeof(int));
tblB.Columns.Add("c6", typeof(string));
tblB.Columns.Add("c7", typeof(char));

var tblC = new DataTable();
tblC.Columns.Add("c1", typeof(int));
tblC.Columns.Add("c8", typeof(int));
tblC.Columns.Add("c9", typeof(string));
tblC.Columns.Add("c10", typeof(char));

tblA.Rows.Add(1, 8500, "abc", 'A');
tblA.Rows.Add(2, 950, "cde", 'B');
tblA.Rows.Add(3, 150, "efg", 'C');
tblA.Rows.Add(4, 850, "ghi", 'D');
tblA.Rows.Add(5, 50, "ijk", 'E');

tblB.Rows.Add(1, 7500, "klm", 'F');
tblB.Rows.Add(2, 900, "mno", 'G');
tblB.Rows.Add(3, 150, "opq", 'H');
tblB.Rows.Add(4, 850, "qrs", 'I');
tblB.Rows.Add(5, 50, "stu", 'J');

tblC.Rows.Add(1, 7500, "uvw", 'K');
tblC.Rows.Add(2, 900, "wxy", 'L');
tblC.Rows.Add(3, 150, "yza", 'M');
tblC.Rows.Add(4, 850, "ABC", 'N');
tblC.Rows.Add(5, 50, "CDE", 'O');

After DataTable.Merge in MergeAll:

enter image description here

After some modifications to join the rows in MergeAll:

enter image description here


Update

Since this question arose in one of the comments, if the only relation between two tables is the index of a DataRow in the table and you want to merge both tables according to the index:

public static DataTable MergeTablesByIndex(DataTable t1, DataTable t2)
{
    if (t1 == null || t2 == null) throw new ArgumentNullException("t1 or t2", "Both tables must not be null");

    DataTable t3 = t1.Clone();  // first add columns from table1
    foreach (DataColumn col in t2.Columns)
    {
        string newColumnName = col.ColumnName;
        int colNum = 1;
        while (t3.Columns.Contains(newColumnName))
        {
            newColumnName = string.Format("{0}_{1}", col.ColumnName, ++colNum);
        }
        t3.Columns.Add(newColumnName, col.DataType);
    }
    var mergedRows = t1.AsEnumerable().Zip(t2.AsEnumerable(),
        (r1, r2) => r1.ItemArray.Concat(r2.ItemArray).ToArray());
    foreach (object[] rowFields in mergedRows)
        t3.Rows.Add(rowFields);

    return t3;
}

Sample:

var dt1 = new DataTable();
dt1.Columns.Add("ID", typeof(int));
dt1.Columns.Add("Name", typeof(string));
dt1.Rows.Add(1, "Jon");
var dt2 = new DataTable();
dt2.Columns.Add("Country", typeof(string));
dt2.Rows.Add("US");

var dtMerged = MergeTablesByIndex(dt1, dt2);

The result table contains three columns ID,Name,Country and a single row: 1 Jon US

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • 2
    Best answer I have ever recived on this site hands down. In the end I have written the DataTables into a DataSet and itterated that into my visual container. But this is much better - thanks very much for your time... All the best. – MoonKnight Sep 05 '12 at 16:31
  • Of course I will! Just adding it in to the code base. I have tested it and it works great. Thanks again... – MoonKnight Sep 05 '12 at 17:16
  • How can i aggregate the integer based columns by using SUM fn in LINQ? In my case, the column set is same for all different datatables. Please suggest. – Karan May 08 '14 at 08:31
  • @Karan: if you have a question ask it and provide all necessary informations to reproduce the issue with your sample data. You can then post a link to that question here so i can try to help. – Tim Schmelter May 08 '14 at 08:54
  • @Tim. I posted my qstn over here http://stackoverflow.com/questions/23537162/aggregate-datatable-with-dynamic-number-of-columns – Karan May 08 '14 at 08:59
  • @Tim, Can we combine datatables without primary key?? Is it possible? – vinodh Feb 12 '16 at 11:16
  • @vinodh: that's exactly what above method does. If you have primary keys you just have to use `DataTable.Merge` – Tim Schmelter Feb 12 '16 at 11:24
  • @Tim.. For example Dt1 contains { id : 1, name : tim} Dt2 contains { country : US } can we merge like Dt3 { id 1, name tim, country US } – vinodh Feb 12 '16 at 12:11
  • @vinodh: you could ask a real question since my answer doesn't fit into a comment. Let me know if you've posted it. I have already an idea. – Tim Schmelter Feb 12 '16 at 12:37
  • , question is simple how to combine two different schema datatable @tim , i ll add in questions soon . Thanks for your reply – vinodh Feb 12 '16 at 12:54
  • @vinodh: remember to include the rule. I guess you want to merge both tables via index, correct? – Tim Schmelter Feb 12 '16 at 12:57