0

I am looking for a solution to join two large datatables in Powershell. Each final table table will have ca. 1 mio rows. I spent some time to create a workaround via LINQ and an embedded C# code, but I am not sure if this is the most efficient way.

Here a demo code showing my custom datatable-join in action:

cls
Remove-Variable * -ea 0
$ErrorActionPreference = 'stop'

add-type -TypeDefinition @"
    using System;
    using System.Data;
    using System.Data.DataSetExtensions;
    using System.Linq;
    namespace Linq {
        public class DataTables {
            public static DataTable Join(DataTable dt1, string column1, DataTable dt2, string column2){
                DataTable dt3 = new DataTable();
                Type ty = dt1.Columns[column1].GetType();
                DataTable[] dta = {dt1, dt2};
                for (int i=0; i<dta.Length; i++) {
                    string tableName = dta[i].TableName;
                    if (string.IsNullOrEmpty(tableName)) {tableName = "table" + i.ToString();};
                    foreach (DataColumn column in dta[i].Columns) {
                        string columnName = tableName + '/' + column.ColumnName;
                        dt3.Columns.Add(columnName, column.DataType);
                    }
                }
                return (
                    dt1.AsEnumerable().Join(
                        dt2.AsEnumerable(), 
                        a => a[column1], 
                        b => b[column2],
                        (a, b) => {
                            DataRow row = dt3.NewRow();
                            row.ItemArray = a.ItemArray.Concat(b.ItemArray).ToArray();
                            return row;
                        }
                    )
                ).CopyToDataTable();
            }
        }
    }
"@ -ReferencedAssemblies 'System.Xml','System.Data','System.Data.DataSetExtensions'

#define customer table:
$table_customer = [System.Data.DataTable]::new('customer')
[void]$table_customer.Columns.Add('id', 'int')
[void]$table_customer.Columns.Add('name', 'string')

# define order table:
$table_order = [System.Data.DataTable]::new('order')
[void]$table_order.Columns.Add('id', 'int')
[void]$table_order.Columns.Add('customer_id', 'int')
[void]$table_order.Columns.Add('name', 'string')

# fill both tables:
$oId = 0
foreach($cId in (1..3)) {
    [void]$table_customer.rows.Add($cId, "customer_$cId")
    foreach($o in 1..3) {
        $oId++
        [void]$table_order.rows.Add($oId, $cId, "customer_$cId order_$o")
    } 
}

# join the tables:
$table_joined = [Linq.DataTables]::Join($table_customer, 'id', $table_order, 'customer_id')

$table_customer | ft -AutoSize
$table_order    | ft -AutoSize
$table_joined   | ft -AutoSize

Is there any build-in function for this that I have missed? I was playing with System.Data.DataRelation, but this seems to be more like a filter for a second table based on a single criteria from the first table. In case there is no better alternative, I am happy to share the above code with everyone.

Carsten
  • 1,612
  • 14
  • 21
  • What's the reason that you're doing this with a Datatable? Is this really going to be the source of your data? If you do have a database in the background why not let it do the join? If you are using Linq shouldn't you be able to just use a regular join? [This question](https://stackoverflow.com/questions/665754/inner-join-of-datatables-in-c-sharp) is pretty similar. – Seth Aug 16 '21 at 07:50
  • I have both datatables coming from an XML-response to a REST-API. As usual in large environments the data consumer is not the database admin here. – Carsten Aug 16 '21 at 07:56
  • @Seth The link you posted is not related to Powershell. In addition it is neither a function nor does it work for any given two tables - this is more a single sample how the LINQ-join may be used in your own code. – Carsten Aug 16 '21 at 08:06
  • You might have a look at [In Powershell, what's the best way to join two tables into one?](https://stackoverflow.com/a/45483110/1701026) and the PowerShell request: [`#14994` Add a Join-Object cmdlet to the standard PowerShell equipment](https://github.com/PowerShell/PowerShell/issues/14994) – iRon Aug 16 '21 at 09:09
  • @Carsten your whole example is not based on PowerShell but rather using .NET in PowerShell. As such .NET examples are applicable and the same stuff that applies to .NET applies here. – Seth Aug 16 '21 at 09:37

0 Answers0