I am looking for a solution to join two large datatables in Powershell. Each final table table will have ca. 1 mio rows. I spent some time to create a workaround via LINQ and an embedded C# code, but I am not sure if this is the most efficient way.
Here a demo code showing my custom datatable-join in action:
cls
Remove-Variable * -ea 0
$ErrorActionPreference = 'stop'
add-type -TypeDefinition @"
using System;
using System.Data;
using System.Data.DataSetExtensions;
using System.Linq;
namespace Linq {
public class DataTables {
public static DataTable Join(DataTable dt1, string column1, DataTable dt2, string column2){
DataTable dt3 = new DataTable();
Type ty = dt1.Columns[column1].GetType();
DataTable[] dta = {dt1, dt2};
for (int i=0; i<dta.Length; i++) {
string tableName = dta[i].TableName;
if (string.IsNullOrEmpty(tableName)) {tableName = "table" + i.ToString();};
foreach (DataColumn column in dta[i].Columns) {
string columnName = tableName + '/' + column.ColumnName;
dt3.Columns.Add(columnName, column.DataType);
}
}
return (
dt1.AsEnumerable().Join(
dt2.AsEnumerable(),
a => a[column1],
b => b[column2],
(a, b) => {
DataRow row = dt3.NewRow();
row.ItemArray = a.ItemArray.Concat(b.ItemArray).ToArray();
return row;
}
)
).CopyToDataTable();
}
}
}
"@ -ReferencedAssemblies 'System.Xml','System.Data','System.Data.DataSetExtensions'
#define customer table:
$table_customer = [System.Data.DataTable]::new('customer')
[void]$table_customer.Columns.Add('id', 'int')
[void]$table_customer.Columns.Add('name', 'string')
# define order table:
$table_order = [System.Data.DataTable]::new('order')
[void]$table_order.Columns.Add('id', 'int')
[void]$table_order.Columns.Add('customer_id', 'int')
[void]$table_order.Columns.Add('name', 'string')
# fill both tables:
$oId = 0
foreach($cId in (1..3)) {
[void]$table_customer.rows.Add($cId, "customer_$cId")
foreach($o in 1..3) {
$oId++
[void]$table_order.rows.Add($oId, $cId, "customer_$cId order_$o")
}
}
# join the tables:
$table_joined = [Linq.DataTables]::Join($table_customer, 'id', $table_order, 'customer_id')
$table_customer | ft -AutoSize
$table_order | ft -AutoSize
$table_joined | ft -AutoSize
Is there any build-in function for this that I have missed? I was playing with System.Data.DataRelation, but this seems to be more like a filter for a second table based on a single criteria from the first table. In case there is no better alternative, I am happy to share the above code with everyone.