I have incoming data that is in DataTable form. There are no static classes to fall back on. I have a 2 tables, customer and billing. There are 7000 customers, 1200 billing records.
All customer records have a "ResponsiblePartyID", multiple customers can have the same ID, which references the ID of the billing table.
DataTable customer= ETL.ParseTable("customer"); // 7000 records
DataTable billing= ETL.ParseTable("billing"); // 1200 records
var JoinedTables = (from c in customer.AsEnumerable()
join p in billing.AsEnumerable() on (string) c["ResponsiblePartyID"] equals (string) p["ID"] into ps
from p in ps.DefaultIfEmpty()
select new {c, p}
);
So this doesn't work as it should, even if it spit out the results in the wrong format I would be happy, but it only returns 2200 results rather than the 7000.
It seems like it would make sense if it returned only 1200, or if it returned all 7000, but 2200 is a weird place for it to stop.
I am manually parsing binary data as my datasource, I chose a DataTable as the destination because it seemed like the right way, but after dealing with Linq and trying to do joins I am wondering if I should rethink things.
It seems that Linq wasn't designed to query DataTables since I have to do the .AsEnumerable()
on everything, and then .CopyToDataTable()
as I get done with each step.
I don't have static classes defined for all my data because the properties of each value was defined already in the DataTable, so what is the "right" way of taking 2 DataTables, doing a LEFT JOIN (like in SQL) where the results on the left aren't excluded by the results on the right? If I start off with a table on the left with 7000 rows, I want to end up with 7000. If there are no matching records, fill it in with null.
I would like to not have to define every column, it should return a flattened Array / DataTable - something like this:
var JoinedTables = (from c in customer.AsEnumerable()
join p in billing.AsEnumerable() on (string) c["ResponsiblePartyID"] equals (string) p["ID"] into ps
from p in ps.DefaultIfEmpty()
select ALL_COLUMNS
);
UPDATE:
I used the sample from Jon Skeet's answer that was linked in the comments (Linq return all columns from all tables in the join) His solution really wasn't any different than my first attempt, it still doesn't address how to flatten the results into a single DataTable. Here is a sample of the data and the current output:
Customers
ID Resp_ID Name
1 1 Fatafehi
2 2 Dan
3 1 Anthony
4 1 Sekona
5 1 Osotonu
6 6 Robert
7 1 Lafo
8 1 Sarai
9 9 Esteban
10 10 Ashley
11 11 Mitch
12 64 Mark
13 11 Shawn
14 53 Kathy
15 53 Jasmine
16 16 Aubrey
17 17 Peter
18 18 Eve
19 19 Brenna
20 20 Shanna
21 21 Andrea
Billing
ID 30_Day 60_Day
2 null null
6 null null
9 null null
10 null null
11 null null
64 null null
53 null null
16 null null
17 null null
18 null null
19 null null
20 -36.52 null
21 1843.30 null
Output:
2 2 Dan 2 null null
6 6 Robert 6 null null
9 9 Esteban 9 null null
10 10 Ashley 10 null null
11 11 Mitch 11 null null
12 64 Mark 64 -131.20 null
13 11 Shawn 11 null null
14 53 Kathy 53 null null
15 53 Jasmine 53 null null
16 16 Aubrey 16 null null
17 17 Peter 17 null null
18 18 Eve 18 null null
19 19 Brenna 19 null null
20 20 Shanna 20 -36.52 null
21 21 Andrea 21 1843.30 null
Notice that anyone with Resp_ID of 1 is missing from the results. To show output I used the following and then inserted the null
values for visualization:
foreach (var row in joinedRows)
{
Console.WriteLine(row.r1["ID"] + " " + row.r1["Resp_ID"] + " " + row.r1["Name"] + " " + row.r2["ID"] + " " + row.r2["30_Day"] + " " + row.r2["60_Day"]);
}