I have a stored procedure that ETLs data from all over the place and, at the end, attempts to join it all together into one single large table. We're talking about 100 columns wide and 35 million rows.
When I pull from the temp tables and join it all together, the insert query can take hours, page to disk, etc. It's simply too large for my environment.
INSERT INTO tbl_huge
SELECT DISTINCT a, b, c, d, e, f, g, h, i, j, k
FROM (Mess of Subqueries & #tmp_tbls)
What's the best way to batch this insert so it commits every 100k rows or so? There's no good natural key in the data that breaks it up even semi-evenly, and I'm worried that will stay true going forward anyway.
I've seen various examples that use WHERE NOT EXISTS against the destination table (roughly the pattern sketched below), but that seems like the wrong approach, one that won't continue to scale as the table grows.
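For reference, this is the shape of what I keep seeing; #tmp_combined is just my placeholder for the mess of subqueries and temp tables, and the column comparison is trimmed:

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- Each pass grabs the next 100k source rows that aren't already in the target.
    INSERT INTO tbl_huge (a, b, c, d, e, f, g, h, i, j, k)
    SELECT DISTINCT TOP (100000)
           s.a, s.b, s.c, s.d, s.e, s.f, s.g, s.h, s.i, s.j, s.k
    FROM #tmp_combined AS s                  -- placeholder for the real source
    WHERE NOT EXISTS (
        SELECT 1
        FROM tbl_huge AS t
        WHERE t.a = s.a AND t.b = s.b        -- ...and so on across all ~100 columns
    );

    SET @rows = @@ROWCOUNT;
END;

My worry is that the NOT EXISTS check has to compare every pass against a target that is growing toward 35 million rows.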
So what's the best approach here? Sort the result and re-run the SELECT/INSERT multiple times in a WHILE loop, keeping a counter so each pass only takes rows > x?
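Something like this is what I have in mind (ROW_NUMBER over an arbitrary sort, since there's no real key, and again #tmp_combined stands in for the real source); I'm not even sure the numbering stays stable when the select is re-run every pass:

DECLARE @batch INT = 100000;
DECLARE @start BIGINT = 0;
DECLARE @rows  INT = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO tbl_huge (a, b, c, d, e, f, g, h, i, j, k)
    SELECT a, b, c, d, e, f, g, h, i, j, k
    FROM (
        SELECT x.*,
               ROW_NUMBER() OVER (ORDER BY a, b, c) AS rn   -- arbitrary sort; no natural key
        FROM (
            SELECT DISTINCT a, b, c, d, e, f, g, h, i, j, k
            FROM #tmp_combined                              -- placeholder for the real source
        ) AS x
    ) AS numbered
    WHERE rn > @start AND rn <= @start + @batch;

    SET @rows  = @@ROWCOUNT;
    SET @start = @start + @batch;
END;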
Or is there a better approach, either one that lets me accurately select a subset to insert, or one that pre-processes the select once (into memory plus the page file, given the size) and then reads it back to insert in chunks, like the staging-table sketch below?
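Roughly: run the expensive select once into a temp table keyed by an IDENTITY column, then copy it across in batches keyed off that value. The table name, column types, and 100k batch size here are all placeholders/guesses on my part:

CREATE TABLE #stage
(
    row_id BIGINT IDENTITY(1,1) PRIMARY KEY,
    a INT, b INT, c INT, d INT, e INT, f INT,
    g INT, h INT, i INT, j INT, k INT        -- placeholder types; the real table is ~100 columns
);

-- Pay the cost of the big select once.
INSERT INTO #stage (a, b, c, d, e, f, g, h, i, j, k)
SELECT DISTINCT a, b, c, d, e, f, g, h, i, j, k
FROM #tmp_combined;                          -- placeholder for the real source

DECLARE @batch INT = 100000;
DECLARE @from  BIGINT = 0;
DECLARE @max   BIGINT = (SELECT MAX(row_id) FROM #stage);

WHILE @from < @max
BEGIN
    -- Copy the next 100k rows by their artificial key.
    INSERT INTO tbl_huge (a, b, c, d, e, f, g, h, i, j, k)
    SELECT a, b, c, d, e, f, g, h, i, j, k
    FROM #stage
    WHERE row_id > @from AND row_id <= @from + @batch;

    SET @from = @from + @batch;
END;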