This question is strictly about DQS performance.
The 'customers' table I need to clean has 40,000,000 rows. Creating a matching policy from a subset worked fine (I just used the top 10,000 rows).
Now, when I try to run a data quality project, I can't process the entire table in one project; DQS simply stops responding. The most I've managed is 400,000 rows at a time, and even then each run takes almost 2 hours. It's not a great workflow either, because I have to point each project at a view filtered like "where id between 1 and 400,000".
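For context, the batching workaround described above looks roughly like this (the table name `dbo.customers` and the `id` column are illustrative; this is a sketch, not my exact DDL):

```sql
-- One view per 400,000-row slice; each DQS cleansing project
-- is then pointed at one of these views as its source.
CREATE VIEW dbo.customers_batch_001
AS
SELECT *
FROM dbo.customers
WHERE id BETWEEN 1 AND 400000;
GO

CREATE VIEW dbo.customers_batch_002
AS
SELECT *
FROM dbo.customers
WHERE id BETWEEN 400001 AND 800000;
GO
-- ...and so on: roughly 100 such views (and 100 separate
-- DQS projects) to cover all 40,000,000 rows.
```

As you can see, this means creating and running on the order of a hundred separate projects by hand, which is why I'm hoping there's a better approach.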
Does anyone have a solution for this?
I'm also wondering where the bottleneck is: is it CPU or disk?
Regards.