I have duplicate rows in data coming from excel sheet. In the SSIS package, I am using Sort transformation where sorting is done in ascending order by the primary key column ID. But before removing the duplicates I want to see if the email column has email with my company's domain. If so, I want other rows removed than the one having this type of email addresses. What should I do? Please refer to the image attached below.
In the data above, I want to remove two rows of John where email address are john@gmail.com. In Maria's case, I want to remove two rows having email addresses maria@gmail.com, hence preserving rows having email addresses of the domain mycompany.com. If there are multiple rows for a user having email addresses of the domain mycompany.com, I want to keep any one row with the domain email address.
Suggest please.