I have a DataTable I've loaded from a CSV file. I need to determine which rows are duplicates based on two columns (product_id and owner_org_id). Once I've determined that, I can use that information to build my result: a DataTable containing only the rows that are not unique, and a DataTable containing only the rows that are unique.
I've looked at other examples on here, and the code I've come up with so far compiles and executes, but it seems to think every row in the data is unique. In reality the test data has 13 rows and only 6 of them are unique, so clearly I'm doing something wrong.
EDIT: Thought I should note: rows that have duplicates should ALL be removed, not just the extra copies. E.g. if there are 4 duplicate rows, all 4 should be removed, not 3 with one row of the 4 left behind as "unique".
EDIT2: Alternatively, it's fine with me if I can select all duplicate rows instead of trying to select unique rows. Either way gets me to my end result.
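To make the desired end result concrete, here's a sketch of the split I'm picturing, written with LINQ GroupBy rather than my actual approach (untested; dtCSV and the column names match the code further down):

// Sketch only (not my current code): group on the two key columns, then
// rows in groups of size 1 are the unique ones and every row in a larger
// group is a duplicate -- all of them, per the first EDIT above.
// Needs System.Linq and System.Data.DataSetExtensions.
var groups = dtCSV.AsEnumerable()
    .GroupBy(r => new
    {
        ProductId = r.Field<string>("PRODUCT_ID"),
        OwnerOrgId = r.Field<string>("OWNER_ORG_ID")
    })
    .ToList();

var uniqueRows = groups.Where(g => g.Count() == 1).SelectMany(g => g);
var duplicateRows = groups.Where(g => g.Count() > 1).SelectMany(g => g);

(Anonymous-type keys get value-based equality for free, which is why no custom comparer appears in this form. I realize this may be a different route entirely; I'm including it only to pin down the spec.)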
My actual code so far, in the processing method:
MyRowComparer myrc = new MyRowComparer();
var uniquerows = dtCSV.AsEnumerable().Distinct(myrc);
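For completeness, here's roughly how the result gets turned back into a DataTable afterwards (a sketch, not my code verbatim; CopyToDataTable throws on an empty sequence, hence the guard):

// Hypothetical consumption of the query above.
// CopyToDataTable needs System.Data.DataSetExtensions.
DataTable dtUnique = uniquerows.Any()
    ? uniquerows.CopyToDataTable()
    : dtCSV.Clone(); // empty table with the same schema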
The Distinct call relies on the following comparer class:
public class MyRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        //return ((string.Compare(x.Field<string>("PRODUCT_ID"), y.Field<string>("PRODUCT_ID"), true)) ==
        //        (string.Compare(x.Field<string>("OWNER_ORG_ID"), y.Field<string>("OWNER_ORG_ID"), true)));

        // Intent: two rows match when their values line up on each key column.
        return
            x.ItemArray.Except(new object[] { x[x.Table.Columns["PRODUCT_ID"].ColumnName] }) ==
            y.ItemArray.Except(new object[] { y[y.Table.Columns["PRODUCT_ID"].ColumnName] }) &&
            x.ItemArray.Except(new object[] { x[x.Table.Columns["OWNER_ORG_ID"].ColumnName] }) ==
            y.ItemArray.Except(new object[] { y[y.Table.Columns["OWNER_ORG_ID"].ColumnName] });
    }

    public int GetHashCode(DataRow obj)
    {
        // Both key columns come out of the CSV as strings, so parse and XOR.
        int y = int.Parse(obj.Field<string>("PRODUCT_ID"));
        int z = int.Parse(obj.Field<string>("OWNER_ORG_ID"));
        return y ^ z;
    }
}
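In case it helps anyone reproduce this, here's a minimal harness (a sketch; it assumes both key columns are strings holding integers, as in my CSV) that exercises the comparer on two rows with identical keys:

using System;
using System.Data;

class ComparerHarness
{
    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("PRODUCT_ID", typeof(string));
        dt.Columns.Add("OWNER_ORG_ID", typeof(string));
        dt.Rows.Add("100", "7");
        dt.Rows.Add("100", "7"); // deliberate duplicate key

        var myrc = new MyRowComparer();
        DataRow a = dt.Rows[0], b = dt.Rows[1];

        // If the comparer treats these as duplicates, both lines print True.
        Console.WriteLine(myrc.Equals(a, b));
        Console.WriteLine(myrc.GetHashCode(a) == myrc.GetHashCode(b));
    }
}

If Equals prints False even for these identical rows, that at least narrows the problem down to Equals rather than GetHashCode.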