I have data retrieved from one SQL table where the value that will become the primary key in the destination table can come from one of two fields.
This means I can get duplicates, so I obviously need to remove them before ingesting into the destination table.
I also need to mark the duplicate records that I have not ingested as processed so they don't get picked up again, so I collect the "ID" value (which is unique) from the source table for each duplicate row in an array. The code below currently works; I'm just worried it might be slow on large datasets (I'm expecting around 50-60K rows at most).
If ds.Tables(0).Rows.Count > 0 Then
    Dim DupeID_List As String() = Nothing
    Dim DupeID_Count As Integer = 0
    Dim hTable As New Hashtable()
    Dim duplicateList As New ArrayList()

    ' First pass: remember the first occurrence of each EIBItemID,
    ' collect any repeats along with their source-table IDs.
    For Each drow__1 As DataRow In ds.Tables(0).Rows
        If hTable.Contains(drow__1("EIBItemID")) Then
            ReDim Preserve DupeID_List(DupeID_Count)
            duplicateList.Add(drow__1)
            DupeID_List(DupeID_Count) = CStr(drow__1("ID"))
            DupeID_Count += 1
        Else
            hTable.Add(drow__1("EIBItemID"), String.Empty)
        End If
    Next

    ' Second pass: remove the duplicate rows from the DataSet.
    For Each dRow__2 As DataRow In duplicateList
        ds.Tables(0).Rows.Remove(dRow__2)
    Next

    ' Flag the removed records as processed in the source table.
    If DupeID_List IsNot Nothing Then
        MarkDupeRecordsExported(DupeID_List)
    End If

    Return ds
Else
    Return Nothing
End If
Is there a better way of achieving the same goal: check for duplicate "EIBItemID" values and remove them from the DataSet that will be bulk-copied to the destination table, while still updating the source table for each record (ID) that is removed?
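For context, this is roughly the direction I was considering (untested): the same two-pass logic, but with the generic HashSet/List collections instead of Hashtable/ArrayList, which avoids the repeated ReDim Preserve reallocation on every duplicate:

```vbnet
' Untested sketch of an alternative, assuming the same ds / MarkDupeRecordsExported as above.
Dim seen As New HashSet(Of Object)()
Dim duplicateRows As New List(Of DataRow)()
Dim dupeIds As New List(Of String)()

For Each row As DataRow In ds.Tables(0).Rows
    ' HashSet.Add returns False when the value is already present,
    ' so the membership test and the insert are a single operation.
    If Not seen.Add(row("EIBItemID")) Then
        duplicateRows.Add(row)
        dupeIds.Add(CStr(row("ID")))
    End If
Next

' Remove duplicates outside the enumeration loop, as before.
For Each row As DataRow In duplicateRows
    ds.Tables(0).Rows.Remove(row)
Next

If dupeIds.Count > 0 Then
    MarkDupeRecordsExported(dupeIds.ToArray())
End If
```

Both versions are a single O(n) pass over the rows, so either should handle 50-60K rows comfortably; the generic version just drops the boxing and array-resizing overhead.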