Delete duplicate rows from a DataTable based on two columns in C#

Question

How do I delete duplicate rows from a DataTable, where a duplicate is a row with the same Name and Dept combination as an earlier row? I need to keep one entry per combination.

DataTable dt = new DataTable();
dt.Columns.Add("id");
dt.Columns.Add("Name");
dt.Columns.Add("Dept");

dt.Rows.Add(1, "Test1", "Sample1");
dt.Rows.Add(2, "Test2", "Sample2");
dt.Rows.Add(3, "Test3", "Sample3");
dt.Rows.Add(4, "Test4", "Sample4");  // Duplicate 
dt.Rows.Add(5, "Test4", "Sample4");  // Duplicate 
dt.Rows.Add(6, "Test4", "Sample4");  // Duplicate 
dt.Rows.Add(7, "Test4", "Sample5");

The resulting DataTable should be:

dt.Rows.Add(1, "Test1", "Sample1");
dt.Rows.Add(2, "Test2", "Sample2");
dt.Rows.Add(3, "Test3", "Sample3");
dt.Rows.Add(4, "Test4", "Sample4");  
dt.Rows.Add(7, "Test4", "Sample5");

How can I do this in C#?

A "silly" algorithm would be to loop over each row, create a copy, store it in a list if not stored yet. Before storing you do your comparison. Your list will have no duplicates. The bigger the table, the slower this algorithm will be. — xmashallax, Apr 07 '20 at 22:37
I think the answer to this question may be what you need: https://stackoverflow.com/questions/3242892/select-distinct-rows-from-datatable-in-linq — Mark Arend, Apr 07 '20 at 23:12
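
For reference, the loop described in the first comment could look roughly like the sketch below. It assumes the column names "Name" and "Dept" from the question; the class and method names are made up for illustration. It builds a new table and scans the rows kept so far before importing each row, which is why it slows down as the table grows.

```csharp
using System;
using System.Data;

static class NaiveDedup
{
    // Returns a new table holding the first row seen for each (Name, Dept) pair.
    public static DataTable KeepFirstPerNameDept(DataTable source)
    {
        DataTable result = source.Clone();            // same columns, no rows
        foreach (DataRow row in source.Rows)
        {
            bool alreadyStored = false;
            foreach (DataRow kept in result.Rows)     // linear scan, so O(n^2) overall
            {
                if (object.Equals(kept["Name"], row["Name"]) &&
                    object.Equals(kept["Dept"], row["Dept"]))
                {
                    alreadyStored = true;
                    break;
                }
            }
            if (!alreadyStored)
            {
                result.ImportRow(row);                // copies the row's values into result
            }
        }
        return result;
    }
}
```

With the seven rows from the question this keeps the rows with ids 1, 2, 3, 4 and 7.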

score 2 · Accepted Answer · answered Apr 07 '20 at 23:21

Simple: order the rows by id, group them by Name and Dept, and keep the first row of each group.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            DataTable dt = new DataTable();
            dt.Columns.Add("id", typeof(int));
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Dept", typeof(string));

            dt.Rows.Add(1, "Test1", "Sample1");
            dt.Rows.Add(2, "Test2", "Sample2");
            dt.Rows.Add(3, "Test3", "Sample3");
            dt.Rows.Add(4, "Test4", "Sample4");  // Duplicate 
            dt.Rows.Add(5, "Test4", "Sample4");  // Duplicate 
            dt.Rows.Add(6, "Test4", "Sample4");  // Duplicate 
            dt.Rows.Add(7, "Test4", "Sample5");

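            // Keep one row per (Name, Dept) pair: the one with the lowest id.
            DataTable dt2 = dt.AsEnumerable()
                .OrderBy(x => x.Field<int>("id"))
                .GroupBy(x => new { name = x.Field<string>("Name"), dept = x.Field<string>("Dept") })
                .Select(x => x.First())   // first row of each group
                .CopyToDataTable();       // new table with the kept rows; throws if there are none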

        }
    }
}

I'm using .NET Core, where I don't have the AsEnumerable() extension — blue, Apr 07 '20 at 23:28
The following article from Oct 2018 (before core 3.0) says core has linq. Did you try to put a "using System.Linq" at top of module? https://learn.microsoft.com/en-us/dotnet/csharp/tutorials/working-with-linq — jdweng, Apr 08 '20 at 02:51
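
If AsEnumerable() really is unavailable, the likely cause is a missing reference rather than a missing using System.Linq: for DataTable it is defined in the DataTableExtensions class, not in LINQ itself. On older .NET Core versions I believe this needed the System.Data.DataSetExtensions NuGet package, but treat that as something to verify for your target framework. Below is a minimal sketch that sidesteps those extension methods and uses only Rows.Cast<DataRow>(), the row indexer, Clone() and ImportRow(). The class and method names are made up for illustration, and it keeps rows in table order instead of sorting by id, since the question's columns are untyped.

```csharp
using System;
using System.Data;
using System.Linq;

static class DedupWithoutExtensions
{
    // Returns a new table with the first row (in table order) of each (Name, Dept) group.
    public static DataTable DistinctByNameDept(DataTable source)
    {
        DataTable result = source.Clone();   // copy the schema only

        var firstRows = source.Rows
            .Cast<DataRow>()                 // DataRowCollection is a non-generic IEnumerable
            .GroupBy(r => new { Name = r["Name"], Dept = r["Dept"] })
            .Select(g => g.First());         // GroupBy keeps the source order inside each group

        foreach (DataRow row in firstRows)
        {
            result.ImportRow(row);           // used here instead of CopyToDataTable()
        }
        return result;
    }
}
```

Unlike CopyToDataTable(), this returns an empty table rather than throwing when the source has no rows.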

score 2 · Answer 2 · answered Jun 25 '21 at 07:33

Here is a function that I got from someone, somewhere:

Usage:

List<string> columnName = new List<string> { "ID", "column1", "column_2", "Another", "however_many_columns_you_want_really" };
dataGrid = RemoveDuplicatesFromDataTable(dataGrid, columnName);

Function:

static DataTable RemoveDuplicatesFromDataTable(DataTable table, List<string> keyColumns)
{
    Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table.Rows.Count);
    StringBuilder stringBuilder = null;
    int rowIndex = 0;
    DataRow row;
    DataRowCollection rows = table.Rows;
    // Visit every row; the original bound `rows.Count - 1` never checked the last row.
    while (rowIndex < rows.Count)
    {
        row = rows[rowIndex];
        stringBuilder = new StringBuilder();
        foreach (string colname in keyColumns)
        {
            // Separate the values so that ("ab", "c") and ("a", "bc") produce different keys.
            stringBuilder.Append(row[colname]).Append('\u001F');
        }
        if (uniquenessDict.ContainsKey(stringBuilder.ToString()))
        {
            rows.Remove(row);
        }
        else
        {
            uniquenessDict.Add(stringBuilder.ToString(), string.Empty);
            rowIndex++;
        }
    }

    return table;
}

Very good solution, I just add my two cents. `while (rowIndex < rows.Count - 1)` will return 2 lines with same values in the DT , while if you want to have only one (unique) row, correct code is `while (rowIndex < rows.Count )` — Lorenzo Bassetti, Oct 29 '21 at 20:38
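
To make the point in the comment above concrete, here is a small sketch that runs the same removal loop with both loop bounds on a table whose last row is a duplicate. The helper name, the skipLast parameter and the three sample rows are invented for this illustration; a HashSet stands in for the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class LoopBoundDemo
{
    // Runs the removal loop and returns how many rows are left.
    // skipLast = true mimics the bound `rows.Count - 1`.
    public static int CountAfterDedup(bool skipLast)
    {
        var table = new DataTable();
        table.Columns.Add("Name");
        table.Columns.Add("Dept");
        table.Rows.Add("Test1", "Sample1");
        table.Rows.Add("Test4", "Sample4");
        table.Rows.Add("Test4", "Sample4");   // duplicate in the last position

        var seen = new HashSet<string>();
        DataRowCollection rows = table.Rows;
        int rowIndex = 0;
        while (rowIndex < rows.Count - (skipLast ? 1 : 0))
        {
            DataRow row = rows[rowIndex];
            string key = row["Name"] + "\u001F" + row["Dept"];
            if (!seen.Add(key))
            {
                rows.Remove(row);             // duplicate: remove it and stay on this index
            }
            else
            {
                rowIndex++;                   // first occurrence: keep it and move on
            }
        }
        return rows.Count;
    }
}
```

CountAfterDedup(true) leaves 3 rows because the trailing duplicate is never examined, while CountAfterDedup(false) leaves 2.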
