30

I try to remove the duplicate rows by select a first row from every group. For Example

PK     Col1     Col2
1        A        B
2        A        B
3        C        C
4        C        C

I want a return:

PK     Col1     Col2
1        A        B
3        C        C

I tried following code but it didn't work:

DataTable dt = GetSampleDataTable(); //Get the table above.
dt = dt.Select("SELECT MIN(PK), Col1, Col2 GROUP BY Col1, Col2);
J - C Sharper
  • 1,567
  • 7
  • 23
  • 36

5 Answers5

62

DataTable's Select method only supports simple filtering expressions like {field} = {value}. It does not support complex expressions, let alone SQL/Linq statements.

You can, however, use Linq extension methods to extract a collection of DataRows then create a new DataTable.

dt = dt.AsEnumerable()
       .GroupBy(r => new {Col1 = r["Col1"], Col2 = r["Col2"]})
       .Select(g => g.OrderBy(r => r["PK"]).First())
       .CopyToDataTable();
D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • 1
    What assembly reference do you need to use in order to get this to work. I am using System.data but the GroupBy command is stating I am missing an assembly reference. – user5013 Feb 22 '16 at 20:15
  • What extension is this .CopyToDataTable() ? – Sebastian Widz Nov 23 '16 at 08:20
  • 2
    @SebastianWidz [`System.Data.DataSetExtensions`](https://msdn.microsoft.com/en-us/library/bb396189(v=vs.110).aspx) – D Stanley Nov 23 '16 at 14:47
  • @user5013 [`GroupBy`](https://msdn.microsoft.com/en-us/library/bb534501(v=vs.110).aspx) is in the System.Linq namespace. You need a reference to `System.Core` if you don't have one already. – D Stanley Nov 23 '16 at 14:48
  • How to use having clause in this? – Vahid Jul 26 '23 at 08:06
  • If you mean the equivalent of a SQL "HAVING", that gets applied to the _groups_, so you would add a `Where` call after the `GroupBy` that filters the groups (e.g. by count, or sum, or something else) – D Stanley Jul 26 '23 at 14:32
8
dt = dt.AsEnumerable().GroupBy(r => r.Field<int>("ID")).Select(g => g.First()).CopyToDataTable();
J - C Sharper
  • 1,567
  • 7
  • 23
  • 36
7
dt.AsEnumerable()
    .GroupBy(r => new { Col1 = r["Col1"], Col2 = r["Col2"] })
    .Select(g =>
    {
        var row = dt.NewRow();

        row["PK"] = g.Min(r => r.Field<int>("PK"));
        row["Col1"] = g.Key.Col1;
        row["Col2"] = g.Key.Col2;

        return row;

    })
    .CopyToDataTable();
esc
  • 228
  • 3
  • 5
0

This solution sort by Col1 and group by Col2. Then extract value of Col2 and display it in a mbox.

var grouped = from DataRow dr in dt.Rows orderby dr["Col1"] group dr by dr["Col2"];
string x = "";
foreach (var k in grouped) x += (string)(k.ElementAt(0)["Col2"]) + Environment.NewLine;
MessageBox.Show(x);
-1

Based on @Alfred Wallace's solution :

        DataTable dt = new DataTable();
        dt.Columns.Add("Col1");
        dt.Columns.Add("Col2");

        dt.Rows.Add("120", "34");
        dt.Rows.Add("121", "34");
        dt.Rows.Add("122", "34");

        dt.Rows.Add("1", "345");
        dt.Rows.Add("2", "345");
        dt.Rows.Add("3", "345");

        var grouped = from DataRow dr in dt.Rows orderby dr["Col1"] group dr by dr["Col2"];
        string xxx = "", yyy = "";
        foreach (var k_group in grouped)
        {                
            xxx += (string)(k_group.ElementAt(0)["Col1"]) + Environment.NewLine;
            foreach (DataRow item_dr in k_group)
            {                    
                yyy += (string)(item_dr["Col1"]) + Environment.NewLine;
                //  or use WhatEverMethod(item_dr);
            }
            var zzz = k_group.Max(g => g["Col1"]);
            var qqq = k_group.Key;
        }
fAndreiC
  • 1
  • 2