
I have a collection of DataRow objects and need to select distinct rows based on the column 'URL_Link'. Following this post, I came up with the code below.
Is it possible to apply it to a DataRow collection?

    IEnumerable<DataRow> results = GetData();
    results.GroupBy(row => row.Field<string>("URL_Link")).Select(grp => grp.First());

It is syntactically correct, but it does not solve the problem. It doesn't remove duplicate rows. What am I doing wrong?

Null Head

2 Answers


Your query is fine except for one minor error: you never assign the result of the query back to a variable.

Personally, I find Distinct much clearer when all you need is the distinct values; GroupBy does not express that intent as well. If you intend to return the whole row, look at the first sample below; otherwise, see the second.

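    using System;
    using System.Data;
    using System.Linq;
    // AsEnumerable and Field<T> are extension methods in System.Data
    // (System.Data.DataSetExtensions assembly on .NET Framework).

    class Program
    {
        // Builds a sample table that contains duplicate URL_Link values.
        static DataTable GetData()
        {
            DataTable table = new DataTable();
            table.Columns.Add("Visits", typeof(int));
            table.Columns.Add("URL_Link", typeof(string));

            table.Rows.Add(57, "yahoo.com");
            table.Rows.Add(130, "google.com");
            table.Rows.Add(92, "google.com");
            table.Rows.Add(25, "home.live.com");
            table.Rows.Add(30, "stackoverflow.com");
            table.Rows.Add(1, "stackoverflow.com");
            table.Rows.Add(7, "mysite.org");
            return table;
        }

        static void Main(string[] args)
        {
            // Group by URL_Link and keep the first row of each group.
            // Note that the query result is assigned to a variable.
            var res = GetData()
                      .AsEnumerable()
                      .GroupBy(row => row.Field<string>("URL_Link"))
                      .Select(grp => grp.First());

            foreach (var item in res)
            {
                // Print every column value of the row, each followed by a tab.
                string text = "";
                foreach (var clm in item.ItemArray)
                    text += string.Format("{0}\t", clm);

                Console.WriteLine(text);
            }
            Console.ReadLine();
        }
    }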

This is more or less exactly what you already had. There are two differences: first, you did not assign the query result to a variable; second, the sample reads each row's values through ItemArray. The sample above produces this output:

57    yahoo.com
130   google.com
25    home.live.com
30    stackoverflow.com
7     mysite.org

Keep in mind that you may need to adjust the Select, OrderBy and Where clauses if you need a specific one of the duplicate rows (e.g. the duplicate with the most visits).
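
As a minimal sketch of that last case, the grouping can sort each group before taking its first row. The class and method names here (`MostVisited`, `PerUrl`) are hypothetical and only serve the illustration; the column names are the ones from the sample table.

```csharp
using System;
using System.Data;
using System.Linq;

static class MostVisited
{
    // Hypothetical helper: for each distinct URL_Link value,
    // return the row that has the highest Visits value.
    public static DataRow[] PerUrl(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => row.Field<string>("URL_Link"))
            // Sort each group by Visits, largest first, and keep that row.
            .Select(grp => grp.OrderByDescending(row => row.Field<int>("Visits")).First())
            .ToArray();
    }
}
```

Called as `MostVisited.PerUrl(GetData())`, this still returns one row per URL, but the row chosen for each URL no longer depends on the order of the rows in the table.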

If URL_Link is the only field you need to return from the distinct result, this sample is clear and straightforward. It simply selects the field you want and then applies Distinct to it.

    static void Main(string[] args)
    {
        var res = GetData()
                    .AsEnumerable()
                    .Select(d=>d.Field<string>("URL_Link"))
                    .Distinct();

        foreach (var item in res)
            Console.WriteLine(item);

        Console.ReadLine();
    }
Independent
  • `Distinct()` was my initial idea too, but it returns only distinct values (columns) not rows, does it not? – abatishchev Jul 31 '12 at 09:07
  • `Distinct()` uses the `IEqualityComparer` interface to determine which items are alike. You could make your own implementation, and provide this with the `Distinct()` call. This way you could call `Distinct()` on the complete dataset and get back your complete row, by still only comparing the 'URL_Link'. – Raxr Jul 31 '12 at 09:10
  • @abatishchev Ok, that's correct. Well, he's on the right track. I've edited the reply. – Independent Jul 31 '12 at 09:15
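
The custom comparer suggested in the comments above could look roughly like this. It is only a sketch: `UrlLinkComparer` and `DistinctRows` are hypothetical names, and the ordinal, case-sensitive string comparison is an assumption about how the URLs should be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows count as equal when their URL_Link values match.
sealed class UrlLinkComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Assumption: URLs are compared ordinally (case-sensitive).
        return string.Equals(x.Field<string>("URL_Link"),
                             y.Field<string>("URL_Link"),
                             StringComparison.Ordinal);
    }

    public int GetHashCode(DataRow row)
    {
        // The hash must be built from the same value that Equals compares.
        string url = row.Field<string>("URL_Link");
        return url == null ? 0 : StringComparer.Ordinal.GetHashCode(url);
    }
}

static class DistinctRows
{
    // Returns whole rows, keeping one row per distinct URL_Link value.
    public static List<DataRow> ByUrlLink(DataTable table)
    {
        return table.AsEnumerable().Distinct(new UrlLinkComparer()).ToList();
    }
}
```

With this, `DistinctRows.ByUrlLink(GetData())` returns complete DataRow objects while comparing only the 'URL_Link' column, which is what the GroupBy version achieves with `grp.First()`.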

The return from your LINQ operation isn't being assigned to anything:

    IEnumerable<DataRow> results = GetData();
    results = results.GroupBy(row => row.Field<string>("URL_Link")).Select(grp => grp.First());
Dave New