0

I've got a piece of code that needs to deal with a DataTable. The DataTable looks something like this:

  PartnerID    |    Partner Name    |    GroupID    |    Group Name    |    Description
------------------------------------------------------------------------------------------
      1        |     First Name     |       4       |    Group Name1   |   Foo
      2        |     Second Name    |       12      |    Group Name2   |    Bar
      3        |     Third Name     |       7       |    Group Name3   |    Hello
      3        |     Third Name     |       8       |    Group Name4   |    Hello World

Now what I am trying to accomplish is the performance of following SQL Statement:

SELECT DISTINCT PartnerID, Partner Name
FROM Table1

In C# using Linq with the expected output looking like this:

  PartnerID    |    Partner Name    |
-------------------------------------
      1        |    First Name      |
      2        |    Second Name     |
      3        |    Third Name      |

I have already seen following post:

LINQ query on a DataTable

And found it very unhelpful for my case, since all i really want to do is getting specified columns, but all the answers there display a solution that only seems to be working with where statements or selecting all columns by default.

My current code looks something like this right now:

     static void Main(string[] args)
     {
        DataTable fullTable = new DataTable();

        AddColumns(fullTable, "PartnerID", "Partner Name", "GroupID", "Group Name", "Description");

        fullTable.Rows.Add(1, "First Name", 4, "Group Name1", "Foo");
        fullTable.Rows.Add(2, "Second Name", 12, "Group Name2", "Bar");
        fullTable.Rows.Add(3, "Third Name", 7, "Group Name3", "Hello");
        fullTable.Rows.Add(3, "Third Name", 8, "Group Name4", "Hello World");

        var selectTwoCols = from arow in fullTable.AsEnumerable()
                            select arow; //how do i select specific columns from those rows?
        foreach (DataRow dataRow in selectTwoCols.Rows)
        {
            foreach (var item in dataRow.ItemArray)
            {
                Console.Write(item + " ");
            }
            Console.WriteLine();
        }

    }

    static void AddColumns(DataTable table, params string[] columnNames)
    {
        for (int i = 0; i < columnNames.Length; i++)
        {
            table.Columns.Add(columnNames[i]);
        }
    }

I'm open to using different classes aswell, although im still curious to know how this would be solved using DataTables in particular

Alan
  • 949
  • 8
  • 26
  • Why not just delete the columns you don't need? – DavidG Aug 14 '19 at 10:38
  • because this wouldnt make the result distinct, and is probably not the intended way of doing that, judging by the "design" of the DataTable class – Alan Aug 14 '19 at 10:38
  • Well I wouldn't be using `DataTable` in the first place, it's a very old way of doing things. – DavidG Aug 14 '19 at 10:39
  • In that case i'm more than glad to know how the current way of doing things look. Although im still curious about how to solve this using datatables – Alan Aug 14 '19 at 10:42
  • "The current way" would be to use domain objects (classes) to represent your data. Like a `Partner` class. – Magnus Aug 14 '19 at 10:50
  • Try following : DataTable newTable = fullTable.AsEnumerable().GroupBy(x => x.Field("PartnerID")).Select(x => x.First()).CopyToDataTable(); newTable.Columns.Remove("GroupID"); newTable.Columns.Remove("Group Name"); newTable.Columns.Remove("Description"); – jdweng Aug 14 '19 at 12:34

2 Answers2

4
fullTable
   .AsEnumerable()
   .Select(x => new 
                { 
                   PartnerID = x.Field<int>("PartnerID"), 
                   PartnerName = x.Field<string>("Partner Name")
                })
   .Distinct();

This will create an anonymous type with the two properties you want. You than apply a Distinct to remove the duplicates. Anonymous types handles GetHashCode and Equals for you which Distinct uses to identify duplicates.

Magnus
  • 45,362
  • 8
  • 80
  • 118
  • i am getting invalid cast exceptions with the part "new {...}" – Alan Aug 14 '19 at 10:50
  • Is PartnerID a string? In that case change the type to string – Magnus Aug 14 '19 at 10:55
  • its not which is why i'm confused – Alan Aug 14 '19 at 11:58
  • Run `fullTable.Columns["PartnerID"].DataType;` to make sure. – Magnus Aug 14 '19 at 12:30
  • The code will not work because distinct will not work with an integer and a string. – jdweng Aug 14 '19 at 12:35
  • 1
    @jdweng Why do you think that? It is distinct on an anonymous class with two properties. It will work fine. – Magnus Aug 14 '19 at 12:37
  • Is there a method that compares integers with strings? Maybe what is happening under the hood is the integer is being converted to a string. There is a method to compares a list of strings and a method that compares a list of integers. But there isn't a method that compares a list of objects. – jdweng Aug 14 '19 at 13:39
  • 1
    @jdweng As I wrote in my answer, anonymous types will auto generate `GetHashCode` and `Equals` which `distinct` uses for comparison. You can verify this by doing: `new[] { new { a = 1, b = "str"},new { a = 1, b = "str"} }.Distinct()` which will return one object. – Magnus Aug 14 '19 at 14:01
  • @Alan What (data)type of column is created when you call your custom method `AddColumns`? I don't see you passing any data types in - does it create `object` columns? – NetMage Aug 14 '19 at 23:43
  • @NetMage if no datatype is specified when creating a data table column `string` is used. – Magnus Aug 15 '19 at 09:15
  • @Magnus But only @Alan knows what his custom method `AddColumns` does in that case. – NetMage Aug 15 '19 at 17:40
  • @Magnus is right, no specified datatype when adding columns results in the column using string as datatype – Alan Aug 19 '19 at 09:42
0

We've already got a good answer here, but i think this is a more "intended" way of doing this.

In addition to the DataTable class, C# offers a DataView class.

In that classes Documentation we can read the following:

Represents a databindable, customized view of a DataTable for sorting, filtering, searching, editing, and navigation.

So judging from this, I assume, that Microsoft intention for this class was to be used in combination with DataTable to filter out rows or columns and pretty much any of the operations mentioned in the documentation.

So I used

DataView view = new DataView(fullTable); 

DataTable twoColsDistinct = view.ToTable(true, "PartnerID", "Partner Name"); //distinct
DataTable twoColsNonDistinct = view.ToTable(false, "PartnerID", "Partner Name"); //not distinct

To get two DataTables with only the two required columns selected and printing those two datatables content results in following output:

two cols distinct
----------------------
1 First Name
2 Second Name
3 Third Name
----------------------
two cols non distinct
----------------------
1 First Name
2 Second Name
3 Third Name
3 Third Name

which is exactly what i needed. just selecting columns.

using following code results in the exact same output:

var query = fullTable
               .AsEnumerable()
               .Select(x => new
                   {
                       PartnerID = x.Field<string>("PartnerID"),
                       PartnerName = x.Field<string>("Partner Name")
                   }
               ).Distinct();

foreach(var t in query)
{
    Console.WriteLine(t.PartnerID + " " + t.PartnerName);
}

prints

1 First Name
2 Second Name
3 Third Name

although i personally prefer the more compact solution above because with the "longer" solution, we are creating custom objects (more precisely: anonymous types), with PartnerID and PartnerName becoming that objects instance variables and in my "simple" case, I was not dealing with any object oriented "stuff" to begin with.

I find the compact solution easier to read and understand.

Bonus:

We can use the same DataView object to create filtered DataTables aswell which may or may not be useful:

view.RowFilter = "PartnerID > 1";

DataTable partnerIdGreaterThanOne = view.ToTable(true);

Printing this DataTable outputs the following:

2 Second Name 12 Group Name2 Bar
3 Third Name 7 Group Name3 Hello
3 Third Name 8 Group Name4 Hello World

Not sure if using Linq is any better here. Just wanted to throw this out there.

EDIT:

I did some performance testing and what i found out was that using

var query = fullTable
            .AsEnumerable()
            .Select(x => new
                    {
                        PartnerID = x.Field<string>("PartnerID"),
                        PartnerName = x.Field<string>("Partner Name")
                    })
            .Distinct();

is A LOT faster than the DataView solution IF one does not need the additional DataTable object created on the dataview approach.

Alan
  • 949
  • 8
  • 26