4

I would like to know how I can convert elements of a column of a DataTable to a list of type string, grouping the elements to avoid repetition.

For example my DataTable would look like this DataTable

and I want to make a list containing the elements of only "User" without repeating itself using LINQ.

The code I was trying to use is

InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString).ToList(Function(g) g.ToList())

But it doesn't work for me since I am new to LINQ and still have problems forming the structures.

Xvier
  • 41
  • 1

2 Answers2

4

I'd use this:

InvoiceList = InvoiceDT.AsEnumerable().Select(Function(r) r("User").ToString()).Distinct().ToList()

If you wanted a GroupBy solution it's

InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString()).Select(Function(g) g.Key).ToList()

Where your code went wrong was in trying to pass a delegate to ToList; it doesn't take one (and you wouldn't ToList the g either, as it's a list of data rows with all varying properties).

To reshape our IGrouping (something like a list of objects that all share the same Key, which is a property of the list that the IGrouping represents) produced by the groupby into a sequence of string Keys we Select the Key, and then ToList that


There is a lot of back and forthing between developers over things like ToList vs ToArray - some people universally use ToList because, for collections of an unknown number of elements, both list and array will grow and resize repeatedly in the same way but using ToArray requires one additional resizing step at the end to trim off any unused slots. Mostly that's trivial in terms of an overall performance consideration and should be weighed against the benefit of releasing the memory with the trim. Getting into finer details is way beyond the scope of this answer but you can read some huge blog posts about it.

I personally think it's more important to generate sensible code by calling the method that results in the relevant type depending on what you plan to do with it; I ToList if I need List functionality (add/insert/remove).. I prefer ToArray if an array suits the follow-on purposes (read/write/random access, no insert or delete), and if I'll only ever enumerate it I don't To... anything at all - I just ForEach the result of the query, which can give a bigger performance boost than anything else because it means I may not have to enumerate the entire set (if I stop early) or allocate memory all at once for doing so (if I'm writing to a socket or file)

On the use of ToString; it's worth avoiding if you think you'll fall into a pattern where you do it on every column just to get a string. If the column is already a string it's an acceptable way to get the object that DataRow.Item gives you, into a string. If the column is another type it's better to cast it:

  • DirectCast(r("Age"), Integer)
  • r.Field(Of Integer)("Age")

Thing is, it's verbose, and ugly, and intellisense doesn't help you out with writing Age or knowing it's an Int. LINQ in VB is bad enough for verbosity without pouring gas on that fire. If you're working with datatables of a known structure, it's a lot nicer if you make strongly typed ones:

  • Add a new file of type DataSet to your project
  • Open it so the design surface appears. In the properties grid call it something reasonable, such as AccountsDataSet
  • Right click, Add Table, call it Invoices
  • Right click the emppty table, Add Column, call it User

Then use it like:

Dim dt as new AccountsDataSet.InvoicesDataTable

Populate it like:

dt.AddInvoicesRow("John Smith", ... other properties here)

Query it like:

dt.Select(Function(r) r.User).Distinct()

Much nicer than accessing column names by string, and having them be objects that need casting..

Consider the dataset generator as a way to quickly, visually, create poco classes with named, typed properties

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
  • I would say prefer `ToList` unless you are using an API that requires an array, since `ToArray` has the potential to require an additional allocation over `ToList`. – NetMage Jun 22 '21 at 20:45
  • As you're already using LINQ to DataSet (and even if you weren't) I'd change `r("User").ToString()` to `r.Field(Of String)("User")`. – jmcilhinney Jun 23 '21 at 01:46
  • Yep. Personally, I'd have done a directcast* - I left it as ToString because in this car it's already string column, the OP is clearly familiar with it and it's relatively accessory to the task at hand, but it's a reasonable point that using ToString could encourage a person to fall into the trap of using it every time, even on non strings. In contrast to that, Field does encourage a casting behavior and its worth mentioning. *actually I'd have made a strongly typed DS as described so I wasn't stringly typing anything-faster setup, easier use, cleaner code, fewer bugs – Caius Jard Jun 23 '21 at 04:12
0

Try this

dim list as List(of string) = InvoiceDT.Rows.
    Cast(of DataRow)().
    Select(Function(r)  r("User").ToString()).
    Distinct().
    ToList()

Here you cast Row collection as IEnumerable(of DataRow), rest is trivial

T.S.
  • 18,195
  • 11
  • 58
  • 78
  • 1
    This is, to all intents and purposes, identical to an already posted answer – Caius Jard Jun 23 '21 at 04:46
  • @CaiusJard With all respect, the other answer uses `InvoiceDT.AsEnumerable()`, which I believe is an extra extension on `DataTable`. For example in fiddle it was available for VB by adding `System.Data.DataSetExtensions`. But for c# I did not need to add it. – T.S. Jun 23 '21 at 13:17