4

I am struggling to get column A value by querying a column B for duplicates from database with Entity Framework Core. Example:

Column A | Column B
--------------------
A        | 1
B        | 1 
C        | 2 
D        | 2 
E        | 3 

I should get output:

A,B,C,D

Which I plan to use to populate a drop down list.

I’ve tried:

Options = _context.table.AsEnumerable()
.GroupBy(s => new{ s.columnA,s.columnB)
.Where(o => o.columnB.Count() > 1)
.Select(o => new SelectListItem
{
   Value = o.Key.columnA,
   Text = o.Key.columnB
}.ToList()

Vy Do
  • 46,709
  • 59
  • 215
  • 313
Aaron
  • 123
  • 1
  • 12
  • Your question is not clear. Either you use o.columnB.Count() > 1 or o.columnB > 1, you can not get A, B, C, D as output.. You should group data on columnB and then check count of group elements.`Options = _context.table.AsEnumerable().GroupBy(s => s.columnB).Where(o => o.Count() > 1).Select(o => new SelectListItem { Value = o.columnA, Text = o.columnB }.ToList()` – Niranjan Singh Sep 03 '20 at 05:52
  • Does this answer your question? [Linq with group by having count](https://stackoverflow.com/questions/2078736/linq-with-group-by-having-count) – Drag and Drop Sep 03 '20 at 06:12
  • @DragandDrop no, that solution only display for the column I am checking for duplicates (in this case, columnB). I need to display Column A items that has duplicates in column B. – Aaron Sep 03 '20 at 06:32
  • As Karan said, since you want to get the duplicates records based on the columnB, the GroupBy statement should like this `.GroupBy(s=>s.columnB)`, Then, you could use the SelectMany method to find the sub items, `Options = context.table.AsEnumerable().GroupBy(s => s.columnB).Where(o => o.Count() > 1).SelectMany(o => o).Select(d => new SelectListItem(){ Text = d.columnA, Value = d.columnB.ToString() }).ToList();` – Zhi Lv Sep 03 '20 at 08:35
  • Right, but `SelectMany()` uses too much memory. I've got quite a big database. – Aaron Sep 03 '20 at 09:46
  • @AaronYong `SelectMany` is not using much memory because it just flattens the found duplicates only. Too much memory usage is from client side evaluation of `GroupBy` (after `AsEnumerable()`). – Ivan Stoev Sep 03 '20 at 10:21

2 Answers2

3

You want to count on columnB only then you should be using .GroupBy(s => s.columnB). Update your Where & Select like below.

Note Need to use SelectMany wiht Select inside it because if we use Select instead of SelectMany it will return List<List<SelectListItem>>. SelectMany will flatten it and return List<SelectListItem>.

Options = _context.table.AsEnumerable()
            .GroupBy(s => s.columnB)
            .Where(o => o.Count() > 1)
            .SelectMany(o => o.Select(x => new SelectListItem
            {
               Value = x.columnA,
               Text = x.columnB
            }))
            .ToList();

Detail Explanation Credits to @Flater from helpful comment. The lambda in GroupBy is essentially the "group identifier". In the question, each group is defined by the combined uniqueness of columns A and B. But as per described into question it needs to combine things based on column B alone, with the goal to specifically group things regardless of what column A contains, so column A should not be included in the group's identifier, since the query should not make a new group when encountering a different column A value.

Karan
  • 12,059
  • 3
  • 24
  • 40
  • 1
    This is the correct answer, but I think it can benefit from a small elaboration on why the `GroupBy` needed to be changed: the lambda in `GroupBy` is essentially the "group identifier". In the question, each group is defined by the combined uniqueness of columns A _and_ B. But OP wants to combine things based on column B alone, with the goal to specifically group things _regardless_ of what column A contains, so column A should not be included in the group's identifier, since the query should **not** make a new group when encountering a different column A value. – Flater Sep 03 '20 at 11:59
  • @Flater, thanks for your helpful comment. I have updated answer as per your comment. – Karan Sep 03 '20 at 12:33
0

So you have combinations of [ColumnA, ColumnB], and if you want all values of ColumnA where there is more than one ColumnB.

My advice would be to convert your [ColumnA, ColumnB] into groups of "ColumnB, with all ColumnA that have this ColumnB. Then keep only the Groups that have more than one ColumnA.

So in your example:

  • 1 with values A, B
  • 2 with values C, D
  • 3 with value E

Throw away group 3 because it has only one element, and flatten the result.

We use the overload of GroupBy that has an elementSelector. The elementSelector will select columnA as elements.

var combinationsColumnAColumnB = ...
var result = combinationsColumnB.GroupBy(
     // parameter keySelector: make groups with same ColumnB
     combination => combination.ColumnB,

    // parameter elementselector: put only the ColumnA in each group
    combination => combination.ColumnA)

    // result: groups of columnB with all columnA that belong to this columnB
    // keep only those groups that have more than one element
    .Where(group => group.Skip(1).Any()

    // and flatten the result:
    .SelectMany(group => group);

If you have combinations like:

A   1
A   2
B   1
C   2

You will get element A twice. If you don't want that, add a Distinct()

Harald Coppoolse
  • 28,834
  • 7
  • 67
  • 116