LINQ get column based on another column having count >1

Question

I am struggling to get column A value by querying a column B for duplicates from database with Entity Framework Core. Example:

Column A | Column B
--------------------
A        | 1
B        | 1 
C        | 2 
D        | 2 
E        | 3

I should get output:

A,B,C,D

Which I plan to use to populate a drop down list.

I’ve tried:

Options = _context.table.AsEnumerable()
.GroupBy(s => new{ s.columnA,s.columnB)
.Where(o => o.columnB.Count() > 1)
.Select(o => new SelectListItem
{
   Value = o.Key.columnA,
   Text = o.Key.columnB
}.ToList()

Your question is not clear. Either you use o.columnB.Count() > 1 or o.columnB > 1, you can not get A, B, C, D as output.. You should group data on columnB and then check count of group elements.`Options = _context.table.AsEnumerable().GroupBy(s => s.columnB).Where(o => o.Count() > 1).Select(o => new SelectListItem { Value = o.columnA, Text = o.columnB }.ToList()` — Niranjan Singh, Sep 03 '20 at 05:52
Does this answer your question? [Linq with group by having count](https://stackoverflow.com/questions/2078736/linq-with-group-by-having-count) — Drag and Drop, Sep 03 '20 at 06:12
@DragandDrop no, that solution only display for the column I am checking for duplicates (in this case, columnB). I need to display Column A items that has duplicates in column B. — Aaron, Sep 03 '20 at 06:32
As Karan said, since you want to get the duplicates records based on the columnB, the GroupBy statement should like this `.GroupBy(s=>s.columnB)`, Then, you could use the SelectMany method to find the sub items, `Options = context.table.AsEnumerable().GroupBy(s => s.columnB).Where(o => o.Count() > 1).SelectMany(o => o).Select(d => new SelectListItem(){ Text = d.columnA, Value = d.columnB.ToString() }).ToList();` — Zhi Lv, Sep 03 '20 at 08:35
Right, but `SelectMany()` uses too much memory. I've got quite a big database. — Aaron, Sep 03 '20 at 09:46
@AaronYong `SelectMany` is not using much memory because it just flattens the found duplicates only. Too much memory usage is from client side evaluation of `GroupBy` (after `AsEnumerable()`). — Ivan Stoev, Sep 03 '20 at 10:21

Karan · Accepted Answer · 2020-09-03T12:32:22.080

You want to count on columnB only then you should be using .GroupBy(s => s.columnB). Update your Where & Select like below.

Note Need to use SelectMany wiht Select inside it because if we use Select instead of SelectMany it will return List<List<SelectListItem>>. SelectMany will flatten it and return List<SelectListItem>.

Options = _context.table.AsEnumerable()
            .GroupBy(s => s.columnB)
            .Where(o => o.Count() > 1)
            .SelectMany(o => o.Select(x => new SelectListItem
            {
               Value = x.columnA,
               Text = x.columnB
            }))
            .ToList();

Detail Explanation Credits to @Flater from helpful comment. The lambda in GroupBy is essentially the "group identifier". In the question, each group is defined by the combined uniqueness of columns A and B. But as per described into question it needs to combine things based on column B alone, with the goal to specifically group things regardless of what column A contains, so column A should not be included in the group's identifier, since the query should not make a new group when encountering a different column A value.

This is the correct answer, but I think it can benefit from a small elaboration on why the `GroupBy` needed to be changed: the lambda in `GroupBy` is essentially the "group identifier". In the question, each group is defined by the combined uniqueness of columns A _and_ B. But OP wants to combine things based on column B alone, with the goal to specifically group things _regardless_ of what column A contains, so column A should not be included in the group's identifier, since the query should **not** make a new group when encountering a different column A value. — Flater, Sep 03 '20 at 11:59
@Flater, thanks for your helpful comment. I have updated answer as per your comment. — Karan, Sep 03 '20 at 12:33

score 0 · Answer 2 · answered Sep 03 '20 at 11:41

So you have combinations of [ColumnA, ColumnB], and if you want all values of ColumnA where there is more than one ColumnB.

My advice would be to convert your [ColumnA, ColumnB] into groups of "ColumnB, with all ColumnA that have this ColumnB. Then keep only the Groups that have more than one ColumnA.

So in your example:

1 with values A, B
2 with values C, D
3 with value E

Throw away group 3 because it has only one element, and flatten the result.

We use the overload of GroupBy that has an elementSelector. The elementSelector will select columnA as elements.

var combinationsColumnAColumnB = ...
var result = combinationsColumnB.GroupBy(
     // parameter keySelector: make groups with same ColumnB
     combination => combination.ColumnB,

    // parameter elementselector: put only the ColumnA in each group
    combination => combination.ColumnA)

    // result: groups of columnB with all columnA that belong to this columnB
    // keep only those groups that have more than one element
    .Where(group => group.Skip(1).Any()

    // and flatten the result:
    .SelectMany(group => group);

If you have combinations like:

You will get element A twice. If you don't want that, add a Distinct()

LINQ get column based on another column having count >1

2 Answers2