EDIT: Why is Union() not excluding duplicates like it should?

I should have read the documentation before asking the original question. I didn't because every time I had used Union(), it was on lists of objects that didn't override Equals() and GetHashCode(), so even when the field values of the objects in the lists were the same, all of them ended up in the new sequence that Union() produced. At first it seemed as if Union() didn't exclude duplicates, and that is what I believed. But Union() does, in fact, exclude duplicates: not only duplicates across both lists, but also duplicates within the same list. If my objects don't override Equals() and GetHashCode(), they are compared by reference rather than by value, which means they are not seen as duplicates.
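To make the difference concrete, here is a minimal sketch. PlainBall and ValueBall are hypothetical classes invented for this illustration (they are not part of my real code); the only difference between them is that ValueBall overrides Equals() and GetHashCode().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class with no equality overrides: instances are compared by reference.
class PlainBall
{
    public int Id { get; set; }
    public int ValueExtracted { get; set; }
}

// Hypothetical class with value-based equality.
class ValueBall
{
    public int Id { get; set; }
    public int ValueExtracted { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as ValueBall;
        return other != null
            && Id == other.Id
            && ValueExtracted == other.ValueExtracted;
    }

    public override int GetHashCode()
    {
        // Equal objects must produce equal hash codes, otherwise Union() never calls Equals().
        unchecked { return (Id * 397) ^ ValueExtracted; }
    }
}

class EqualityDemo
{
    static void Main()
    {
        var plain = new List<PlainBall>
        {
            new PlainBall { Id = 1, ValueExtracted = 20 },
            new PlainBall { Id = 1, ValueExtracted = 20 }
        };
        var byValue = new List<ValueBall>
        {
            new ValueBall { Id = 1, ValueExtracted = 20 },
            new ValueBall { Id = 1, ValueExtracted = 20 }
        };

        // Two distinct references, so nothing is treated as a duplicate.
        Console.WriteLine(plain.Union(new List<PlainBall>()).Count());   // 2

        // Same field values, so the second element is dropped as a duplicate.
        Console.WriteLine(byValue.Union(new List<ValueBall>()).Count()); // 1
    }
}
```

Note that the second list is empty in both calls, so the duplicate being removed in the ValueBall case comes from within the first list.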

This was the confusion that made me ask this question.

Once I Select() the fields explicitly and then build the merged sequence with Union(), "T" becomes an anonymous type, which is compared by value. This way, objects with the same field values are seen as duplicates. That is what causes Union() to behave differently (or rather, to appear to behave differently). It always excludes duplicates, but not every type is compared by value, so objects with the same field values may or may not be seen as duplicates. It depends on how your custom class implements equality.
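A small sketch of how anonymous types behave, assuming nothing beyond the standard C# compiler-generated Equals() and GetHashCode() for anonymous types. The property names mirror the ones in my example, but the class name AnonymousEqualityDemo is made up for illustration.

```csharp
using System;

class AnonymousEqualityDemo
{
    static void Main()
    {
        // Two separate instances of the same anonymous type with identical property values.
        var a = new { Id = 1, IdContainer = 1, ValueExtracted = 20 };
        var b = new { Id = 1, IdContainer = 1, ValueExtracted = 20 };

        // They are different objects in memory...
        Console.WriteLine(object.ReferenceEquals(a, b));          // False

        // ...but the generated Equals() and GetHashCode() are based on the property values.
        Console.WriteLine(a.Equals(b));                           // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());    // True
    }
}
```

Since Union() relies on Equals() and GetHashCode() through the default equality comparer, it treats a and b as duplicates.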

I guess that should have been the question: Why is Union() not excluding duplicates like it should? (as we've seen, it's because my objects were not really duplicates). Is that right?

----------------------

Original Question: LINQ Union + Select is removing duplicates automatically. Why?

I've always thought that Union() in LINQ would return all values from the two lists even if they are the same. But my code is removing duplicates from the first list when I use Select() together with Union().

Imagine the classic probability problem of ball extraction, where I have different containers and I extract some number of different balls from the containers.

I have two lists of BallExtraction. Each list shows me the Id of the ball, the Id of the container that the ball was in, the number of balls I have extracted (Value) and its Color. But, for some reason, I have two different lists and I want to merge them.

Example Code:

using System;
using System.Collections.Generic;
using System.Linq;

class BallExtraction
{
    public enum BallColor 
    {
        Blue = 0,
        Red = 1
    } 

    public int Id { get; set; }
    public int IdContainer { get; set; }
    public int ValueExtracted { get; set; }
    public BallColor Color { get; set; }

    public BallExtraction() { }

    public BallExtraction(int id, int idContainer, int valueExtracted, BallColor color)
    {
        this.Id = id;
        this.IdContainer = idContainer;
        this.ValueExtracted = valueExtracted;
        this.Color = color;
    }

}

And now I run the program that follows:

class Program
{
    static void Main(string[] args)
    {
        List<BallExtraction> list1 = new List<BallExtraction>();
        list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Blue));
        list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Blue));
        list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Red));
        list1.Add(new BallExtraction(1, 2, 70, BallExtraction.BallColor.Blue));
        list1.Add(new BallExtraction(2, 1, 10, BallExtraction.BallColor.Blue));

        List<BallExtraction> list2 = new List<BallExtraction>();
        list2.Add(new BallExtraction(3, 2, 80, BallExtraction.BallColor.Blue));
        list2.Add(new BallExtraction(3, 2, 80, BallExtraction.BallColor.Red));

        var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Select(s => new
        {
            Id = s.Id,
            IdContainer = s.IdContainer,
            ValueExtracted = s.ValueExtracted
        }).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue).Select(s => new
        {
            Id = s.Id,
            IdContainer = s.IdContainer,
            ValueExtracted = s.ValueExtracted
        }));

        Console.WriteLine("Number of items: {0}", mergedList.Count());

        foreach (var item in mergedList)
        {
            Console.WriteLine("Id: {0}. IdContainer: {1}. Value: {2}.", item.Id, item.IdContainer, item.ValueExtracted);
        }

        Console.ReadLine();

    }
}

The expected output is:

Number of items: 5
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 2. Value: 70.
Id: 2. IdContainer: 1. Value: 10.
Id: 3. IdContainer: 2. Value: 80.    

But the actual output is:

Number of items: 4
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 2. Value: 70.
Id: 2. IdContainer: 1. Value: 10.
Id: 3. IdContainer: 2. Value: 80.

Notice that the first list contains two extractions with the same values. The Id of the ball is 1, the Id of the container is 1, the number of balls extracted is 20 and they are both blue.

I found that when I switch the 'mergedList' to the code below, I get the expected output:

var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue));

So it seems that the Select() used together with the Union() is removing the duplicates from the first list.

The real problem is that I don't actually have a list of a simple type like in the example; I have an IEnumerable<T> (where T is an anonymous type), and T has a lot of fields. I only want specific fields, but I want to keep all the resulting anonymous-type objects, including the duplicates. The only workaround I have found is to add, in the Select(), some field that is unique to each object T.

Is this working as intended? Should Union + Select remove duplicates?

Daniel Marques
  • Simply looking at the documentation for the method would tell you exactly what it does, and whether the behavior is intended or not. – Servy Sep 15 '16 at 14:26
  • `Union`, `Except` and `Intersect` all remove duplicates. – Dennis_E Sep 15 '16 at 14:27
  • It is not removing duplicates from two lists. It is removing duplicates from the first list. – Daniel Marques Sep 15 '16 at 14:33
  • "Behavior is exactly as documented. Why didn't I bother reading the documentation?" Only you can answer that. – 15ee8f99-57ff-4f92-890c-b56153 Sep 15 '16 at 14:38
  • I've just run across this question and found an interesting article which compares `Union`, `Intersect`, `Except` and `Distinct`. It also describes how to tell `Union` which fields it should compare in a collection using an `IEqualityComparer` which helped me: https://www.c-sharpcorner.com/article/use-of-unionintersect-and-except-in-linq/ – pbur Nov 27 '20 at 09:44
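
As a rough sketch of the IEqualityComparer approach mentioned in the last comment: Union() has an overload that accepts an IEqualityComparer<TSource>, which decides what counts as a duplicate. The Extraction class and ExtractionComparer below are hypothetical names made up for this illustration, and comparing only Id and IdContainer is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with no equality overrides of its own.
class Extraction
{
    public int Id { get; set; }
    public int IdContainer { get; set; }
    public int ValueExtracted { get; set; }
}

// Hypothetical comparer: two extractions are "equal" when Id and IdContainer match.
class ExtractionComparer : IEqualityComparer<Extraction>
{
    public bool Equals(Extraction x, Extraction y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id && x.IdContainer == y.IdContainer;
    }

    public int GetHashCode(Extraction obj)
    {
        if (obj == null) return 0;
        // Must be built from the same fields that Equals() compares.
        unchecked { return (obj.Id * 397) ^ obj.IdContainer; }
    }
}

class ComparerDemo
{
    static void Main()
    {
        var list1 = new List<Extraction>
        {
            new Extraction { Id = 1, IdContainer = 1, ValueExtracted = 20 },
            new Extraction { Id = 1, IdContainer = 1, ValueExtracted = 20 },
            new Extraction { Id = 1, IdContainer = 2, ValueExtracted = 70 }
        };
        var list2 = new List<Extraction>
        {
            new Extraction { Id = 1, IdContainer = 1, ValueExtracted = 99 }
        };

        // Default comparer: reference equality, so all four objects are kept.
        Console.WriteLine(list1.Union(list2).Count());                           // 4

        // Custom comparer: only (1, 1) and (1, 2) are distinct.
        Console.WriteLine(list1.Union(list2, new ExtractionComparer()).Count()); // 2
    }
}
```

With the custom comparer, the element kept for each key is the first one encountered, so the ValueExtracted = 99 entry is dropped here.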

1 Answer

Yes, it's the expected behaviour.

Union's documentation states:

Return Value. Type: System.Collections.Generic.IEnumerable<TSource>. An IEnumerable<T> that contains the elements from both input sequences, excluding duplicates.

To keep duplicates, you have to use Concat(), not Union().
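
A minimal sketch of the difference, using anonymous-type projections like the ones in the question. The data and the class name ConcatVsUnionDemo are made up for illustration.

```csharp
using System;
using System.Linq;

class ConcatVsUnionDemo
{
    static void Main()
    {
        // Anonymous types with the same property names and types share one compiler-generated type.
        var first = new[]
        {
            new { Id = 1, IdContainer = 1, ValueExtracted = 20 },
            new { Id = 1, IdContainer = 1, ValueExtracted = 20 },
            new { Id = 2, IdContainer = 1, ValueExtracted = 10 }
        };
        var second = new[]
        {
            new { Id = 3, IdContainer = 2, ValueExtracted = 80 }
        };

        // Union() drops the repeated (1, 1, 20) entry because anonymous types compare by value.
        Console.WriteLine(first.Union(second).Count());  // 3

        // Concat() simply appends the second sequence and keeps every element.
        Console.WriteLine(first.Concat(second).Count()); // 4
    }
}
```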

Raphaël Althaus
  • Union does not exclude duplicates. I just tested it. – Daniel Marques Sep 15 '16 at 14:37
  • Please, run my code and you'll see. If I have duplicates on both lists, var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue)); will return both values. – Daniel Marques Sep 15 '16 at 14:39
  • @DanielMarques That's because `BallExtraction` doesn't override `Equals` and `GetHashCode` so it's comparing "duplicates" based on reference and not value. – juharr Sep 15 '16 at 14:41
  • When I use Select() to select some of the fields explicitly, then they compare by value? – Daniel Marques Sep 15 '16 at 14:46
  • Yes. When you project to an anonymous type, that's what happens; see: http://stackoverflow.com/questions/12123512/why-anonymous-types-equals-implementation-compares-fields – Raphaël Althaus Sep 15 '16 at 14:47