EDIT: Why is Union() not excluding duplicates like it should?
I should have read the documentation before asking the original question. I didn't because every time I had used Union(), it was on lists of objects that didn't override Equals() and GetHashCode(), so even if the field values of the objects in the lists were the same, they all ended up in the new sequence Union() created. At first it seemed as if Union() didn't exclude duplicates, and that is what I believed. But Union() does, in fact, exclude duplicates, and not only duplicates across both lists but also duplicates within the same list. If my objects don't override Equals() and GetHashCode(), they are compared by reference rather than by value, which means they are not seen as duplicates.
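To make this concrete, here is a minimal sketch. PlainBall and ValueBall are hypothetical classes made up for this illustration (they are not in my real code); the only difference between them is whether Equals() and GetHashCode() are overridden.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class for illustration: no Equals()/GetHashCode() override,
// so instances are compared by reference.
class PlainBall
{
    public int Id { get; set; }
}

// Hypothetical class for illustration: compared by the value of Id.
class ValueBall
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        return obj is ValueBall other && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

static class UnionDemo
{
    public static int CountPlain()
    {
        var first = new List<PlainBall> { new PlainBall { Id = 1 }, new PlainBall { Id = 1 } };
        var second = new List<PlainBall> { new PlainBall { Id = 1 } };
        // Three distinct references, so nothing is treated as a duplicate.
        return first.Union(second).Count();
    }

    public static int CountValue()
    {
        var first = new List<ValueBall> { new ValueBall { Id = 1 }, new ValueBall { Id = 1 } };
        var second = new List<ValueBall> { new ValueBall { Id = 1 } };
        // All three are equal by value, so only one survives the Union().
        return first.Union(second).Count();
    }
}
```

With these definitions, UnionDemo.CountPlain() returns 3 and UnionDemo.CountValue() returns 1, even though every object has the same Id.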
This was the confusion that made me ask this question.
Once I Select() the fields explicitly and build a new sequence with Union(), "T" becomes an anonymous type, which is compared by value. This way, objects with the same field values are seen as duplicates. That is what causes Union() to behave differently (or rather, appear to behave differently). It always excludes duplicates, but a type is not always compared by value, so objects with the same field values may or may not be seen as duplicates. It depends on the implementation of your custom class.
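A small sketch of the anonymous-type case, using only System.Linq. The AnonymousTypeDemo class and its method names are hypothetical, chosen just for this illustration.

```csharp
using System;
using System.Linq;

static class AnonymousTypeDemo
{
    public static bool SameValuesAreEqual()
    {
        var a = new { Id = 1, IdContainer = 1, ValueExtracted = 20 };
        var b = new { Id = 1, IdContainer = 1, ValueExtracted = 20 };
        // Anonymous types get compiler-generated Equals() and GetHashCode()
        // based on their properties, so this is true although a and b are
        // different instances.
        return a.Equals(b);
    }

    public static int CountAfterUnion()
    {
        var first = new[]
        {
            new { Id = 1, IdContainer = 1 },
            new { Id = 1, IdContainer = 1 },
            new { Id = 2, IdContainer = 1 }
        };
        var second = new[] { new { Id = 3, IdContainer = 2 } };
        // The repeated (1, 1) entry inside the first sequence is collapsed too.
        return first.Union(second).Count();
    }
}
```

Here SameValuesAreEqual() returns true and CountAfterUnion() returns 3, not 4, which matches what I saw in my own output.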
I guess that should have been the question: Why is Union() not excluding duplicates like it should? (as we've seen, it's because my objects were not really duplicates). Is that right?
----------------------
Original Question: LINQ Union + Select is removing duplicates automatically. Why?
I've always thought that Union() in LINQ would return all values from the two lists even if they are the same. But my code is removing duplicates from the first list when I use 'Select()' together with Union().
Imagine the classic probability problem of ball extraction, where I have different containers and I extract some number of different balls from the containers.
I have two lists of BallExtraction. Each list shows me the Id of the ball, the Id of the container that the ball was in, the number of balls I have extracted (Value) and its Color. But, for some reason, I have two different lists and I want to merge them.
Example Code:
// Namespaces needed by the example code in this question.
using System;
using System.Collections.Generic;
using System.Linq;

class BallExtraction
{
    public enum BallColor
    {
        Blue = 0,
        Red = 1
    }

    public int Id { get; set; }
    public int IdContainer { get; set; }
    public int ValueExtracted { get; set; }
    public BallColor Color { get; set; }

    public BallExtraction() { }

    public BallExtraction(int id, int idContainer, int valueExtracted, BallColor color)
    {
        this.Id = id;
        this.IdContainer = idContainer;
        this.ValueExtracted = valueExtracted;
        this.Color = color;
    }
}
And now I run the program that follows:
class Program
{
    static void Main(string[] args)
    {
        // The first two entries have identical values on purpose.
        List<BallExtraction> list1 = new List<BallExtraction>();
        list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Blue));
        list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Blue));
        list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Red));
        list1.Add(new BallExtraction(1, 2, 70, BallExtraction.BallColor.Blue));
        list1.Add(new BallExtraction(2, 1, 10, BallExtraction.BallColor.Blue));

        List<BallExtraction> list2 = new List<BallExtraction>();
        list2.Add(new BallExtraction(3, 2, 80, BallExtraction.BallColor.Blue));
        list2.Add(new BallExtraction(3, 2, 80, BallExtraction.BallColor.Red));

        // Keep only the blue balls, project both lists to the same anonymous type, then merge.
        var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Select(s => new
        {
            Id = s.Id,
            IdContainer = s.IdContainer,
            ValueExtracted = s.ValueExtracted
        }).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue).Select(s => new
        {
            Id = s.Id,
            IdContainer = s.IdContainer,
            ValueExtracted = s.ValueExtracted
        }));

        Console.WriteLine("Number of items: {0}", mergedList.Count());
        foreach (var item in mergedList)
        {
            Console.WriteLine("Id: {0}. IdContainer: {1}. Value: {2}.", item.Id, item.IdContainer, item.ValueExtracted);
        }
        Console.ReadLine();
    }
}
The expected output is:
Number of items: 5
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 2. Value: 70.
Id: 2. IdContainer: 1. Value: 10.
Id: 3. IdContainer: 2. Value: 80.
But the actual output is:
Number of items: 4
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 2. Value: 70.
Id: 2. IdContainer: 1. Value: 10.
Id: 3. IdContainer: 2. Value: 80.
Notice that the first list contains two extractions with the same values. The Id of the ball is 1, the Id of the container is 1, the number of balls extracted is 20 and they are both blue.
I found that when I switch the 'mergedList' to the code below, I get the expected output:
var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue));
So, it seems that using 'Select' together with Union() is removing the duplicates from the first list.
The real problem is that I don't actually have a list of a simple type like in the example; I have an IEnumerable<T> (where T is an anonymous type), and T has a lot of fields. I only want specific fields, but I want to keep all duplicates of the new anonymous type. The only workaround I have found is to add, in the 'Select()', some field that is unique to each object T.
Is this working as intended? Should Union + Select remove duplicates?
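For comparison, if the goal is simply to keep every element, Concat() appends the second sequence without comparing anything. This is only a minimal sketch on made-up data; ConcatDemo and its method names are hypothetical and not part of my program.

```csharp
using System;
using System.Linq;

static class ConcatDemo
{
    public static int CountWithConcat()
    {
        var first = new[]
        {
            new { Id = 1, IdContainer = 1, ValueExtracted = 20 },
            new { Id = 1, IdContainer = 1, ValueExtracted = 20 }
        };
        var second = new[] { new { Id = 3, IdContainer = 2, ValueExtracted = 80 } };
        // Concat() appends the second sequence; no equality check is made,
        // so both identical entries are kept.
        return first.Concat(second).Count();
    }

    public static int CountWithUnion()
    {
        var first = new[]
        {
            new { Id = 1, IdContainer = 1, ValueExtracted = 20 },
            new { Id = 1, IdContainer = 1, ValueExtracted = 20 }
        };
        var second = new[] { new { Id = 3, IdContainer = 2, ValueExtracted = 80 } };
        // Union() is a set operation, so the two identical entries collapse into one.
        return first.Union(second).Count();
    }
}
```

On this data CountWithConcat() returns 3 while CountWithUnion() returns 2.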