Assuming you have some sort of IEnumerable<int>
you can do the following:
int count = stream.Distinct().Count();
Do I understand correctly that set membership is based on GetHashCode()
Not quite. Membership in a HashSet
is based on a combination of GetHashCode
and an equality check. In general, two objects can have the same hashcode but not be equal. Though for int
that cannot happen.
and that the hash code of an Int32 is the number itself?
Yes, that is correct.
Is there an approach that is either more CPU or more memory efficient?
If you know that your ints will be in a small range, you can efficiently store which you have seen by using a bitmap. For example, if you have a range of 1,000,000 you can store which ints you have seen in 1,000,000 bits. A bit set to 1 at index n means that you have seen the integer n. Here's some example code showing one way to implement this:
void Main()
{
int max = 1000000;
IEnumerable<int> stream = GetStream(max);
int count = DistinctCount(stream, max);
int count2 = stream.Distinct().Count();
Debug.Assert(count == count2);
}
int DistinctCount(IEnumerable<int> stream, int max)
{
int[] seen = new int[max / 32];
foreach (int x in stream)
{
seen[x / 32] |= 1 << (x % 32);
}
int count = 0;
foreach (uint s in seen)
{
uint t = s;
while (t > 0)
{
if (t % 2 == 1) { count++; }
t /= 2;
}
}
return count;
}
IEnumerable<int> GetStream(int max)
{
List<int> stream = new List<int>();
Random random = new Random();
for (int i = 0; i < 2000000; ++i)
{
stream.Add(random.Next(max));
}
return stream;
}