Edit: I will add some benchmark results. For about 1,000 to 5,000 items in the list, IList and RemoveAt beat ISet and Remove, but that's nothing to worry about since the differences are marginal. The real fun begins when the collection size extends to 10,000 and more. I'm posting only those data.
I was answering a question here last night and faced a bizarre situation.
First a set of simple methods:
static Random rnd = new Random();

public static int GetRandomIndex<T>(this ICollection<T> source)
{
    return rnd.Next(source.Count);
}

public static T GetRandom<T>(this IList<T> source)
{
    return source[source.GetRandomIndex()];
}
------------------------------------------------------------------------------------------------------------------------------------
Let's say I'm removing N items from a collection at random. I would write this function:
public static void RemoveRandomly1<T>(this ISet<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove;
    var inList = source.ToList();
    int i = 0;
    while (source.Count > countToRemain)
    {
        source.Remove(inList.GetRandom());
        i++;
    }
}
or
public static void RemoveRandomly2<T>(this IList<T> source, int countToRemove)
{
    int countToRemain = source.Count - countToRemove;
    int j = 0;
    while (source.Count > countToRemain)
    {
        source.RemoveAt(source.GetRandomIndex());
        j++;
    }
}
As you can see, the first function is written for an ISet and the second for a normal IList. In the first function I'm removing by item from the ISet, and in the second by index from the IList, both of which I believe are O(1). Why is the second function performing so much worse than the first, especially as the lists get bigger?
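Here is a self-contained console repro sketch of what I'm seeing (the sizes and removal counts here are arbitrary picks, just to show the pattern):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Repro
{
    static readonly Random rnd = new Random();

    static void Main()
    {
        foreach (int size in new[] { 10000, 100000 })
        {
            var set = new HashSet<int>(Enumerable.Range(1, size));
            var list = new List<int>(Enumerable.Range(1, size));
            int toRemove = size / 4;

            // Set: remove by item, drawing candidates from a one-time snapshot.
            var snapshot = set.ToList();
            var sw = Stopwatch.StartNew();
            int remain = set.Count - toRemove;
            while (set.Count > remain)
                set.Remove(snapshot[rnd.Next(snapshot.Count)]);
            sw.Stop();
            long setMs = sw.ElapsedMilliseconds;

            // List: remove by random index.
            sw.Restart();
            for (int i = 0; i < toRemove; i++)
                list.RemoveAt(rnd.Next(list.Count));
            sw.Stop();

            Console.WriteLine($"size {size}: set {setMs} ms, list {sw.ElapsedMilliseconds} ms");
        }
    }
}
```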
Points that should favor IList (my take):

1) In the first function the ISet is converted to an IList (to get a random item from the IList), whereas no such conversion is performed in the second function. Advantage IList.

2) In the first function a call to GetRandom is made, whereas in the second only GetRandomIndex is called, which is one step fewer. Though trivial, advantage IList.

3) In the first function, the random item is taken from a separate list, so the obtained item might already have been removed from the ISet. This leads to more iterations of the while loop in the first function. In the second function, the random index is taken from the source being iterated on, so there are never wasted iterations. I have tested and verified this: i > j always. Advantage IList.
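Point 3 can be demonstrated with a small standalone sketch (the size and removal count are arbitrary) that counts the wasted draws against the stale snapshot:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StaleSnapshotDemo
{
    static readonly Random rnd = new Random();

    // Returns how many draws were needed to remove countToRemove items
    // from a set when candidates come from a one-time snapshot list
    // (mirrors RemoveRandomly1 above).
    public static int CountAttempts(int size, int countToRemove)
    {
        var source = new HashSet<int>(Enumerable.Range(1, size));
        var inList = source.ToList();      // snapshot, never updated
        int countToRemain = source.Count - countToRemove;

        int attempts = 0;
        while (source.Count > countToRemain)
        {
            source.Remove(inList[rnd.Next(inList.Count)]);
            attempts++;                    // counts wasted draws too
        }
        return attempts;
    }

    public static void Main()
    {
        // attempts >= 7500; strictly greater whenever a draw
        // hits an item that was already removed.
        int attempts = CountAttempts(10000, 7500);
        Console.WriteLine($"removals: 7500, attempts: {attempts}");
    }
}
```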
I thought the reason for this behaviour was that a List would need constant resizing when items are added or removed. But apparently not, going by some other testing. I ran:
public static void Remove1(this ISet<int> set)
{
    int count = set.Count;
    for (int i = 0; i < count; i++)
    {
        set.Remove(i + 1);
    }
}

public static void Remove2(this IList<int> lst)
{
    for (int i = lst.Count - 1; i >= 0; i--)
    {
        lst.RemoveAt(i);
    }
}
and found that the second function runs faster.
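For contrast, a hypothetical variant of Remove2 that removes from the front instead of the back (not one of the tests I ran, just a sketch to probe whether the removal position matters) would look like this:

```csharp
using System.Collections.Generic;

public static class ListRemovalVariants
{
    // Hypothetical variant: empty the list by repeatedly
    // removing the first element instead of the last.
    public static void Remove3(this IList<int> lst)
    {
        while (lst.Count > 0)
        {
            lst.RemoveAt(0);
        }
    }
}
```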
Test bed:
var f = Enumerable.Range(1, 100000);
var s = new HashSet<int>(f);
var l = new List<int>(f);

Benchmark(() =>
{
    // some examples...
    s.RemoveRandomly1(2500);
    l.RemoveRandomly2(2500);

    s.Remove1();
    l.Remove2();
}, 1);
public static void Benchmark(Action method, int iterations = 10000)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < iterations; i++)
        method();
    sw.Stop();
    MsgBox.ShowDialog(sw.Elapsed.TotalMilliseconds.ToString());
}
I'm just trying to understand what's going on with these two structures. Thanks!
Result:
var f = Enumerable.Range(1, 10000);
s.RemoveRandomly1(7500); => 5ms
l.RemoveRandomly2(7500); => 20ms
var f = Enumerable.Range(1, 100000);
s.RemoveRandomly1(7500); => 7ms
l.RemoveRandomly2(7500); => 275ms
var f = Enumerable.Range(1, 1000000);
s.RemoveRandomly1(75000); => 50ms
l.RemoveRandomly2(75000); => 925000ms
For most typical needs, though, a list would do!