30

In answer to the following question: How to convert MatchCollection to string array

Given The two Linq expressions:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .OfType<Match>() //OfType
    .Select(m => m.Groups[0].Value)
    .ToArray();

and

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Cast<Match>() //Cast
    .Select(m => m.Groups[0].Value)
    .ToArray();

OfType<> was benchmarked by user Alex to be slightly faster (and confirmed by myself).

This seems counterintuitive to me, as I'd have thought OfType<> would have to do both an 'is' comparison, and a cast (T).

Any enlightenment would be appreciated as to why this is the case :)

Community
  • 1
  • 1
Dave Bish
  • 19,263
  • 7
  • 46
  • 63
  • 2
    I don't think this is a duplicate as the question is not _what_ the difference is, but why one is slower than the other. – C.Evenhuis Jul 11 '12 at 10:30
  • @danbystrom In [an answer](http://stackoverflow.com/a/4015976/11683) to the question you linked to Ash explains `OfType` is slower. The OP asks why it's actually faster when all elements of the sequence are in fact of that type. – GSerg Jul 11 '12 at 10:31
  • @C.Evenhuis: This is not even a real question... – leppie Jul 11 '12 at 10:34
  • 4
    Why would it do `is` then a cast, when it can just use `as` and a null check? – Damien_The_Unbeliever Jul 11 '12 at 10:36
  • I'm not sure! Which of those would be faster? - The question still holds, however... – Dave Bish Jul 11 '12 at 10:38
  • @Damien_The_Unbeliever I don't know. Maybe the Ash's answer is wrong. I only mention it because danbystrom linked to that question as a duplicate. **Which it is not, please stop voting for close.** – GSerg Jul 11 '12 at 10:38
  • 2
    this is NOT a duplicate. – Dave Bish Jul 11 '12 at 10:45
  • 4
    Your answer lies here [Cast and OfType :yes Jon Skeet](http://msmvps.com/blogs/jon_skeet/archive/2011/01/13/reimplementing-linq-to-objects-part-33-cast-and-oftype.aspx) – V4Vendetta Jul 11 '12 at 10:49
  • This question may be worth revisiting for C# 7 due to the `is` operator patterns? Though I suppose that depends on whether the methods have been kept up to date. – IAmJersh Oct 25 '18 at 10:57
  • @V4Vendetta that link is broken. This is the new URL: https://codeblog.jonskeet.uk/2011/01/13/reimplementing-linq-to-objects-part-33-cast-and-oftype/ – David Klempfner Dec 01 '19 at 00:44

4 Answers4

16

My benchmarking does not agree with your benchmarking.

I ran an identical benchmark to Alex's and got the opposite result. I then tweaked the benchmark somewhat and again observed Cast being faster than OfType.

There's not much in it, but I believe that Cast does have the edge, as it should because its iterator is simpler. (No is check.)

Edit: Actually after some further tweaking I managed to get Cast to be 50x faster than OfType.

Below is the code of the benchmark that gives the biggest discrepancy I've found so far:

Stopwatch sw1 = new Stopwatch();
Stopwatch sw2 = new Stopwatch();

var ma = Enumerable.Range(1, 100000).Select(i => i.ToString()).ToArray();

var x = ma.OfType<string>().ToArray();
var y = ma.Cast<string>().ToArray();

for (int i = 0; i < 1000; i++)
{
    if (i%2 == 0)
    {
        sw1.Start();
        var arr = ma.OfType<string>().ToArray();
        sw1.Stop();
        sw2.Start();
        var arr2 = ma.Cast<string>().ToArray();
        sw2.Stop();
    }
    else
    {
        sw2.Start();
        var arr2 = ma.Cast<string>().ToArray();
        sw2.Stop();
        sw1.Start();
        var arr = ma.OfType<string>().ToArray();
        sw1.Stop();
    }
}
Console.WriteLine("OfType: " + sw1.ElapsedMilliseconds.ToString());
Console.WriteLine("Cast: " + sw2.ElapsedMilliseconds.ToString());
Console.ReadLine();

Tweaks I've made:

  • Perform the "generate a list of strings" work once, at the start, and "crystallize" it.
  • Perform one of each operation before starting timing - I'm not sure if this is necessary but I think it means the JITter generates code beforehand rather than while we're timing?
  • Perform each operation multiple times, not just once.
  • Alternate the order in case this makes a difference.

On my machine this results in ~350ms for Cast and ~18000ms for OfType.

I think the biggest difference is that we're no longer timing how long MatchCollection takes to find the next match. (Or, in my code, how long int.ToString() takes.) This drastically reduces the signal-to-noise ratio.

Edit: As sixlettervariables pointed out, the reason for this massive difference is that Cast will short-circuit and not bother casting individual items if it can cast the whole IEnumerable. When I switched from using Regex.Matches to an array to avoid measuring the regex processing time, I also switched to using something castable to IEnumerable<string> and thus activated this short-circuiting. When I altered my benchmark to disable this short-circuiting, I get a slight advantage to Cast rather than a massive one.

Rawling
  • 49,248
  • 7
  • 89
  • 127
  • @Alex Done. I don't think there are any gaping holes in my benchmark, but feel free to point them out if there are. – Rawling Jul 11 '12 at 13:07
  • I start to believe the odd result I've seen is caused by the regex... I'm fiddling with some more and my results are now consistent with yours – Alex Jul 11 '12 at 13:14
  • @Rawling: Cast is faster in this case because it simply returns the enumeration unmolested because it implements `IEnumerable`. Cast special cases. – user7116 Jul 11 '12 at 15:13
  • @six ...buh. Of course. Now I remember why we were using `Regex.Matches` in the first place... I'll see if I can come up with a benchmark that will give me a useful result while still cutting out the work the regex is doing. – Rawling Jul 11 '12 at 15:20
  • @Rawling: I've done so in my answer. There is no appreciable difference after random trials. – user7116 Jul 11 '12 at 15:22
  • @six I've updated my answer to acknowledge and credit you for pointing this out. I still get a slight advantage to `Cast` but of course nothing like I was seeing before. – Rawling Jul 11 '12 at 15:57
  • @Rawling - maybe `.ToArray()` is dominating the time measurement, as that requires allocating and filling a large array. Actually, since there is no way for it to predict the array size needed [it only knows that it has an IEnumerable, so no size available], it probably reallocates and copies the array numerous times as it grows. Using the `IEnumerable` results directly might show a greater difference. – ToolmakerSteve Feb 19 '18 at 20:28
11

OfType() should be slower since doing safe type is check before an actual explicit cast operation, in the same time Cast() doing only explicit cast.

Theoretically OfType woudl be faster in case of many elements with "wrong type", so loop enumerates further just after is check, in case of Cast() on the same collection you would end's up with an InvalidCastException on each element of "wrong type" so this would be relatively slower.

Source code extracted using ILSpy:

// System.Linq.Enumerable
private static IEnumerable<TResult> OfType<TResult>(IEnumerable source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }

    foreach (object current in source)
    {
        // **Type check**
        if (current is TResult)
        {
            // **Explicit cast**
            yield return (TResult)current;
        }
    }
    yield break;
}

// System.Linq.Enumerable
public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source)
{
    IEnumerable<TResult> enumerable = source as IEnumerable<TResult>;
    if (enumerable != null)
    {
        return enumerable;
    }
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }

    foreach (object current in source)
    {
        // **Explicit cast only**
        yield return (TResult)current;
    }
    yield break;
}
sll
  • 61,540
  • 22
  • 104
  • 156
7

Just reverse the order of OfType and Cast in your method and you'll note that there is no difference. The first one always runs faster than the second one. This is a case of a bad microbenchmark.

Wrapping your code in a loop to run them in random order:

OfType: 1224
Cast: 2815
Cast: 2961
OfType: 3010
OfType: 3027
Cast: 2987
...

And then again:

Cast: 1207
OfType: 2781
Cast: 2930
OfType: 2964
OfType: 2964
OfType: 2987
...

Lifting out the Regex.Matches, which appears to cause the problem:

Cast: 1247
OfType: 210
OfType: 170
Cast: 171
...

and

OfType: 1225
Cast: 202
OfType: 171
Cast: 192
Cast: 415

So, no. OfType is not faster than Cast. And no, Cast is not faster than OfType.

user7116
  • 63,008
  • 17
  • 141
  • 172
1

Actually isof() first checks the type and then casts it, where as cast() just does the 2nd part. So obviously isof() will be slower than direct casting

http://codenets.blogspot.in/2010/06/cast-vs-oftype.html

Vinoth
  • 2,419
  • 2
  • 19
  • 34