5

I am using the following query:

foreach (var callDetailsForNode_ReArrange in callDetailsForNodes_ReArrange)
{
    var test = from r1 in dtRowForNode.AsEnumerable()
               join r2 in dtFileRowForNode.AsEnumerable()
               on r1.Field<int>("Lng_Upload_Id") equals r2.Field<int>("Lng_Upload_Id")
               where ((r1.Field<string>("Txt_Called_Number") == callDetailsForNode_ReArrange.caller2.ToString()) || r1.Field<string>("Txt_Calling_Number") == callDetailsForNode_ReArrange.caller2.ToString())
               select r2.Field<string>("Txt_File_Name");

    var d = test.Distinct();
}

Up to here, this query runs in no time. But as soon as I added

string[] str = d.ToArray();
strFileName = string.Join(",", str);

it takes almost 4-5 seconds to run. What makes it so slow when I add .ToArray()?

Tim Schmelter
Rajeev Kumar

3 Answers

15

Up to here, this query runs in no time.

Up to here, it hasn't actually done anything except build a deferred-execution model that represents the pending query. It doesn't start iterating until something calls MoveNext() on the iterator, e.g. via foreach or, in your case, via .ToArray().

So: it takes time because it is doing work.

Consider:

static IEnumerable<int> GetData()
{
    Console.WriteLine("a");
    yield return 0;
    Console.WriteLine("b");
    yield return 1;
    Console.WriteLine("c");
    yield return 2;
    Console.WriteLine("d");
}
static void Main()
{
    Console.WriteLine("start");
    var data = GetData();
    Console.WriteLine("got data");
    foreach (var item in data)
        Console.WriteLine(item);
    Console.WriteLine("end");
}

This outputs:

start
got data
a
0
b
1
c
2
d
end

Note how the work doesn't all happen at once: it is both deferred (a comes after got data) and spooling (the output is interleaved; we don't get a,...,d followed by 0,...,2).


Related: this is roughly how Distinct() works, from comments:

public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source) {
    var seen = new HashSet<T>();
    foreach(var item in source) {
        if(seen.Add(item)) yield return item;
    }
}

...

and a new Join operation:

public static string Join(this IEnumerable<string> source, string separator) {
    using(var iter = source.GetEnumerator()) {
        if(!iter.MoveNext()) return "";
        var sb = new StringBuilder(iter.Current);
        while(iter.MoveNext())
            sb.Append(separator).Append(iter.Current);
        return sb.ToString();
    }
}

and use:

string s = d.Join(",");
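
As a side note, on .NET Framework 4 and later the built-in string.Join has an overload that accepts an IEnumerable<string>, so the intermediate ToArray() call can be dropped without writing a custom extension. The sketch below assumes a hypothetical FileNames() sequence standing in for the result of the query in the question; the enumeration (and so the cost of the query) still happens inside string.Join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinDemo
{
    // Hypothetical stand-in for the file names produced by the query.
    public static IEnumerable<string> FileNames()
    {
        yield return "a.wav";
        yield return "b.wav";
        yield return "a.wav";
    }

    public static string Build()
    {
        // Still deferred at this point: nothing has been enumerated.
        IEnumerable<string> d = FileNames().Distinct();

        // string.Join enumerates d once; no intermediate string[] is created.
        return string.Join(",", d);
    }

    static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

This does not make the query itself any faster; it only avoids allocating the array.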
Marc Gravell
  • 2
    I didn't know that even the first line of the method, `Console.WriteLine("a");`, is not executed until the `foreach`. I would have thought `a` came before `got data`. LINQ is still surprising sometimes. – Tim Schmelter May 10 '13 at 12:23
12

Because the query DOES NOTHING until you iterate over it, which .ToArray() does.

One thing to note is that the right-hand-side of a join (in your example, r2 in dtFileRowForNode.AsEnumerable()) will be fully enumerated AS SOON as the query begins to be iterated, even if only the first element of the result is being accessed - but not until then.

So if you did:

d.First()

the r2 in dtFileRowForNode.AsEnumerable() sequence would be fully iterated (and buffered in memory), but r1 in dtRowForNode.AsEnumerable() would only be evaluated as far as needed to produce the first result.

For this reason, if one of your sequences in the join is much larger than the other, it's more efficient (memory-wise) to put the big sequence on the left of the join. The entire sequence on the right of the join will be buffered in memory.

(I should point out that this only applies to LINQ to Objects. LINQ to SQL will run those queries in the database, so it handles the buffering.)
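
A minimal sketch of that buffering behaviour, using two hypothetical counting sequences (Outer and Inner, each yielding 0..999) in place of the DataTable rows. The counters record how many elements each side has handed out by the time First() returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinBufferingDemo
{
    public static int OuterPulled;
    public static int InnerPulled;

    // Left-hand side of the join: counts every element it hands out.
    static IEnumerable<int> Outer()
    {
        for (int n = 0; n < 1000; n++) { OuterPulled++; yield return n; }
    }

    // Right-hand side of the join: counts every element it hands out.
    static IEnumerable<int> Inner()
    {
        for (int n = 0; n < 1000; n++) { InnerPulled++; yield return n; }
    }

    public static int FirstMatch()
    {
        OuterPulled = 0;
        InnerPulled = 0;

        // Building the query enumerates neither sequence.
        var query = from o in Outer()
                    join i in Inner() on o equals i
                    select o;

        // First() starts the iteration and stops after one result.
        return query.First();
    }

    static void Main()
    {
        int first = FirstMatch();
        Console.WriteLine($"first={first}, outer pulled={OuterPulled}, inner pulled={InnerPulled}");
    }
}
```

With LINQ to Objects this should report that all 1000 inner elements were pulled but only one outer element, because the first outer element already has a match.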

Matthew Watson
2

You need to read up on deferred evaluation of LINQ statements. A query is not executed until you explicitly ask for results, for example by iterating it in a foreach or by calling ToArray, ToList, Sum, First, or another method that evaluates the query.

So it is your query that takes so much time to complete, not the ToArray call.
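
One way to check this is to time the two steps separately. The sketch below is illustrative only: SlowMatch is a hypothetical predicate with an artificial delay, standing in for the per-row work of the real join, and the Calls counter shows when the predicate actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

static class TimingDemo
{
    public static int Calls;

    // Hypothetical slow predicate; the Sleep simulates expensive per-row work.
    static bool SlowMatch(int n)
    {
        Calls++;
        Thread.Sleep(1);
        return n % 2 == 0;
    }

    // Only describes the query; SlowMatch is not called here.
    public static IEnumerable<int> BuildQuery()
    {
        return Enumerable.Range(0, 200).Where(SlowMatch);
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        IEnumerable<int> query = BuildQuery();
        Console.WriteLine($"build:   {sw.ElapsedMilliseconds} ms, predicate calls so far: {Calls}");

        sw.Restart();
        int[] result = query.ToArray(); // the predicate runs for every element here
        Console.WriteLine($"ToArray: {sw.ElapsedMilliseconds} ms, predicate calls: {Calls}, items: {result.Length}");
    }
}
```

Exact timings depend on the machine, but the build step should be close to instant and essentially all of the elapsed time should show up in the ToArray step.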

Jarek