1

I have a table of users, grouped into sessions. I would like to select an array for each user that consists of the number of tasks they have in each session:

var taskCounts =
    from session in gzClasses.e_userLongSessions
    orderby session.workerId ascending, session.startTime ascending
    group session by session.workerId
    into record
    select record.Select(s => s.totalTasks).ToArray();
int[][] result = taskCounts.ToArray();

The above theoretically works, but it results in a separate SQL query for each user, as shown in the image below. Since the database is not local, this takes quite a long time. Is there a way to grab all the data in one query and reduce the overhead of running a bunch of individual queries?

enter image description here

At the same time, I'd like ensure that it's efficient by only transmitting the totalTasks integer values over the wire, instead of sending the entire database record.

Put another way, I'd like to grab a set of grouped integers from a remote database all in one query, and have them arranged into arrays in a C# program. It sounds pretty simple in principle, but I'm having a hard time getting LINQ to do this.

Andrew Mao
  • 35,740
  • 23
  • 143
  • 224
  • Do you have Lazy Loading enabled in your ef context? This could account for such behaviour since for each object that would be enumerated it would perform a different query. – Dimitris Kalaitzis May 16 '14 at 17:55
  • @DimitrisKalaitzis I don't know what that is - could you give a pointer that shows how to turn it off, and I'll try this query again? – Andrew Mao May 16 '14 at 17:58
  • Do you use entity framework? If you do you could change the setting where you instanciate gzClasses. e.g. gzClasses.ContextOptions.LazyLoadingEnabled = false; – Dimitris Kalaitzis May 16 '14 at 18:02

4 Answers4

1

Depending on how many records you get back, you could return a minimal amount of data and do the grouping part in memory (pretty quickly since it'll already be sorted):

Using method syntax:

gzClasses.e_userLongSessions
         .OrderBy(s => s.workerId)
         .ThenBy(s => s.startTime)
         .Select(s => new { s.workerId, s.totalTasks })
         .ToList()
         .GroupBy(x => x.workerId)
         .Select(g => g.Select(x => x.totalTasks).ToArray())
         .ToArray();

Using query syntax:

var query = from session in gzClasses.e_userLongSessions
            orderby session.workerId ascending, session.startTime ascending
            select new { Id = s.workerId, s.totalTasks };

var taskCounts = from worker in query.ToList()
                 group worker by worker.Id into g
                 select g.Select(x => x.totalTasks).ToArray();

var result = taskCounts.ToArray();

I see the same behavior in Linqpad (Linq-to-SQL default), but I feel as though I've seen Linq-to-Entities handle a GroupBy followed by a group.Select(x => x.Something) without resulting in an n+1 query...Could be imagining things though (not sure what the SQL would look like to achieve that).

Ocelot20
  • 10,510
  • 11
  • 55
  • 96
  • Yes, doing it in memory is probably a fine workaround for now. It is LINQ-to-SQL, I believe (although I'm not as familiar with Windows terminology). I also don't know what the SQL would look like to do that, so this is probably the right approach unless someone comes up with some magic that I didn't know about. – Andrew Mao May 16 '14 at 18:03
  • Excellent. This query was nearly instant, although I had to edit your answer. Can you see if there might be a more concise way to write the syntax above? Your answer is not exactly correct because you need to grab `g.Select(e => e.totalTasks).toArray()`. – Andrew Mao May 16 '14 at 18:08
  • As I mentioned in my previous edit, you will need to grab the `totalTasks` field out of the grouping at some point before turning it into an array, or it won't be an `int[][]`. – Andrew Mao May 16 '14 at 18:16
  • Oops, added that fix back in. Wanted to keep the method syntax for posterity. You can kind of mix-and-match the method/query syntax more than I did here if you find it more readable. Essentially: `var result = (taskCountsQuery).ToArray()` – Ocelot20 May 16 '14 at 18:18
0

Wouldn't a Dictionary be more useful than an array of arrays?

gzClasses.e_userLongSessions.OrderBy(s => s.workerId)
    .ThenBy(s => s.startTime)
    .GroupBy(s => s.workerId, t => t.TotalTasks).ToArray()
    .ToDictionary(g => g.Key, h => h.ToArray());

That should return a Dictionary, with the workerId as the key, and an array of the number of tasks as the value.

user1417835
  • 1,002
  • 1
  • 8
  • 14
  • I think this is just as valid, but it's not really answering the original question. Or, at least, it's implicitly doing the same thing as @Ocelot20's answer by putting everything in memory first. – Andrew Mao May 16 '14 at 19:24
  • This is a matter of how LINQ to SQL is implemented, and is not something you can easily remedy. The best approach would be to force the initial statement to execute, and then group the data in-memory. It seems like the implementation in EF would better suit your needs. http://stackoverflow.com/questions/12645767/linq-to-sql-simple-group-by-generating-many-sql-queries – user1417835 May 16 '14 at 19:47
  • I don't think I fully grasp the difference between LINQ to SQL and Entity Framework. I'll have to look into that some more. – Andrew Mao May 16 '14 at 19:55
  • What Entity Framework do in this case is to throw a NotSupportedException, disappointing. – EduardoS May 19 '14 at 21:44
-1

Maybe if you replace "from session in gzClasses.e_userLongSessions" by "from session in gzClasses.e_userLongSessions.AsEnumerable()"?

EduardoS
  • 199
  • 1
  • 3
  • How is not executing *any* of the query on the database supposed to ensure that the *entire* query is executed on the database? – Servy May 16 '14 at 17:48
  • I don't understand what this is doing, as it is translating the `IQueryable` into an `IEnumerable`, and if anything it would basically pull the entire database over the wire just to answer the query. – Andrew Mao May 16 '14 at 17:49
  • Exactly Andrew, it will bring all records from the gzClasses.e_userLongSessions, but you are doing it anyway, the group ... by ... into isn't like SQL group by, SQL does not have a command to group into a list, the way EF is converting it in SQL commands is by issuing one select to each workerId, bringing all records, in several SELECTs, so instead just bring all in one SELECT and make .Net do the work. BTW did the AsEnumerable worked? – EduardoS May 16 '14 at 17:54
-2

The position of your ToArray() causes LINQ to be more eager than it should be, resulting in your many queries. I think this will result in just one query:

var taskCounts =
    from session in gzClasses.e_userLongSessions
    orderby session.workerId ascending, session.startTime ascending
    group session by session.workerId
    into record
    select record.Select(s => s.totalTasks);
int[][] result = taskCounts.Select(x => x.ToArray()).ToArray();
Tim S.
  • 55,448
  • 7
  • 96
  • 122
  • Your query is functionally identical to the OP's query. – Servy May 16 '14 at 17:37
  • That `Select` will still be on an `IQueryable` and might produce the same result he was seeing. `taskCounts.ToList().Select(x => x.ToArray()).ToArray()` would probably do the trick. – Ocelot20 May 16 '14 at 17:37
  • That actually results in the same behavior, as I believe it is functionally the same query. The `IQueryable` isn't evaluated until a `toArray` is called on it, so the method of construction doesn't matter. – Andrew Mao May 16 '14 at 17:37
  • @Ocelot20 I tried the `ToList()` that you suggested and it results in separate queries as well. – Andrew Mao May 16 '14 at 17:39
  • @AndrewMao That proves that the problem is with the query itself, not its translation to in memory arrays. What query provider is it, and what is `gzClasses.e_userLongSessions` and where does it come from? My guess is it represents an already materialized query of objects that have lazily initialized properties, which you are materializing individually. – Servy May 16 '14 at 17:40
  • @Servy It's a regular `DataContext` created by connecting to a SQL server and dragging a database table into a `.dbml` file. If this is not the right way to do it - let me know; I mainly develop on not-Windows. – Andrew Mao May 16 '14 at 17:43