I have a database table with over 200K records and a Date column (NOT NULL). I am struggling to do a GroupBy on the date: because the table is so large, the query takes a very long time to process (about 1 minute).

My Theory:

  • Get the list of all records from that table
  • From that list find the end date and the start date (basically the oldest date and the newest)
  • Then take, say, 20 dates at a time and do the GroupBy on those, so the query runs over a smaller set of records

Here is my Model that I have to get the list:

registration.Select(c => new RegistrationViewModel()
{
    DateReference = c.DateReference,
    MinuteWorked = c.MinuteWorked,             
});
  • DateReference is the database column that I have to work with.

I am not sure how to go through my list to get the start and end dates without it taking too long.

Any idea on how to do that?

EDIT:

var registrationList = await context.Registration 
  .Where(c => c.Status == StatusRegistration.Active) // getting all active registrations
  .ToRegistrationViewModel() // this is simply a select method
  .OrderBy(d => d.DateReference.Date) // this takes long
  .ToListAsync();

The GroupBy:

 var grpList = registrationList.GroupBy(x => x.DateReference.Date).ToList();

var tempList = new List<List<RegistrationViewModel>>();
foreach (var item in grpList)
{
   var selList = item.Select(c => new RegistrationViewModel()
   {
    RegistrationId = c.RegistrationId,
    DateReference = c.DateReference, 
    MinuteWorked = c.MinuteWorked,
   }).ToList();

   tempList.Add(selList);
}

This is my SQL table: registration table T-SQL

This is the ToRegistrationViewModel() function:

 return registration.Select(c => new RegistrationViewModel()
 {
   RegistrationId = c.RegistrationId,
   PeopleId = c.PeopleId,
   DateReference = c.DateReference,
   DateChange = c.DateChange,
   UserRef = c.UserRef,
   CommissionId = c.CommissionId,
   ActivityId = c.ActivityId,
   MinuteWorked = c.MinuteWorked,
   Activity = new ActivityViewModel()
     {
       Code = c.Activity.Code,
       Description = c.Activity.Description,
     },
     Commission = new CommissionViewModel()
     {
       Code = c.Commission.Code,
       Description = c.Commission.Description
     },
     People = new PeopleViewModel()
     {
       UserId = c.People.UserId,
       Code = c.People.Code,
       Name = c.People.Name,
       Surname = c.People.Surname,
       Active = c.People.Active
     }
});
devludo
  • 2
    This sounds very much like an [XY-problem](https://xyproblem.info/). You should start by showing the actual slow query, and include information such as which columns are indexed. – JonasH Sep 26 '22 at 06:58
  • 3
    If your `GroupBy` is so slow you should find out what is the reason and then fix it. Don't try to find a workaround. There's probably just 1-2 indexes missing. Btw, to make your workaround efficient you need to put indexes(asc+desc) on `MinuteWorked` anyway. – Tim Schmelter Sep 26 '22 at 06:58
  • 1
    200K rows is *tiny*, not massive. If each row is 100 bytes, all of this is just 20MB and can easily get buffered in RAM. Even 1M rows is so small it's considered the uncompressed delta for columnstore indexes - you need at least 1M rows before the database engine compresses it. If the query takes more than a second something is wrong with the design *and* the query. A simple `SELECT MIN(Date), MAX (Date) From SomeTable` should take milliseconds for such a small table. If you want to group by date, `SELECT Date,SUM(MinuteWorked) FROM X GROUP BY Date` should be equally fast – Panagiotis Kanavos Sep 26 '22 at 07:05
  • 2
    What you try to do is what probably already happens. Since the table has no indexes, the server has to scan the list of all records to construct the groups and calculate dates and sums. Trying to do the same *on the client* will only make things worse – Panagiotis Kanavos Sep 26 '22 at 07:06
  • I have updated my question with some more info, the Date column is not indexed tho. Can i even do that? Sorry i thought that a table with 200K+ record was massive... – devludo Sep 26 '22 at 07:09
  • Which one is the *actual* query? Is `Date` indexed? That `registrationList.GroupBy(x => x.DateReference.Date).ToList();` loads all rows in the client's memory and then *splits them in groups*. You can't have a `GROUP BY` in SQL that returns non-aggregate rows. – Panagiotis Kanavos Sep 26 '22 at 07:10
  • @devludo what is `ToRegistrationViewModel`? This can't be translated to SQL. Either that loads everything in memory or you're getting a runtime error and didn't notice – Panagiotis Kanavos Sep 26 '22 at 07:11
  • @PanagiotisKanavos no Date is not indexed, i didn't know you could do that on `Date` type objects. – devludo Sep 26 '22 at 07:15
  • 2
    The code you posted loads all 200K rows from the table *and all related tables* in memory, explicitly. There's no grouping or filtering involved. Why do you want *all* 200K rows? What are you trying to do? You can't display 200K rows in a grid – Panagiotis Kanavos Sep 26 '22 at 07:16
  • I am trying to display a table with nested rows. I.e. a parent row in the table is the DateReference key in the groupby and then the child row is the actual registration for that date similar to this [table](https://ng.ant.design/components/table/en#components-table-demo-expand) – devludo Sep 26 '22 at 08:09

1 Answer

There are multiple potential problems here:

Lack of indexes

Your query uses Status and DateReference, and neither appears to have an index. If only a few registrations are active, an index on Status might suffice; otherwise you need an index on the date to speed up sorting. You might also consider a composite index that includes both columns. An appropriate index should solve the sorting issue.
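
A minimal sketch of how such a composite index could be declared, assuming the project uses EF Core with migrations (the question does not say so). The entity class, `AppDbContext`, and the property types are hypothetical stand-ins for the real model; only `HasIndex` is the actual EF Core call.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical enum; only Active appears in the question, other members omitted.
public enum StatusRegistration { Active }

// Hypothetical entity; only the columns used by the slow query are shown.
public class Registration
{
    public int RegistrationId { get; set; }
    public StatusRegistration Status { get; set; }
    public DateTime DateReference { get; set; }
    public int MinuteWorked { get; set; }
}

// Hypothetical context name; the real one is not shown in the question.
public class AppDbContext : DbContext
{
    public DbSet<Registration> Registration => Set<Registration>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Composite index: Status first for the filter, then DateReference
        // so the matching rows can be read already sorted by date.
        modelBuilder.Entity<Registration>()
            .HasIndex(r => new { r.Status, r.DateReference });
    }
}
```

The index only exists in the database once a migration containing it has been generated and applied. Whether the composite index or two single-column indexes works better depends on how selective Status is, so it is worth checking the query plan either way.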

Materializing the query

`ToListAsync` triggers execution of the SQL query, so every subsequent operation runs on the client. I would also be highly suspicious of `ToRegistrationViewModel`; I would try changing it to an anonymous type and only convert to an actual type after the query has been materialized. Running things like sorting and grouping on the client is generally considered a bad idea, but you need to consider where the actual bottleneck is: optimizing the grouping will not help if transferring the data takes most of the time.
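
To illustrate the ordering of operations, here is a small sketch that filters, groups, and aggregates before anything is materialized, and converts to a concrete type only afterwards. It assumes the parent rows only need per-date totals. `RegistrationEntry`, `DailyTotals`, and the `Active` flag are hypothetical stand-ins, and the sketch runs against an in-memory `IQueryable` rather than a real `DbSet`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Registration entity.
public class RegistrationEntry
{
    public int RegistrationId { get; set; }
    public DateTime DateReference { get; set; }
    public int MinuteWorked { get; set; }
    public bool Active { get; set; }
}

public static class DailyTotals
{
    // Composes the whole query first and materializes it once, at the end.
    public static List<(DateTime Date, int Count, int Minutes)> PerDay(
        IQueryable<RegistrationEntry> source)
    {
        return source
            .Where(r => r.Active)
            .GroupBy(r => r.DateReference.Date)
            // Anonymous type inside the query: one row per date.
            .Select(g => new
            {
                Date = g.Key,
                Count = g.Count(),
                Minutes = g.Sum(r => r.MinuteWorked)
            })
            .OrderBy(x => x.Date)
            .ToList() // the query executes here
            // Conversion to a concrete type happens on the client.
            .Select(x => (x.Date, x.Count, x.Minutes))
            .ToList();
    }
}
```

With a real EF provider a query of this shape should be sent as a single `GROUP BY`, so only one row per date crosses the wire; the child rows for a given date can then be fetched on demand when the user expands it.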

Transferring data

Fetching a large number of rows will be slow no matter what. The goal is usually to do as much filtering in the database as possible so you do not need to fetch so many rows. If you have to fetch a large number of records, you might use pagination, i.e. combine OrderBy with Skip and Take to fetch smaller chunks of data. This will not save time overall, but it allows for things like progress reporting and showing data continuously.
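
A sketch of paging with `OrderBy`, `Skip`, and `Take` while keeping each date's group intact across page boundaries. Because the rows arrive ordered, a group is complete as soon as a row with a different date shows up. `RegistrationRow` and `PagedGrouping` are hypothetical names, and the source here is any `IQueryable`, not necessarily the real `DbSet`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the rows being paged.
public class RegistrationRow
{
    public int RegistrationId { get; set; }
    public DateTime DateReference { get; set; }
    public int MinuteWorked { get; set; }
}

public static class PagedGrouping
{
    // Reads the source page by page and yields one complete group per date.
    public static IEnumerable<List<RegistrationRow>> GroupsByDate(
        IQueryable<RegistrationRow> source, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var current = new List<RegistrationRow>();
        for (int page = 0; ; page++)
        {
            // A total order (date, then id) keeps pages from overlapping.
            var rows = source
                .OrderBy(r => r.DateReference)
                .ThenBy(r => r.RegistrationId)
                .Skip(page * pageSize)
                .Take(pageSize)
                .ToList();

            foreach (var row in rows)
            {
                // A change of date means the previous group is complete.
                if (current.Count > 0 &&
                    current[0].DateReference.Date != row.DateReference.Date)
                {
                    yield return current;
                    current = new List<RegistrationRow>();
                }
                current.Add(row);
            }

            if (rows.Count < pageSize) break; // short page: no more data
        }

        if (current.Count > 0) yield return current; // last group
    }
}
```

Each group can be shown as soon as it is yielded, which helps the user experience, but the total time is not reduced: every page is still a separate query.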

JonasH
  • Thanks for replying. I actually use pagination when I fetch the list (removed it for this example) before doing the GroupBy. The problem is that pagination might not get all the records for a date, i.e. it might skip some data (due to the page size) – devludo Sep 26 '22 at 08:02
  • @devludo Yes, obviously you need to wait until you have all pages containing a specific date before outputting a group. If the records are ordered, that is fairly simple, since you can just compare the date of a record with the previous date. But pagination will not make anything *faster*, it can only improve the *user experience*. – JonasH Sep 26 '22 at 09:04