I have a SQL Server table that I would like to find "similar" rows. My first thought was to use GroupBy()
and came up with the below:
var detections = await context.Detections
.Where(d => d.TimeStamp > DateTime.Now.AddDays(-1)) //Could be much more logic here
.GroupBy(det => new DetectionGroupByModel
{
DetectedText = det.DetectedText,
HoursFromEpoch = EF.Functions.DateDiffHour(DateTimeOffset.UnixEpoch.Date, det.TimeStamp),
// A few other columns
})
.Select(grp => new DetectionGroupResultsDetection
{
Count = grp.Count(),
TimeStamp = grp.OrderByDescending(det => det.TimeStamp).First().TimeStamp,
Detection = grp.OrderByDescending(det => det.TimeStamp).First(),
})
.OrderByDescending(det => det.TimeStamp)
.Skip(0).Take(25)
.ToListAsync();
This does work exactly as expected, but the performance becomes very underwhelming when returning over a few thousand records. I was working in SQL Server Management Studio and came up with the below query:
SELECT *
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY DATEDIFF(hour, '1970-01-01 00:00:00', TimeStamp), DetectedText, DetectedObject, DeviceId ORDER BY TimeStamp DESC) AS [Rank]
FROM [Detections]
WHERE TimeStamp > '2020-01-01 00:00:00' --Could be much more logic here
) AS [DuplicationCount]
WHERE [Rank] = 1
ORDER BY TimeStamp DESC
OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY
This will execute faster on more than 1,000,000 records than the above on 10,000 records. Like several orders of magnitude faster. I just do not know how to reverse engineer this to LINQ.
This currently will not get me the duplicate count, like the GroupBy()
does, but this is so much faster, I could just run this again with WHERE [Rank] = 2
and do grouping on only those results. I am fine if that loads a bit after initial page load.
I do have complete end to end control of the DB and C# code (read: one man shop). So a redesign of tables is not out of the question. I have been adding and bench-marking indexes on the table and getting some gains, just not anywhere near what I expected.
I am using .NET 6.0 and latest Entity Framework 6.0.0 packages.