3

In my project I receive some data from an SPS all x seconds. Every y minutes I archive the current Data in a database so I'm able to show statistics.

The data I receive gets put in a model. Something like this but much more complex:

public class Data
{
    public DateTime ArchiveTime { get; set; }
    public float TempC { get; set; }
    public float CO2Percent { get; set; }
}

I have a repository for the database that returns all entries in a certain time span. See this code:

// Context is my DbContext for a SQLite db and Data is the DbSet<Data> on that
IQueryable<Data> GetDataBetween(DateTime from, DateTime to) => Context.Data.Where(d => (d.ArchiveTime >= from && d.ArchiveTime <= to));

As you can see this returns an IQueryable so I want to make use of the linq to entities functionality.
I believe it's called linq to entities but in case it isn't, I mean the functionality that converts expression trees to sql or whatever instead of just executing it in C#.

Since there is an indeterminable amount of entries per hour in the database I want to only get one entry per hour (the first one) so I can display it in a graph.

Here's an example of some datetimes that maybe show my intent a bit better:
NOTE: these are only the datetimes contained in the object, I want the whole object - not just the times.

// say this is all the data I get between two times
2019-07-06 10:30:01 // I want
2019-07-06 10:40:09
2019-07-06 10:50:10
2019-07-06 11:00:13 // I want
2019-07-06 11:10:20
2019-07-06 11:20:22
2019-07-06 11:30:24
2019-07-06 11:40:32
2019-07-06 11:50:33
2019-07-06 12:00:35 // I want
2019-07-06 12:10:43
2019-07-06 12:20:45
2019-07-06 12:40:54
2019-07-06 12:50:56
2019-07-06 13:00:58 // I want
2019-07-06 13:11:06
2019-07-06 13:21:08
2019-07-06 13:31:09

The current way I do this is via a IEnumerable and GroupBy. See this code:

var now = DateTime.Now;
IQueryable<Data> dataLastWeek = repos.GetDataBetween(now.AddDays(-7), now);

IEnumerable<Data> onePerHour = dataLastWeek.AsEnumerable()
    .GroupBy(d => new DateTime(d.ArchiveTime.Year, d.ArchiveTime.Month, d.ArchiveTime.Day, d.ArchiveTime.Hour, 0, 0))
    .Select(g => g.First());

This works fine but since it uses IEnumerable and creates objects, I don't get the advantages of linq to entities and I think it must a lot slower this way.

Is there any way to rewrite this query to work with IQueryable on a SQLite database?

EDIT: I'm working with the .net core 3 preview6 (newest preview) version of EF Core. Maybe there is a new feature that allowes for what I want :)

Joelius
  • 3,839
  • 1
  • 16
  • 36
  • I guess you tried it without the AsEnumerable and it didn't work which could be because the LINQ provider can't really convert `new DateTime(...)` or the `First(...)` of a grouping to SQL. However, I think `dataLastWeek.GroupBy(d => new { d.ArchiveTime.Year, d.ArchiveTime.Month, d.ArchiveTime.Day, d.ArchiveTime.Hour }).Select(g => g.Min(x => x.ArchiveTime));` should work as this can be expressed in SQL as `SELECT MIN(ArchiveTime) FROM data GROUP BY DATEPART(YEAR, ArchiveTime), DATEPART(MONTH, ArchiveTime), DATEPART(DAY, ArchiveTime), DATEPART(HOUR, ArchiveTime)`. – ckuri Jul 15 '19 at 11:15
  • Yes I did try that (without the `AsEnumerable`) and you're correct that this didn't work. Also the solution you're suggesting returns only the `DateTime`, not the `Data`-object itself :/ – Joelius Jul 15 '19 at 11:18
  • `AsEnumerable()` is not needed in EF Core. Unfortunately with the current EF Core query translation, the query will use client evaluation anyway because of the `First()` or `FirstOrDefault()`. – Ivan Stoev Jul 15 '19 at 11:19
  • @IvanStoev if I don't use `AsEnumerable()` it thinks I want to do a `where` on the `IQueryable` which throws an exception since I create a new `DateTime` and how would EF know how to do that. Is there any way to remove the first then? @juharr mentions `OrderBy` but how would you use that in this context? With `Take(1)`? – Joelius Jul 15 '19 at 11:22
  • @ckuri I edited the question to add a note that I want the full object, not just the datetimes. Your solution seems to work fine but as mentioned, only returns the datetimes. – Joelius Jul 15 '19 at 11:25
  • Assuming EF Core 2.2, that part can easily be fixed with anonymous type, e.g. `.GroupBy(d => new { d.ArchiveTime.Date, d.ArchiveTime.Hour })`. As I said, the problem is with `First()`. – Ivan Stoev Jul 15 '19 at 11:28
  • @IvanStoev I know, @ckuri already provided a working solution for that. Is there no replacement for `First()`? I was thinking you might need to use subqueries but I'm not sure how you would implement that. – Joelius Jul 15 '19 at 11:30
  • Unfortunately no. `First` definitely should not be used because it forces client eval, but unfortunately all other possible patterns like `OrderBy()` + `Take(1)` or `FirstOrDefault()` also lead to client eval (although not forced). – Ivan Stoev Jul 15 '19 at 11:41
  • @IvanStoev Hmm. There is a [`First` implementation for `IQueryable`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.queryable.first?view=netcore-3.0) but I guess this doesn't mean it won't use client eval :/ – Joelius Jul 15 '19 at 11:45
  • Indeed. And they do it intentionally - see below. Also the best workaround I can think of. – Ivan Stoev Jul 15 '19 at 12:02

1 Answers1

5

The key part of the GroupBy can easily be made translatable by avoiding new DateTime(...) and using either anonymous type

.GroupBy(d => new { d.ArchiveTime.Date, d.ArchiveTime.Hour })

or Date property + AddHours:

.GroupBy(d => d.ArchiveTime.Date.AddHours(d.ArchiveTime.Hour))

Unfortunately currently (EF Core 2.2) does not translate nested First / FirstOrDefault / Take(1) to SQL and uses client evaluation. For First() it is forced in order to emulate the LINQ to Objects throwing behavior, but for the other two patterns it's caused by the lack of proper translation.

The only server side solution I see for your concrete query is to not use GroupBy at all, but correlated self antijoin, something like this:

var onePerHour = dataLastWeek.Where(d => !dataLastWeek.Any(d2 =>
    d2.ArchiveTime.Date == d.ArchiveTime.Date &&
    d2.ArchiveTime.Hour == d.ArchiveTime.Hour &&
    d2.ArchiveTime < d.ArchiveTime));
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • This works, thanks. However you only check for year and hour - any reason for that? To get the behaviour I want (and had with the old solution) you need to also check month and day otherwise you only get 24 entries. – Joelius Jul 15 '19 at 12:11
  • Very quickly.. I don't directly use the data for the graph but create tupels from it. These are unknown to ef so it throws if I do a `select` on the `IQueryable`. When I use `AsEnumerable` it works but it's very slow. Using `ToList()` instead of `AsEnumerable` speeds it up by a factor of about 3. So is it correct to call `ToList()` on `onePerHour` before I create my tupels? – Joelius Jul 15 '19 at 12:15
  • (1) Sorry, copy/paste mistake, I meant `Date` (2) Not sure what you mean, `ToList()` loads everything in memory. – Ivan Stoev Jul 15 '19 at 12:27
  • I have asked [a follow up question](https://stackoverflow.com/questions/57040867/create-custom-objects-from-iqueryable-without-loading-everything-into-memory). I'd appreciate it if you looked into it when you have time :) – Joelius Jul 15 '19 at 13:32