
I'm working on a personal project where I need help with a performant LINQ query against a database. The DB in question could have millions of log entries, and through an API (ASP.NET) I want an option to return only a representative subset of those logs to a graphical interface.

Here's the method in question:

public IEnumerable<Log> GetByParameter(int ParameterID, DateTime timeStart, DateTime timeEnd)
{
    return _context.Logs
        .Where(a => a.ParameterID == ParameterID &&
                    a.LogDate > timeStart &&
                    a.LogDate < timeEnd)
        .ToList();
}

Note that the method takes two DateTime parameters, which define the time range in which the logs should be queried.

I would want to augment this method like so:

public IEnumerable<Log> GetByParameter(int ParameterID, DateTime timeStart, DateTime timeEnd, int limit)

For example, the DB might contain 2 million entries for the given parameters, while the consumer of the API has a "limit" of 40,000 entries. Thus:

numberOfEntries / limit = n

2*10^6 / 4*10^4 = 50

In this example I would want to return every 50th element to the consumer of the API, with a roughly even time interval between the elements.
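
To make the arithmetic concrete, here is how I imagine computing n (just a sketch; Count() should translate to a server-side SELECT COUNT(*), so no rows are transferred):

var filtered = _context.Logs.Where(a =>
    a.ParameterID == ParameterID &&
    a.LogDate > timeStart &&
    a.LogDate < timeEnd);

// COUNT(*) runs on the server; only the scalar comes back
var numberOfEntries = filtered.Count();

// stride between returned rows; guard against limit > numberOfEntries
var n = Math.Max(1, numberOfEntries / limit);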

An easy way would be to query the entire table for the given parameters and filter afterwards, but that seems messy, somewhat antithetical to this approach, and potentially very inefficient.

So here is my question: Is there any way to write a query such that it only queries the DB for every Nth row?

Thanks in advance!

Lacynt
  • What database are you using? SQL Server? – Nick Jul 31 '19 at 08:30
  • Is there a reason that you're restricted to using LINQ (EF?)? – Oliver Jul 31 '19 at 08:30
  • @Nick Yes, I'm using SQL Server – Lacynt Jul 31 '19 at 08:42
  • @Oliver Not particularly, other than that the rest of my code uses LINQ and I'd like to learn the syntax. I have access to the DB and could write an SP as well – Lacynt Jul 31 '19 at 08:43
  • Linq has `Skip(i)` and `Take(i)`. With that you can get paging (see the sketch after these comments). – VDWWD Jul 31 '19 at 08:55
  • [Here](https://stackoverflow.com/questions/682615/how-can-i-get-every-nth-item-from-a-listt) has an answer to this which you can also use on a `Queryable`. However, you have to test it because it might get executed client-side. As for your actual requirement, I don't think you should do what you are trying to do: you are trying to down-sample arbitrarily, and it'll cause many other problems. The proper thing to do here is to paginate your results. In terms of performance, with this many rows that will depend mostly on the nature of the data, indexes, etc., not on LINQ. – kovac Jul 31 '19 at 09:09
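
For reference, the paging that VDWWD suggests would look something like this (pageIndex and pageSize are hypothetical parameters; with an ORDER BY in place, Skip/Take translate to OFFSET/FETCH on SQL Server):

var page = _context.Logs
    .Where(a => a.ParameterID == ParameterID &&
                a.LogDate > timeStart &&
                a.LogDate < timeEnd)
    .OrderBy(a => a.LogDate) // a stable ordering is required for paging
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();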

2 Answers


An approach you could take is using modulus on some kind of index column. If you already have an auto-generated Id, that could be used, but it's not ideal, as you can't rely on it being contiguous.

You could use RANK() to create an index column within a view, but unfortunately you can't use RANK() directly from EF code.

Something like the following:

var interval = 5;
return _context.Logs
    .Where(a =>
        a.ParameterID == ParameterID &&
        a.LogDate > timeStart &&
        a.LogDate < timeEnd &&
        a.Id % interval == 0)  // filter on the modulus of an index
    .ToList();
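
To sketch the view idea from the first paragraph (the names LogsNumbered and RowNum are invented here, and ROW_NUMBER is used rather than RANK because it is guaranteed gap-free):

// Hypothetical: a SQL view exposes the Logs columns plus
//   RowNum = ROW_NUMBER() OVER (ORDER BY LogDate)
// and is mapped to a read-only entity via DbSet<LogNumbered> LogsNumbered.
var sampled = _context.LogsNumbered
    .Where(v => v.ParameterID == ParameterID &&
                v.LogDate > timeStart &&
                v.LogDate < timeEnd &&
                v.RowNum % interval == 0)
    .ToList();

One caveat: the view numbers the whole table, so after the ParameterID and date filter the surviving rows are no longer exactly every Nth row of the filtered set; the CTE in the other answer avoids this by numbering after filtering.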

In this instance, however, I would personally write the query in SQL.

Oliver

You can implement it using a SQL Server window function such as ROW_NUMBER:

WITH x AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY LogDate) AS rn, *
    FROM MyTable
    WHERE
        ParameterID = @ParameterID AND
        LogDate > @StartDate AND
        LogDate < @EndDate
)
SELECT * FROM x WHERE rn % 50 = 0
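
Since the asker mentioned in the comments that a stored procedure is an option, the CTE above could be wrapped in one (the name GetSampledLogs is made up here) and called from EF. A sketch assuming EF Core 3+ (older EF Core versions use FromSql, and EF6 would use Database.SqlQuery<Log> instead):

// Hypothetical stored procedure GetSampledLogs wraps the CTE above and takes
// @ParameterID, @StartDate, @EndDate and @N (the sampling stride) as parameters.
var logs = _context.Logs
    .FromSqlRaw(
        "EXEC GetSampledLogs @ParameterID = {0}, @StartDate = {1}, @EndDate = {2}, @N = {3}",
        ParameterID, timeStart, timeEnd, n)
    .ToList(); // stored procedure results can't be composed with further LINQ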

In LINQ you can try the indexed overload of Select:

var data = _context.Logs
    .Select((x, i) => new { Data = x, Number = i })
    .Where(x => x.Number % 50 == 0)
    .Select(x => x.Data);

But it's necessary to check what actually reaches the database: EF cannot translate this indexed overload of Select to SQL, so depending on the version it will either throw or silently evaluate the query on the client.
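
If client-side sampling after a server-side filter is acceptable, it can be made explicit; a sketch (note this still transfers every filtered row from the database, which is what the question is trying to avoid):

// Filter and order on the server, then sample every 50th row in memory.
// AsEnumerable() marks the switch to client-side evaluation explicitly.
var data = _context.Logs
    .Where(a => a.ParameterID == ParameterID &&
                a.LogDate > timeStart &&
                a.LogDate < timeEnd)
    .OrderBy(a => a.LogDate)
    .AsEnumerable()
    .Where((x, i) => i % 50 == 0)
    .ToList();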

Don't forget to create an index on LogDate (ideally a composite index on ParameterID and LogDate, matching the WHERE clause).
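
For instance, assuming EF Core with migrations, the composite index could be declared in the model (a sketch; the property names match the Log entity used above):

// In the DbContext: declare the composite index so a migration creates it.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Log>()
        .HasIndex(l => new { l.ParameterID, l.LogDate });
}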

Honestly, I'm not sure that SQL Server is a good choice for storing logs; I would rather use something like Elasticsearch.

Yuriy Gavrishov