0

This is a follow up question to this question. You should read that first.

I have now, thanks to this answer, created a query that will return the correct entries. See:

IQueryable<Data> onePerHour = dataLastWeek
    .Where(d => 
        !dataLastWeek
        .Any(d2 =>
            d2.ArchiveTime.Date == d.ArchiveTime.Date &&
            d2.ArchiveTime.Hour == d.ArchiveTime.Hour &&
            d2.ArchiveTime < d.ArchiveTime));

Now for processing the entries and displaying them on a chart, I only need one or two properties of the model class Data. The use case is something like this:

List<Data> actualData = onePerHour.ToList();

var tempCTupels = new List<TimeTupel<float>>();
tempCTupels.AddRange(actualData.Select(d => new TimeTupel<float>(d.ArchiveTime, d.TempC)));

var co2Tupels = new List<TimeTupel<float>>();
tempCTupels.AddRange(actualData.Select(d => new TimeTupel<float>(d.ArchiveTime, d.CO2Percent)));

TimeTupel is very simple and defined like this:

public class TimeTupel<TData>
{
    public TimeTupel(DateTime time, TData yValue)
    {
        Time = time;
        YValue = yValue;
    }

    public DateTime Time { get; set; }
    public TData YValue { get; set; }
}

Question

Currently actualdata is a List<Data> which means it's fully loaded in memory.
Since I only use two properties I wouldn't need to retrieve the whole object to create the TimeTupels.

Now my question is how would I achieve a performance increase? Is it the correct approach to remove the ToList?

Things I've tried

  • Just using the IQueryable<Data> to create the TimeTupel:
    IQueryable<Data> actualData = onePerHour; yields a runtime error ("System.InvalidOperationException: 'Null TypeMapping in Sql Tree'")

  • Using AsEnumerable:
    IEnumerable<Data> actualData = onePerHour.AsEnumerable(); is slow, takes around 22 seconds for 10 days worth of data

  • Using ToList as seen in the above code (ToArray is almost equal):
    List<Data> actualData = onePerHour.ToList(); is faster, takes around 5 seconds for the same amount of data

Joelius
  • 3,839
  • 1
  • 16
  • 36

2 Answers2

2

You can use an anonymous type in a Select statement to retrieve only the needed columns of data into memory, and then convert that in-memory data into the TimeTupel<> class from there. It would look like this:

var actualData = dataLastWeek
    .Where(d => 
        !dataLastWeek
        .Any(d2 =>
            d2.ArchiveTime.Date == d.ArchiveTime.Date &&
            d2.ArchiveTime.Hour == d.ArchiveTime.Hour &&
            d2.ArchiveTime < d.ArchiveTime))
    .Select(d => new { d.ArchiveTime, d.TempC, d.CO2Percent})
    .ToList();

var tempCTupels = actualData.Select(d => new TimeTupel<float>(d.ArchiveTime, d.TempC)).ToList();

var co2Tupels = actualData.Select(d => new TimeTupel<float>(d.ArchiveTime, d.CO2Percent)).ToList();
sjager
  • 416
  • 2
  • 5
  • Sadly that only makes a difference of about 4% :/ Is there no way to remove the `ToList` (and have it be faster)? Why does it need to be in memory? – Joelius Jul 15 '19 at 14:10
  • "In-memory" refers to bringing the data out of the database server, and to the web server. The web server is responsible for transforming that data into a form that the client can understand. Limiting the data retrieved is good practice and helpful, but I would guess that the real problem lies in how the data in the database server is indexed and retrieved. You'll need to optimize your query on the database level. – sjager Jul 15 '19 at 14:14
  • Web-server? What does this have to do with a webserver? This should be applicable without a webserver. And okay I will just do it like this then even though it doesn't seem to change the performance by much. Thanks :) – Joelius Jul 15 '19 at 14:17
  • Apologies for assuming that you were working with a web-based project - the same principles will apply regardless of where the code is executing. – sjager Jul 15 '19 at 14:24
0

you can select only the properties you need BEFORE you're actually loading the objects from your IQueryable. use Select after your Where statement to only load what you need.

An example:

assume you have a class which looks like this:

public class Person {
        public string Name { get; set; }
        public int Age { get; set; }
    }

I can initialize a list of items for testing:

var people = new List<Person> { new Person { Name = "John", Age = 10 }, new Person { Name = "Archie", Age = 40 } };

then we apply a filter:

var filterred = people.Where(p => p.Age > 15).Select(p => p.Name).ToList();

If I want to create a new object with the selection, to select more then just one property, I could do something like this:

var objFilterred = people.Where(p => p.Age > 15).Select(p => new { FullName = p.Name  }).ToList();

You don't have to use an anonymous object, you can also create a new class holding only the properties you require and simply populate that.

You can't "remove" ToList, as this is what actually executes your query. IQueryable is not data, it's a query that hasn't run yet and you can chain as many things as you want to it. The final step is to execute it, running something like ToList, to actually load the objects. As long as you build your IQueryable and when you're done you execute it, then you should see an improvement in the execution speed

Andrei Dragotoniu
  • 6,155
  • 3
  • 18
  • 32
  • But the same goes for `IEnumerable` right? That's just a description on how to get the entries but there's no actual data in memory. And since `List.AddRange` takes an `IEnumerable` (which is btw implemented by `IQueryable` as well) I don't think it's necessary to load the data into memory before adding it to a list right? The `AddRange` should execute the query and retrieve the data it's supposed to add, so why can't I remove the `ToList`? – Joelius Jul 15 '19 at 14:00
  • I used a list to simply make it easier to understand. so simply add a select statement after your where, then do ToList at the end, it should help with the performance issues – Andrei Dragotoniu Jul 15 '19 at 14:01