0

So here's my situation: I have two tables that contains statistics, one table for the statistic definitions, and a table for the statistic events. Each row in the stats event table has a timestamp, a value, and a Reference to the statistic definition that it applies to, and each statistic definition has a collection of stat entries. (i.e. one to many)

My app allows a user to select a statistic definition and date range, and graphs the entries from the stat event table for the selected definition. However, it's possible for the user to select a large date range, and this could result in a larger number of returned events than is necessary.

I'd like to return a subset of the data (n points) that's evenly distributed over the time range that the user selects. My current (naive) implementation is the following:

var totalEntries = session.QueryOver<StatEvent>()
                          .Where(x => x.Date > start_date && x.Date < end_date && StatDef.Id == defn.Id)
                          .List()
int modFactor = (int) Math.Ceiling((double)totalEntries.Count/30);
var temp = totalEntries.Where((x, i) => i%modFactor == 0);
return temp.OrderBy(x => x.Date).ToList();

So I'm retrieving all of the records from the DB matching the date range and stat defn, then selecting 30 entries.

The problems with this approach are:

1) It's inefficient to get all of the entries from the DB, and then discard the ones that I don't want.

2) If the data is clumped (i.e. there are many records around a certain time, and not many records around another time) the selection will be weighted to where there is the most data. This is not necessarily a bad thing, but I'd like my graphs to be prettier, and have evenly distributed data points.

So, does anyone know of a smart way to achieve the same result but in a more efficient manner?

Thanks!

Kyle Fransham
  • 1,859
  • 1
  • 19
  • 22

1 Answers1

0

I would suggest aggregating the values over a set of intervals, where the size of the interval is calculated as 1/30th of the selected date range. The manner in which you aggregate will depend on the stats, but a naive implementation could just sum them or average them or get the max value.

Martin Ernst
  • 5,629
  • 2
  • 17
  • 14
  • This is a good idea, but my only concern is that if the user selects a very large date range (like a year, which is a valid thing to ask for) then I'd have to average over all of the entries in the range, and this risks slowing things down even more. – Kyle Fransham Jun 14 '12 at 14:06
  • You can do the aggregate in the database query, using Projections.SqlGroupProjection (see http://stackoverflow.com/questions/10801712/groupby-sqlfunction-on-queryover/10850296#10850296) and some MySql date function to group by the interval – Martin Ernst Jun 15 '12 at 15:37