1

I have several Lists that I need to iterate through in order to perform a calculation. In summary, List1 is List of roadway start and endpoints (ids) and List2 is a List of individual speed samples for those endpoints (there are multiple speed samples for each set of endpoints). List1 is defined like this:

class RoadwaySegment
{ 
   public int StartId {get; set;} 
   public int EndId {get; set;}
}

List2 is defined like this:

class IndividualSpeeds
{ 
   public int StartHour {get; set;} 
   public int StartMin {get; set;} //either 0,15,30,or 45
   public int Speed {get; set;} 
   public int StartId {get; set;} 
   public int EndId {get; set;}
}

List3 is the result of my calculation and will contain the average speeds for the roadway segments in List1 for each 15 minute period of the day. List3 looks like this:

class SummaryData
{ 
  public string SummaryHour {get; set;} 
  public string SummaryMin {get; set;} 
  public int StartId {get; set;} 
  public int EndId {get; set;} 
  public int AvgSpeed {get; set;}
}

Currently, to generate List3, I iterate over List1, then over each 24 hour period of the day, then over each 15 minute interval of an hour. For each of these iterations, I check to see if the individual speed sample in List2 should be included in the average speed calculation for my roadway segment. So, it looks something like this:

var summaryList = new List<SummaryData>();
foreach (var segment in RoadwaySegments)
{
   for(int startHour = 0; startHour < 24; startHour++)
   {
      for(int startMin = 0; startMin < 60; startMin+= 15)
      {
         int totalSpeeds = 0;
         int numSamples = 0;
         int avgSpeed = 0;

         foreach(var speedSample in IndividualSpeeds)
         {
            if((segment.StartId == speedSample.StartId)&&(segment.EndId == speedSample.EndId)&&(speedSample.StartHour == startHour)&&(speedSample.StartMin == startMin))
            {
               if(speedSample.Speed > 0)
               {
                  totalSpeeds += speedSample.Speed;
                  numSamples += 1;
               }
            }
         }
         SummaryData summaryItem = new SummaryData {SummaryHour = startHour, SummaryMin = startMin, StartId = segment.StartId, EndId = segment.EndId, AvgSpeed = totalSpeeds/numSamples;
         summaryList.Add(summaryItem);
      }
   }
}

The issue with this code is that List1 might have a hundred roadway segments but List2 can contain a million or more speed sample records so sub-iterations of the list are very time consuming. Is there a way to use GroupBy/LINQ to improve the performance and readability of this code? Note the condition for including a speed in the average--it has to be greater than 0.

kittyhawk
  • 688
  • 2
  • 11
  • 25

3 Answers3

3

This is untested, but I think it will work:

from segment in RoadwaySegments
join sample in IndividualSpeeds on new { segment.StartId, segment.EndId } equals new { sample.StartId, sample.EndId } into segmentSamples
from startHour in Enumerable.Range(0, 24)
from startMin in new[] { 0, 15, 30, 45 }
let startMinSamples = segmentSamples
    .Where(sample => sample.Speed > 0)
    .Where(sample => sample.StartHour == startHour)
    .Where(sample => sample.StartMin == startMin)
    .Select(sample => sample.Speed)
    .ToList()
select new SummaryItem
{
    StartId = segment.StartId,
    EndId = segment.EndId,
    SummaryHour = startHour,
    SummaryMin = minute,
    AvgSpeed = startMinSamples.Count <= 2 ? 0 : startMinSamples.Average()
};

The main idea is to iterate the segment and sample lists once, producing a group of samples for each segment. Then, for each of those groups, you generate the hours and minutes and a summary item for each combination. Finally, you calculate the average speed of all non-zero samples in the hour/minute combination.

This isn't quite ideal because you still iterate the segment's samples 24 * 4 times, but it is much better than iterating the entire sample list. This should get you on the right path, and hopefully you can optimize that last bit further.

Bryan Watts
  • 44,911
  • 16
  • 83
  • 88
  • nice one, but it's not C# anymore. I find it hard to read - is it me? am I lacking LINQ experince and training? or is there a real conceptual problem in such long and complicated queries? – Amittai Shapira Sep 27 '11 at 19:17
  • 1
    @Amittai Shapira: That is legal C# - I am not sure what you mean. I find this much easier to read than the equivalent loops specified by the OP; there are less variables, mutable lists, if statements, etc. It also says what it's doing (joining) without my having to decipher a nest of logic to find out what it means. That's why I like LINQ: emphasizes the *what* and downplays the *how*. – Bryan Watts Sep 27 '11 at 19:21
  • I like this solution as I think it is elegant and easy to maintain. How would I handle a situation where my Where conditions eliminate all possible samples, therefore resulting in my Average calcuation failing because the sequence contains no elements? – kittyhawk Sep 28 '11 at 17:07
  • @kittyhawk: I modified my answer to account for an empty sample set within a particular 15-minute stretch. I made the assumption that you want an average speed of 0 when the sample set is empty. – Bryan Watts Sep 28 '11 at 17:23
  • @BryanWatts: Thanks! In addition, what if I also wanted to return a 0 value for the Average if there's less than a certain number of samples--say 2? – kittyhawk Sep 28 '11 at 20:32
  • @kittyhawk: I modified my answer to use a list of samples to check the count. You might want to consider updating your question with these extra details so my answer makes sense to the next reader :-) – Bryan Watts Sep 28 '11 at 20:46
2

If you are using .net 4, I would suggest parallelizing this with Parallel Linq. This is embarrassingly parallel over RoadwaySegments.

Secondly, instead of your nested iteration over the child lists, I would recommend iterating that list once and creating a Dictionary of List<IndividualSpeeds> with a Composite Key of StartId, EndId, StartHour, and EndHour. Doing a lookup in this Dictionary will be much faster vs re-iterating over that for each RoadwaySegment.

Community
  • 1
  • 1
Chris Shain
  • 50,833
  • 6
  • 93
  • 125
0

The following is supplemantery to Chris and Bryan answers:
Since you're saying there could be millions of speed sample records, you just don't want to iterate them more than the minimum possible - i.e. you should group (using GroupBy or Join operators) them, according to their start hour and minute. Than you could just iterate each group, and add each sample record into some dictionary of RoadwaySegments (something like Dictionary<RoadwaySegment, IEnumerable<IndividalSpeeds>>) and create your Summary items from this dictionary.

Amittai Shapira
  • 3,749
  • 1
  • 30
  • 54