-7

I am looking for the best way to create a "best" list for a group of lists. Here's the shorthand of what I have:

    public class Tank
    {
     public int id;
     public string name;
     List<Volume> volume; //Volume for each day for two years         
    }

    public class Volume
    { 
    public DateTime date;        
    public double volume; 
    public string status; 
    } 

    public class Master
    {
     public DateTime date;
     public string bestTankName;
     public int id;
     public double bestVolume;
    }        

Main
{
 List<Master> masterVolume = new List<Volume>();
 List<Tank> Tanks = SQLQueryToGetVolumes();

 for (int i = 720; i < 0; i--)
 {
    masterVolume.Add(new Master
    {
       date = DateTime.Today.AddDays(-i)

    });
 }

 //for each date in the master volume list, I need to get the 
 //highest volume from the other lists for that day, 
 //along with the other information associated with that day
 foreach (Master m in masterVolume)
 {
  //Compare all the Tanks and get the best one for that day*******
 }
}

I have made a mess of foreach loops that result in roughly 100 Billion computations for all of the data that needs to be processed. There are thousands of these groups of tanks that need to processed in this way.

Is there a better way to go about doing this? I need to get the highest volume, for every day. I then need to record that volume along with the other tank class properties into a master list without it taking 15 hours.

Any help or direction would be much appreciated.

Anon
  • 1
  • 2
  • Presumably you already have list of "volume" by day... so the only missing piece should be covered by duplicate - max by property... Since you decided not to show your current implementation it's pretty much all can be suggested. – Alexei Levenkov Jan 20 '20 at 04:38
  • Bummer @AlexeiLevenkov I am sorry that you closed this. I was in the middle of an answer. The OP's data structure is well defined in the classes. And while the OP tagged Linq, they didn't ask for a Linq answer. Based on the data structure, I do not think that Linq will provide the correct solution to this problem. – jwatts1980 Jan 20 '20 at 04:56
  • @jwatts1980 feel free to edit question so it is clear that LINQ solution would not work and I'll re-open... I'm not good with SQL to suggest solution that would actually be useful, but in-memory LINQ will give O(total_number_of_items) complexity (split by days and max for each day) - I doubt you can dramatically improve that... – Alexei Levenkov Jan 20 '20 at 05:02
  • @AlexeiLevenkov you mention SQL, I assume that you are saying that this answer would be better solved by changing how the data is retrieved from the database, rather than focusing on aggregating it as the OP asked? With the data structures as listed, Linq would be inefficient as it would require flattening the data before aggregating it, requiring at least O*2 and twice the memory. Just using a dictionary and a nested for loop will yield O. – jwatts1980 Jan 20 '20 at 05:21
  • @jwatts1980 I'm all ears how you manege to do flattenting in O(n^2)... I can't see how something like `tanks.SelectMany(x => x.Volumes.Select(d => new {x.Id, d.Date, d.Volume}).GroupBy(e => e.Date))` would be n^2 as it is "Just using a dictionary", and "nested for loop" will come from MaxBy in the duplicate I tried to suggest... But indeed writing the same code by hand is entertaining. – Alexei Levenkov Jan 20 '20 at 05:28
  • @AlexeiLevenkov sorry, I did not mean O(n^2), I meant O(n*2), as in double, as in traversing the entire list at least 2 times (or more). If `SelectMany` plus a `GroupBy` plus some kind of max function are efficient enough to manage O(n), then that would be just as good. Otherwise, the nested `for` will provide O(n). – jwatts1980 Jan 20 '20 at 05:43
  • @jwatts1980 I see... some sort of homegrown O-notation (the commonly used one considers O(n) and O(2n) and O(n +2) equivalent) . Anyway - you wanted to post an answer it's good time to do so. – Alexei Levenkov Jan 20 '20 at 05:50
  • This question keeps looking unclear to me – Marco Salerno Jan 20 '20 at 08:21

1 Answers1

0

An optimal solution for this would have you traverse the entire list of volumes only once, or O(n). The loops you mentioned were doing some multiple of that. There may be a way to do this with Linq to attain O(n) efficiency, but without more testing I do not know.

However, you can get O(n) with a nested for loop. For my suggestion, you could use a dictionary where the key is the date (in format MM/DD/YYYY) and the value is the Master.

Dictionary<DateTime, Master> masterVolumes = new Dictionary<DateTime, Master>();
List<Tank> tanks = SQLQueryToGetVolumes();
DateTime dt;
Master m;
foreach(Tank tank in tanks)
{
    foreach(Volume volume in tank.volume)
    {
        //Since dates may have times associated, we strip the time.
        dt = new DateTime(volume.date.Year, volume.date.Month, volume.date.Day);

        //See if the day already exists
        if (masterVolumes.ContainsKey(dt))
        {
            //If so, compare the current best volume
            m = masterVolumes[dt];
            if (m.bestVolume < volume.volume)
            {
                //There is a new master!
                m.bestTankName = tank.name;
                m.id = tank.id;
                m.bestVolume = volume.volume;
            }
        }
        else
        {
            //This date doesn't exist yet, so add it
            m = new Master()
            {
                date = dt,
                bestTankName = tank.name,
                id = tank.id,
                bestVolume = volume.volume
            };
            masterVolumes.Add(dt, m);
        }
    }
}

//At the end of this, the masterVolumes.Values will hold a collection of 
// the highest volume Master for each day. 
jwatts1980
  • 7,254
  • 2
  • 28
  • 44
  • How does the nested for loop translate to `O(N)`? Seems like its `O(T * V)`, where `T` is the number of tanks and `V` is the number of volumes per tank. – RoadRunner Jan 20 '20 at 06:55
  • This significantly improved my performance. I haven't used Dictionaries before so I didn't think to use them. I was iterating through my entire master list for each tank. Eliminating that decreased the complexity substantially. Thanks a ton. – Anon Jan 21 '20 at 03:59
  • @Anon No problem. Going forward try not to ask "best way" questions, as they are opinion based and will get downvoted and closed. Most opinion questions you see on SO are old. You could have posted your working, inefficient solution and stated the problem that it was taking 15+ minutes to run and you needed to get the time down. Likely you would have received better answers, or someone would have marked it as a duplicate to a similar question. SO has millions of questions, sometimes posting a good question is a low key way to get the community help you find a duplicate with a great answer. – jwatts1980 Jan 21 '20 at 04:47