1

I have two Lists of two different classes; we can call them Foo and Bar for this purpose. Each Foo holds a list of the Bars it belongs to, and each Bar holds a list of the Foos that belong to it.

I need a fast and efficient way of cycling through the two lists and adding each item to the other's list.

I am currently using:

//  Add each Foo to each Bar's list and vice versa
foreach (Foo foo in Program.data.Foos.list)
{
    foreach (Bar bar in Program.data.Bars.list)
    {
        bar.Foos.Add(foo);
        foo.Bars.Add(bar);
    }
} 

However, for my set of data I have ~5000 Foos and ~5000 Bars. It is taking ~3 seconds to iterate through the outermost foreach loop, which appears to be rather inefficient.

Are there any faster ways to accomplish this? Possibly with LINQ? What would you recommend to speed this up some more? Or have I hit a brick wall in terms of speed?

Nathan Raley

3 Answers

1

If bar.Foos and foo.Bars are properties, then make an initial assignment for them before the loop:

var list1 = bar.Foos; 
var list2 = foo.Bars;

Giving bar.Foos and foo.Bars an initial capacity would be good too.
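A minimal sketch of both suggestions combined, assuming foo.Bars and bar.Foos are backed by List<T> and start out empty (if they are custom IList implementations, the Capacity member may not be available):

// Hoist the property accesses out of the inner loop and give the lists an
// initial capacity so they don't have to grow repeatedly while ~5000 items are added.
int fooCount = Program.data.Foos.list.Count;
int barCount = Program.data.Bars.list.Count;

foreach (Bar bar in Program.data.Bars.list)
    bar.Foos.Capacity = fooCount;            // one-off pre-sizing; the list can still grow past this

foreach (Foo foo in Program.data.Foos.list)
{
    var fooBars = foo.Bars;                  // read the property once per outer iteration
    fooBars.Capacity = barCount;

    foreach (Bar bar in Program.data.Bars.list)
    {
        bar.Foos.Add(foo);
        fooBars.Add(bar);
    }
}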

Mahmoud Gamal
L.B
  • My classes have already been initialized and already exist. I create all the Foos and then I create all the Bars. I then need to add every Foo to every Bar's list and vice versa. – Nathan Raley Oct 06 '11 at 14:24
  • So what, just change your code to "bar = new List(5000)". This would reduce the internal reallocations. – L.B Oct 06 '11 at 14:27
  • I see what you mean, and that would work for filling out my stress data, as each Foo will belong to every Bar and vice versa. But in real-world use they won't always all belong to one another. Also, I have no way of knowing how many each could contain, so I don't want to hard-cap a size, even if it's more than may possibly be used. So how would this help? – Nathan Raley Oct 06 '11 at 14:40
  • @Nathan Raley, it is just an "initial capacity"; you can add more or fewer items than the initial capacity, and list.Count always gives the correct item count. I tried it with 1M strings and saw a 30% speedup. – L.B Oct 06 '11 at 14:48
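To illustrate L.B's point above with a tiny sketch (the 5000 figure is just the question's data size, not a requirement):

// An initial capacity only pre-allocates the list's internal array; it is not a cap.
var foos = new List<Foo>(5000);   // room for 5000 items without any resizing
// Adding a 5001st item simply triggers a normal internal resize, and
// foos.Count always reflects how many items were actually added.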
1

You could also use the Union extension on List, as it will avoid duplicates as long as your objects are comparable (or you have overridden Equals/GetHashCode).

bar.Foos = bar.Foos.Union(Program.data.Foos.list).ToList();
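A short sketch of what "comparable" means here; the Id property is purely hypothetical and stands in for whatever identifies a Foo in the real model:

public class Foo
{
    public int Id { get; set; }   // hypothetical identifier, used only for illustration

    // Union uses the default equality comparer, so without these overrides two
    // distinct Foo instances are only considered equal if they are the same reference.
    public override bool Equals(object obj)
    {
        var other = obj as Foo;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

With something like that in place, bar.Foos.Union(Program.data.Foos.list) skips any Foo that is already present in bar.Foos.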
Candide
  • You know where I can find an implementation of Union? My properties in the Foo and Bar are ILists and I don't have an implementation for Union yet. – Nathan Raley Oct 06 '11 at 14:44
  • Just add "using System.Linq" (I tried this in .NET 4) and it will load the extension. – Candide Oct 06 '11 at 14:49
  • Never mind, I was calling it on the wrong list. I did a Foo.bars.Union(Program.data.bars.list) and this didn't result in anything being posted to the list. I also can't do an assignment by setting Foo.bars to the result of the union; I get an error because of my custom collection type, which implements IList and List. – Nathan Raley Oct 06 '11 at 14:51
  • The union is an IEnumerable so you may need to add: bar.Foos = bar.Foos.Union(Program.data.Foos.list).ToList() – Candide Oct 06 '11 at 14:58
  • Yea, the problem is with it returning a generic list. It won't let me convert that list to my collection type. If I add a property to my collection type to get/set the private _list in my collection it appears to work. I'll have to see if that has any negative impact on my NHibernate mapping however. – Nathan Raley Oct 06 '11 at 15:07
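One way around the assignment problem discussed in these comments, sketched under the assumption that the custom collection at least implements IList<Foo>: copy the merged result back into the existing instance instead of replacing it.

// Build the merged set once, then refill the existing collection so the
// property never has to be reassigned to a plain List<Foo>.
var merged = bar.Foos.Union(Program.data.Foos.list).ToList();
bar.Foos.Clear();
foreach (Foo f in merged)
    bar.Foos.Add(f);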
1

As explained in this question, you could have a look at the asymptotic complexity of the operations you're doing and see whether you can redesign around a type of collection that offers lower asymptotic complexity for those operations, or you could try to divide the work among several threads or processes using the Task Parallel Library, as mentioned in this question.
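A rough sketch of the parallel idea (not this answer's own code), assuming the relationship is symmetric as in the question's loop: fill each side in its own pass, so every list is written to by exactly one iteration and no locking is needed.

using System.Threading.Tasks;

// Pass 1: each parallel iteration writes only to its own foo.Bars,
// while Program.data.Bars.list is only read.
Parallel.ForEach(Program.data.Foos.list, foo =>
{
    foreach (Bar bar in Program.data.Bars.list)
        foo.Bars.Add(bar);
});

// Pass 2: each parallel iteration writes only to its own bar.Foos.
Parallel.ForEach(Program.data.Bars.list, bar =>
{
    foreach (Foo foo in Program.data.Foos.list)
        bar.Foos.Add(foo);
});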

mtijn
  • Any recommendations on how I could possibly divvy that operation up? About the only thing I could think of is splitting up the lists and performing a parallel task for a portion of them. – Nathan Raley Oct 06 '11 at 14:35
  • There's the tricky part: you have 2 lists with an m:n relationship, so there's no way to divide those 2 lists into 'meaningful small buckets' without intimate knowledge of the contents. You could divide both lists into Z sections and cross-calculate for all the sections, but since you need to share each section with each other section, the amount of work increases by 2*Z^2, which stacks up pretty quickly. Still, you could get a lot done in parallel, but remember the lists will be immutable, so you need to save to separate result lists that you copy when all tasks have completed. – mtijn Oct 06 '11 at 14:51