0

I am trying to improve the performances of a game by parallelizing some code. One of the bottlenecks I have regards this sorting function:

_list.Sort((obj1, obj2) => obj1.Distance.CompareTo(obj2.Distance));

The variable _list is a List<Object>.

There is a way to parallelize this?

More information: Distance is a float, computed as Vector3.Distance(). It is backed in a private field. It's not computed every time it is read. I need to sort all the list, not only a part of it. The list contains at maximum 1000/1500 objects. In the worst case, from the tests I did, takes between 7 and 10 ms (from what I can see looking in the profiler using the deep profile). The list in the worst case is completely unsorted, is not something I can control.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
alirek
  • 177
  • 3
  • 14
  • 2
    `_list.AsParallel().Sort(…)`? – knittl Feb 27 '23 at 08:09
  • @knittl it does not work: it cannot find the `Sort()` function if I apply `AsParallel()` to the `List` – alirek Feb 27 '23 at 09:32
  • 3
    does not have to do with the list itself, but to compare distances is much better to compare square distances than distances. Square roots are not cheap and the order will be the same. This will improve not how you sort the list but how you achieve it, potentially for the same output (find closest item or whatever) – rustyBucketBay Feb 27 '23 at 09:43
  • Are you sure that you need to sort all the list, and not only [a part of it](https://stackoverflow.com/questions/15089373/extract-the-k-maximum-elements-of-a-list "Extract the k maximum elements of a list")? – Theodor Zoulias Feb 27 '23 at 11:25
  • What is the type of the `Distance` property? – Theodor Zoulias Feb 27 '23 at 11:27
  • 1
    `Distance` is a float, computed as `Vector3.Distance()` Regarding the sorting, yes, I need to sort all the list – alirek Feb 27 '23 at 11:31
  • How many objects does the `_list` contain at maximum? How much time does it take to sort it using the `List.Sort` method (worst case)? Is the list completely unsorted, or it's already close to a sorted state? – Theodor Zoulias Feb 27 '23 at 15:31
  • Can you use `sqrMagnitude` instead of calculating distance? That takes the square root out of the equation. – D Stanley Feb 27 '23 at 15:41
  • @TheodorZoulias the list contains at maximum 1000/1500 objects. In the worst case, from the tests I did, takes between 7 and 10 ms (from what I can see looking in the profiler using the deep profile). The list in the worst case is completely unsorted, is not something I can control – alirek Feb 28 '23 at 06:19
  • @DStanley the computation of the distance does not affect at all the sorting, it is done in another point of the code. Anyway, I already tried using `sqrMagnitude` instead of the `Vector3.Distance()` but taking the position of the two objects and computing the `sqrMagnitude` of the vector that connects them is more or less the same in terms of performances – alirek Feb 28 '23 at 06:22
  • The `Distance` is a simple property or a computed property? In other words is it backed by a private field, or it invokes the `Vector3.Distance()` every time is read? – Theodor Zoulias Feb 28 '23 at 08:45
  • 1
    `Distance` is backed in a private field, is not computed every time it is read – alirek Feb 28 '23 at 08:57
  • How frequently do you sort the `_list`? Do you sort it just once on each frame of the game, or more than once per frame? Is it an option to use in the current frame the `_list` as it was in the previous frame, so that you can parallelize the task of sorting the list with the task of computing the rest of the frame? – Theodor Zoulias Feb 28 '23 at 10:07
  • @alirek if you only need it for comparing than the [`sqrMagnitude`](https://github.com/Unity-Technologies/UnityCsReference/blob/70abf502c521c169ee8a302aa48c5600fc7c39fc/Runtime/Export/Math/Vector3.cs#L397) is slightly better as it only does `return x * x + y * y + z * z;` while the [`magnitude`](https://github.com/Unity-Technologies/UnityCsReference/blob/70abf502c521c169ee8a302aa48c5600fc7c39fc/Runtime/Export/Math/Vector3.cs#L386) (== `Distance`) additionally has to take the square root as in `return (float) Math.Sqrt(x * x + y * y + z * z);` ;) – derHugo Feb 28 '23 at 10:58
  • Btw I wrote a [simple benchmark](https://dotnetfiddle.net/ASFxnK) using a `Stopwatch`. A list of 1,500 objects is sorted in less than half a millisecond. So your performance problem might be somewhere else. – Theodor Zoulias Feb 28 '23 at 11:01
  • @TheodorZoulias I sort the `_list` once per frame, as I need it sorted in every frame. I am using the UnityProfiler and up to now the only part that is (slightly) decreasing the performances of my application (that still is strongly performant) is the sorting; moreover, I have a machine with a CPU not so fast but with a lot of core, this is the reason why I wanted to try parallelize the sorting. – alirek Feb 28 '23 at 11:42
  • @derHugo I know it and I already have the squared distance instead of the pure distance in the field to compare – alirek Feb 28 '23 at 11:42
  • Are you running your program in release mode, or in debug mode with the debugger attached? – Theodor Zoulias Feb 28 '23 at 12:52
  • In release mode – alirek Feb 28 '23 at 12:53
  • Could you try running the [simple benchmark](https://dotnetfiddle.net/ASFxnK) that I posted earlier on your machine, and report your observations? I highly doubt that your machine is so slow that it takes 7 msec to sort 1,500 `float` values, when the same work takes less than 0,5 msec on my 8 year-old AMD Athlon machine. – Theodor Zoulias Feb 28 '23 at 20:08
  • I tried the benchmark you wrote inside my project and to sort a list of 4000 objects it takes arount 1.3/1.4 ms – alirek Mar 01 '23 at 09:51
  • And how is this compatible with your [comment](https://stackoverflow.com/questions/75578029/parallelize-the-sort-function-for-listt-on-unity#comment133357955_75578029): *"the list contains at maximum 1000/1500 objects. In the worst case, from the tests I did, takes between 7 and 10 ms (from what I can see looking in the profiler using the deep profile)."* The numbers aren't adding up. – Theodor Zoulias Mar 01 '23 at 13:03
  • Are you force to use `List`?! And how about on-demand sorting? – Hekmat Sohrabi Mar 02 '23 at 10:37

1 Answers1

1

approach with AsParallel()

_list = _list.AsParallel().OrderBy(x => x.Distance).ToList();

this uses internaly quick sort which doesn't scale that good at parallelizing.

Sample by Hugo: https://dotnetfiddle.net/qbmZ1y

fubo
  • 44,811
  • 17
  • 103
  • 137
  • Using `OrderBy()` is ok, but the `ToList()` function is too costly in my specific case. This is the reason why I am using the in-place `Sort()` function – alirek Feb 27 '23 at 09:35
  • 1
    I've benchmarked it with 1m records on my local machine. `Sort()` took 3 seconds, `OrderBy().ToList()` 4 Seconds and `.AsParallel().OrderBy().ToList()` took 1.5 seconds – fubo Feb 27 '23 at 09:39
  • 1
    Maybe it's too costly in terms of memory pressure. – Matthew Watson Feb 27 '23 at 09:41
  • I have tried on my machine, and while the `Sort()` function tooks among 7 and 10 ms, the `AsParallel().OrderBy().ToList()` took among 14 and 17 ms. This is due for sure to the amount of time needed to allocate and free the memory for the supplementary `List` – alirek Feb 27 '23 at 09:50
  • Hmm seems that going through Linq is way slower [.net fiddle](https://dotnetfiddle.net/3VZ4kG) .. also `AsParallel` makes it even slower than just directly using `OrderBy` ... – derHugo Feb 27 '23 at 11:08
  • @derHugo I've added your fiddle to my answer – fubo Feb 28 '23 at 10:35
  • What I meant with the fiddle is .. this is slower than what OP already has .... So why should we use `AsParallel` at all in this situation? – derHugo Feb 28 '23 at 10:49
  • @derHugo if you warmup the benchmarked code, the LINQ and PLINQ versions are performing roughly the same ([demo](https://dotnetfiddle.net/ysM7Yi)). Without warming, the difference in the duration just illustrates that the PLINQ version has more IL code to JIT. – Theodor Zoulias Feb 28 '23 at 11:10
  • @TheodorZoulias Hm .. what good is that if you don't know the list contents beforehand? Or maybe I don't really understand what that warmup means/does exactly .. I suppose it doesn't work the same if the list is modified in the meantime right? And either way the `Sort` seems still to be faster anyway so why go through Linq/Plinq at all? – derHugo Feb 28 '23 at 11:15
  • 2
    @derHugo [Warm-up when calling methods in C#](https://stackoverflow.com/questions/4446203/warm-up-when-calling-methods-in-c-sharp) *"If a method is never called it won't have been compiled yet, so the very first time you run it it might be slower."* – Theodor Zoulias Feb 28 '23 at 11:21
  • Ha thanks @TheodorZoulias! So question is, if I run this query on another list/version of this list before, once, than later on it is warmed up even though the list / list content was changed? .. tested seems so ^^ but still `Sort` is slightly better ^^ – derHugo Feb 28 '23 at 11:27
  • @derHugo yep. AFAIK the compilation cost is payed only once, irrespective of the data that is working with. Maybe if you leave your program idle for an hour, some JITted code might be recycled, I don't know. – Theodor Zoulias Feb 28 '23 at 11:32
  • 1
    @TheodorZoulias Anyway learned a huge thing for having in mind when benchmarking thanks a lot! ;) – derHugo Feb 28 '23 at 11:34
  • Fubo [the sample](https://dotnetfiddle.net/qbmZ1y) warms up the `List.Sort`, but the LINQ and PLINQ code is still cold during the time measurement. – Theodor Zoulias Feb 28 '23 at 12:33
  • @TheodorZoulias feel free to edit my answer - paralleizaion seems to be useless anyway and that's just a technical approach – fubo Feb 28 '23 at 12:34
  • Fubo home-made benchmarks are usually viewed with suspicion. If you want to add value to your answer, I would suggest posting the results of a [BenchmarkDotNet](https://github.com/dotnet/BenchmarkDotNet)-based benchmark. These are trusted by everyone. Don't expect me to do it on your behalf though. I am an egoist, so given the option I prefer to add value to my own posts. :-) – Theodor Zoulias Feb 28 '23 at 20:25
  • @TheodorZoulias then please post your own answer – fubo Mar 01 '23 at 15:23
  • Fubo I might, when the question has been clarified, and I have a suggestion to offer. Currently the performance metrics that have been reported by the OP are mutually contradicting. In one occasion it's 7-10 msec for 1,500 objects, in another occasion it's 1.3-1.4 msec for 4,000 objects. I'll wait for the dust to settle down before making any suggestion. – Theodor Zoulias Mar 01 '23 at 15:32