1

I have a piece of code used for comparison which is in a hot path being called millions of times. After benchmarking, that piece of code is a candidate for optimization.

Basically, I have 2 integers c and m. Comparing two objects involves first checking c then if c is equal on both sides, then comparison is decided by their m value.

c can only be in the range [0-100]

m can only be in the range [0-200]

pseudo code

if this.c < other.c return -1; // Or any negative number
if this.c > other.c return 1;  // Or any positive number
// Both objects have equal 'c' value
if this.m < other.m return -1; // Or any negative number
if this.m > other.m return 1;  // Or any positive number
// Both 'c' and 'm' are equal
return 0;

The following C# code is what I currently have

int CompareTo(Obj other)
    => _c < other._c || (_c == other._c && _m < other._m)
        ? -1
        : _c == other._c && _m == other._m
            ? 0
            : 1;

I am wondering if this could be optimized any further, perhaps with bit-manipulation?

Thanks

  • 1
    Do the results have to be exactly -1, 0, and 1? Comparisons often allow returning negative, zero, and positive values, that enables a small trick when the ranges of the inputs are small enough (which they are here) – harold Mar 15 '21 at 02:21
  • @harold No, any negative number for less than or positive number for greater than is fine. If equal, return value should be zero. Updated question. – user14557971 Mar 15 '21 at 02:23
  • 3
    Given the ranges, you could technically pack both values into one `c * 256 + m` (or `(c << 8) | m` using only bitwise ops) then subtract those. – dxiv Mar 15 '21 at 02:28
  • Your "optimized" code looks the same to me. Did you time it in a release build? – Jeremy Lakeman Mar 15 '21 at 02:37
  • 1
    Some options; https://sharplab.io/#v2:C4LglgNgPgAgTARgLACgYGYAE9MHkBGAVpgN6qYXZZgB2wmA+gMYDc5lGmt9DAtmykpUudTAGEA9rwAOAQwBOAU1w1FACgLEJwABaL5ASjKChlMADNMa3WADOAOmaYAPJm175jpgewB2TAC0CAKmZpbWOnZemAB8brr6Xj4w/sHsoRZWNg58LvEejrzJ/kEhoVzh2YWx+Yl8xZhpJqEpmAAMZRQAvunC3OJSckoAKgDuEhpEtYa9QgC8cU6u7nVMmFBQVk5zc9PRAGT7jLx5K571s+UA/IHIzeUUIIxrO3tOh8eYr2eFlw8UNzaf3+TyaQk4/UkMgUimGOiU6k00yMwIAbgpMC9noE3qxgZk1Ewdm0DMChFjcgE3vxga08fcevcIaIoUNFAAxCQAV3kky0CRm93mcTUaiWrgAHD5NvUcaKfk4XJLpdSDAIukA=== – Jeremy Lakeman Mar 15 '21 at 02:49
  • @JeremyLakeman didn't mean it's optimized, edited question. It runs in about the same time as if it was written like the pseudo code. Thanks to dxiv though, his version is about 20% faster. – user14557971 Mar 15 '21 at 02:53
  • 1
    See https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array for probably the reason why it's faster, well done @dxiv – Charlieface Mar 15 '21 at 02:55
  • you should get something faster if you can skip the boxing. Avoid the `CompareTo` and go for specialized overloads. – Mario Vernari Mar 15 '21 at 02:57
  • @MarioVernari You are assuming there is boxing here, there is not. The function uses a nominal `Obj` as the type – Charlieface Mar 15 '21 at 02:58
  • @MarioVernari There is no boxing, `CompareTo(T other)` is from implementing `IComparable` but I didn't show it above for brevity. – user14557971 Mar 15 '21 at 03:00
  • my fault: I was tricked by reading `Obj`. The code clearly shows a different story. – Mario Vernari Mar 15 '21 at 03:02
  • @dxiv That looks very promising. Finally, OP has to try and profile it. – Ralf Kleberhoff Mar 15 '21 at 11:31
  • 1
    @user14557971 By any chance, is there a way to cut down on the millions of comparisons, e.g. by changing some algorithm to a better-performing one? This can often improve performance by huge factors, while the local comparator optimization might at best give a two- or three-fold improvement. – Ralf Kleberhoff Mar 15 '21 at 11:35
  • @dxiv if you want to add your comment as an answer so that I can accept it as answer. – user14557971 Mar 18 '21 at 18:10

2 Answers2

4

Very slightly shorter than dxiv's version but based on the same idea:

(this.m - other.m) + ((this.c - other.c) << 8)

So we first compare the ms but then override it by the comparison of cs, making use of the limited range of the ms.

Falk Hüffner
  • 4,942
  • 19
  • 25
  • (+1) This replaces 3 bitwise operations with 2 arithmetic ones, and will be slightly faster on virtually all modern CPUs. As a historical note, this was not always the case in the (far) past ([1](https://cs.stackexchange.com/questions/75811/why-is-addition-as-fast-as-bit-wise-operations-in-modern-processors), [2](https://stackoverflow.com/questions/15668718/why-were-bitwise-operations-slightly-faster-than-addition-subtraction-operations)). – dxiv Mar 18 '21 at 18:50
3

c can only be in the range [0-100]
m can only be in the range [0-200]

Given the ranges, each of c, m fit into a single (unsigned) byte, so they can be packed together into an integer cm = c * 256 + m, or cm = (c << 8) | m using only bitwise ops.

Since only the sign of the compare function matters (as clarified in a comment), two such integers can then be compared by direct subtraction.

int CompareTo(Obj other)  =>  ((_c << 8) | _m) - ((other._c << 8) | other._m)

The above is branchless and uses only bitwise/arithmetic operators, so it should fare well against nested comparisons for most distributions of c, m values.

dxiv
  • 16,984
  • 2
  • 27
  • 49