strcmp like interface for comparing numbers in C

Question

I want to make a strcmp-like interface for comparing numbers, e.g., ncmp(x, y) that returns an int > 0 if x > y, 0 if x = y, < 0 if x < y in C (not C++).

Although I do not necessarily want to constrain the types, my main interest is to compare signed long ints and doubles. The 'interface' can be a macro as in tgmath.h, or it can be a function (or a set of functions). I want all pairs of signed long int and double to work; (signed long int, double) and (double, double) should work, for instance.

What I'm currently using is the following macro:

#define ncmp(x, y) ((x) > (y)) - ((x) < (y))

Does this naive macro have any pitfalls? Is there a better, robust solution to compare numbers?

Any help would be greatly appreciated!

`c` supports something called _functions_ for this sort of thing ;) – 500 - Internal Server Error, Mar 31 '21 at 11:52
@stark `x - y` can overflow if `x` and `y` have opposite sign. Also, it only works for those unsigned types that get promoted to `int`. – Ian Abbott, Mar 31 '21 at 11:55
Some test cases for people proposing solutions: `ncmp(0u, 1u)`, `ncmp(INT_MIN, 1)`. – Paul Hankin, Mar 31 '21 at 11:57
Note also that comparisons involving doubles can produce more than "equal", "less", "greater". They can also produce "not equal" when a NaN is involved. – Paul Hankin, Mar 31 '21 at 11:58
... and returning an `int` from comparing two `double` values can be problematical if the difference is less than 1. – Adrian Mole, Mar 31 '21 at 12:00
You could use generic selection, but the difficulty is catching all the types and that the parameters might not have the same type, which would make it very messy. Well, you could use `(x) - (y)` as the controlling expression for the generic selection expression to reduce the selection to a single set of types. – Ian Abbott, Mar 31 '21 at 12:06
`ncmp(-1, 0u)` is a potential pitfall of a macro-based solution (including the one in the question). – Paul Hankin, Mar 31 '21 at 12:15
People want to compare strings like numbers with predefined operators in C++ and you want the opposite. Why - schoolwork? – i486, Mar 31 '21 at 12:59
@i486 Definitely not schoolwork. I'm implementing a DSL that deals with polynomials with coefficients supporting both integers & floats, and needed this functionality while normalizing a system of equalities & inequalities. Just a hobby project. – Jay Lee, Mar 31 '21 at 13:02

score 2 · Accepted Answer · edited Jul 30 '21 at 07:26

I want all pairs of signed long int and double to work;

Since C11, code can use _Generic to steer function selection based on type.

int cmp_long(long x, long y) {
  return (x > y) - (x < y);
}  

int cmp_double(double x, double y) {
  return (x > y) - (x < y);
}  

#define cmp(X, Y) _Generic((X) - (Y), \
    long: cmp_long((X),(Y)), \
    double: cmp_double((X),(Y)) \
)

This approach does not detect cases where X and Y have different types, because (X) - (Y) has the common type of the two (as @Ian Abbott pointed out). Still, it is a start.

#include <stdio.h>

int main(void) {
  printf("%d\n", cmp(1L, 2L));
  printf("%d\n", cmp(3.0, 4.0));
}

A more complex 2-stage _Generic could be made to distinguish long, double and double, long. I'll leave that part to OP.
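
One possible shape for such a two-stage selection is sketched below. It is only an illustration, not the answerer's own code: `cmp_double_long` is a hypothetical wrapper introduced here, and the mixed-type comparison tests the fraction by comparing `y` with its truncated value rather than calling `modf`, so the sketch needs no math library. NaN handling is left out, and arguments of any other type (for example a plain `int` literal) will not compile with this macro.

```c
#include <limits.h>

static int cmp_long(long x, long y) { return (x > y) - (x < y); }
static int cmp_double(double x, double y) { return (x > y) - (x < y); }

/* LONG_MAX + 1 as a double; a power of 2, so it is formed exactly. */
#define DBL_LONG_MAX_P1 ((LONG_MAX / 2 + 1) * 2.0)

/* Exact long-vs-double comparison. NaN is not handled. */
static int cmp_long_double(long x, double y) {
  if (y >= DBL_LONG_MAX_P1) return -1;  /* y is above every long */
  if (y < (double)LONG_MIN) return 1;   /* y is below every long */
  long y_long = (long)y;                /* in range: drops the fraction */
  if (y_long > x) return -1;
  if (y_long < x) return 1;
  /* Whole parts are equal; the sign of the fraction decides. */
  double truncated = (double)y_long;    /* exact: it came from a double */
  return (y < truncated) - (y > truncated);
}

/* Hypothetical wrapper: swap the arguments and negate the result. */
static int cmp_double_long(double x, long y) {
  return -cmp_long_double(y, x);
}

/* Stage 1 selects on the type of X, stage 2 on the type of Y.
   The selection yields a function, which is then called once. */
#define cmp(X, Y) _Generic((X),        \
    long: _Generic((Y),                \
        long: cmp_long,                \
        double: cmp_long_double),      \
    double: _Generic((Y),              \
        long: cmp_double_long,         \
        double: cmp_double)            \
)((X), (Y))
```

Because each association names a function instead of calling it, the arguments are evaluated only once, and no unselected branch is type-checked against arguments it was never meant to take.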

The compare function would be like the one below. The tricky part is to not lose precision of long (it might be 64-bit) when comparing to double.

#include <limits.h>  // LONG_MAX, LONG_MIN
#include <math.h>    // modf

// TBD: handling of NANs
#define DBL_LONG_MAX_P1 ((LONG_MAX/2 + 1)*2.0)
int cmp_long_double(long x, double y) {
  // These 2 compares are expected to be exact - no rounding
  if (y >= DBL_LONG_MAX_P1) return -1;
  if (y < (double)LONG_MIN) return 1;

  // (long) y is now in range of `long`.  (Aside from NANs)
  long y_long = (long) y; // Lose the fraction
  if (y_long > x) return -1;
  if (y_long < x) return 1;
 
  // Still equal, so look at fraction
  double whole;
  double fraction = modf(y, &whole);
  if (fraction > 0.0) return -1;
  if (fraction < 0.0) return 1;
  return 0;
}
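
The constant `DBL_LONG_MAX_P1` is built from `LONG_MAX/2 + 1`, which is a power of 2 and so converts to `double` exactly; doubling it is exact as well. The sketch below checks that claim by building the same power of 2 through repeated doubling. The helper `two_to_long_value_bits` is hypothetical, and the sketch assumes `long` has no padding bits.

```c
#include <limits.h>

#define DBL_LONG_MAX_P1 ((LONG_MAX / 2 + 1) * 2.0)

/* Hypothetical helper: 2 raised to the number of value bits of long
   (63 for a 64-bit long). Every doubling step is exact.
   Assumes long has no padding bits. */
static double two_to_long_value_bits(void) {
  double p = 1.0;
  for (unsigned i = 0; i + 1 < sizeof(long) * CHAR_BIT; i++) {
    p *= 2.0;
  }
  return p;
}
```

Comparing `DBL_LONG_MAX_P1` with `two_to_long_value_bits()` should show them equal, which is what makes the `y >= DBL_LONG_MAX_P1` test in the function above free of rounding.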

Simplifications may exist.

When double encodes all long exactly or when long double exists and encodes all long exactly, it's easiest to convert both the long and double to the common type and compare.
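
A minimal sketch of that shortcut, assuming a binary floating-point format and a `long` without padding bits. The names `LONG_FITS_IN_LDBL` and `cmp_long_double_via_ldbl` are hypothetical, and the function gives exact results only on platforms where the check is nonzero.

```c
#include <float.h>
#include <limits.h>

/* Nonzero when every long value is exactly representable as long double.
   Assumes long has sizeof(long) * CHAR_BIT - 1 value bits (no padding). */
#define LONG_FITS_IN_LDBL \
  (FLT_RADIX == 2 && LDBL_MANT_DIG >= (int)(sizeof(long) * CHAR_BIT) - 1)

/* Exact only when LONG_FITS_IN_LDBL is nonzero: both conversions are then
   lossless, so the relational operators see the true values.
   A NaN argument yields 0 here, which a caller may not want. */
static int cmp_long_double_via_ldbl(long x, double y) {
  long double lx = x;
  long double ly = y;
  return (lx > ly) - (lx < ly);
}
```

Where the check is zero (for instance, a 64-bit `long` with a `long double` that is no wider than `double`), the piecewise comparison shown earlier remains the safer choice.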

answered Mar 31 '21 at 12:32 by chux - Reinstate Monica · edited Jul 30 '21 at 07:26 by Toby Speight

I suggest using `(X) - (Y)` (or `(X) + (Y)`) as the controlling expression of the generic selection. – Ian Abbott Mar 31 '21 at 12:36
You (just) beat me to it! :-) – Ian Abbott Mar 31 '21 at 12:37
C99 should be C11. – Ian Abbott Mar 31 '21 at 12:38
Thank you for the comprehensive answer. `cmp_long_double` was something I was looking for. – Jay Lee Mar 31 '21 at 13:07
Is the purpose of the more complex 2-stage `_Generic` to correctly compare values that cannot be compared with the usual relational operators? – Ian Abbott Mar 31 '21 at 13:14
@IanAbbott Yes - to do a precise compare. As a `long` can be compared directly to a `double`, the role of `_Generic` is to bypass the usual conversions to the common type, as information can get lost (rounding), and call crafted compare code. – chux - Reinstate Monica Mar 31 '21 at 13:21
What is the exact purpose of `((LONG_MAX/2 + 1)*2.0)`? I can see that it has to do with the ranges `long` and `double` can represent, but I'm not too familiar with the IEEE floating point rep... Why not just `LONG_MAX + 1.0`? – Jay Lee Mar 31 '21 at 13:23
The middle of the compare function converts the `double` to `long` for a precise comparison. The prior compares test if the `double` is near the `long` range: [LONG_MIN.9999... to LONG_MAX.999...]. `LONG_MIN` is certainly some power of 2 (negated), so that compare is exact. Comparing the `double` against `LONG_MAX` is avoided as it may be inexact: `LONG_MAX` is a power of 2 minus 1 and may not be exactly representable as `double`. `DBL_LONG_MAX_P1` is an exact power of 2. – chux - Reinstate Monica Mar 31 '21 at 13:32
@JayLee `LONG_MAX + 1.0` first converts `LONG_MAX` to a `double`. If inexact, "the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner.". Then a 1.0 is added (possibly with no effect with 64-bit `long`). The _implementation-defined manner_ is avoided with `(LONG_MAX/2 + 1)*2.0` to form a `double` one more than `LONG_MAX`. – chux - Reinstate Monica Mar 31 '21 at 13:36
Exceeding the limitations of the relational operators seemed to me to be going beyond the call of duty. You deserve a medal! – Ian Abbott Mar 31 '21 at 13:37
@chux-ReinstateMonica The quote led me here: https://stackoverflow.com/q/66631288/5252984. Thank you for providing such insightful resources! – Jay Lee Mar 31 '21 at 13:42
@JayLee Note that the quote [there](https://stackoverflow.com/q/66631288/5252984) refers to floating point FP to narrower FP and the concern here is integer to FP. Both have implementation-defined aspects to be accounted. – chux - Reinstate Monica Mar 31 '21 at 13:47

Ian Abbott · Answer 2 · 2021-07-30T09:46:52.210

For this macro:

#define ncmp(x, y) ((x) > (y)) - ((x) < (y))

the main problems are:

There are not enough parentheses in the expansion to turn it into a primary expression. It should be:
```
#define ncmp(x, y) (((x) > (y)) - ((x) < (y)))
```
It evaluates (x) and (y) twice, which could be a problem if the evaluation has side effects.
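
Both problems can be seen in a small sketch. The macro names `NCMP_UNPARENTHESIZED` and `NCMP_PARENTHESIZED` and the helper `next_value` are hypothetical, introduced only for this illustration.

```c
/* The macro from the question, renamed for this illustration. */
#define NCMP_UNPARENTHESIZED(x, y) ((x) > (y)) - ((x) < (y))
/* The same macro with the extra outer parentheses. */
#define NCMP_PARENTHESIZED(x, y) (((x) > (y)) - ((x) < (y)))

static int calls = 0;

/* Hypothetical helper with a side effect: counts how often it runs. */
static int next_value(void) { return ++calls; }

/* Negating the result exposes the missing parentheses:
   -NCMP_UNPARENTHESIZED(1, 2) parses as (-(1 > 2)) - (1 < 2). */
static int negated_unparenthesized(void) {
  return -NCMP_UNPARENTHESIZED(1, 2);
}

static int negated_parenthesized(void) {
  return -NCMP_PARENTHESIZED(1, 2);
}

/* Each macro argument appears twice in the expansion, so the helper
   runs twice for what looks like one comparison. */
static int calls_for_one_comparison(void) {
  calls = 0;
  (void)NCMP_PARENTHESIZED(next_value(), 0);
  return calls;
}
```

Here `negated_unparenthesized()` yields -1 while `negated_parenthesized()` yields 1, and `calls_for_one_comparison()` yields 2.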

To avoid the problem of multiple evaluation, the macro expansion could use a generic selection expression to call a different function for each type being compared.

Note 1: Generic selection was added in the 2011 version of the C standard (C11).

Here is an example macro using generic selection. It may need to be extended to support additional types:

#define ncmp(x, y) _Generic((x) < (y) ? (x) : (y), \
    int: ncmp_si,                      \
    unsigned: ncmp_ui,                 \
    long: ncmp_sli,                    \
    unsigned long: ncmp_uli,           \
    long long: ncmp_slli,              \
    unsigned long long: ncmp_ulli,     \
    float: ncmp_f,                     \
    double: ncmp_d,                    \
    long double: ncmp_ld               \
    )((x), (y))

Note 2: The controlling expression of the generic selection ((x) < (y) ? (x) : (y)) is not evaluated, but its type is used to select a corresponding generic association expression (if any).

Note 3: The exact form of the controlling expression does not matter much, but the < in it does at least check that (x) and (y) have an ordered relationship. A bare (x) < (y) would not do as the controlling expression, because a relational expression always has type int. For arithmetic operands, the type of the conditional expression is the result of the usual arithmetic conversions.

Note 4: Due to the usual arithmetic conversions done to the second and third operands of ?: in the controlling expression, there is no need to add cases for integer types below the rank of int.

Note 5: It is possible to add a default: generic association. For example, it could be defined to fall back to using the less safe multiple evaluation method as follows:

#define ncmp(x, y) _Generic((x) < (y) ? (x) : (y), \
    int: ncmp_si((x), (y)),                    \
    unsigned: ncmp_ui((x), (y)),               \
    long: ncmp_sli((x), (y)),                  \
    unsigned long: ncmp_uli((x), (y)),         \
    long long: ncmp_slli((x), (y)),            \
    unsigned long long: ncmp_ulli((x), (y)),   \
    float: ncmp_f((x), (y)),                   \
    double: ncmp_d((x), (y)),                  \
    long double: ncmp_ld((x), (y)),            \
    default: ((x) > (y)) - ((x) < (y))         \
    )

but I chose to leave it up to the programmer to add the missing cases.

It is necessary to define the functions used by each of the generic associations above. To save a bit of typing, a helper macro could be defined to define them:

#define MK_NCMP_(suf, T) \
static inline int ncmp_##suf(T x, T y) { return (x > y) - (x < y); }

MK_NCMP_(si, int)
MK_NCMP_(ui, unsigned)
MK_NCMP_(sli, long)
MK_NCMP_(uli, unsigned long)
MK_NCMP_(slli, long long)
MK_NCMP_(ulli, unsigned long long)
MK_NCMP_(f, float)
MK_NCMP_(d, double)
MK_NCMP_(ld, long double)
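
Putting the pieces together, the following sketch shows how the function-selecting form of the macro behaves. It assumes `(x) < (y) ? (x) : (y)` as the controlling expression, so that its type follows the usual arithmetic conversions; the helper `next_value` is hypothetical and exists only to count evaluations.

```c
/* One comparison function per supported type. */
#define MK_NCMP_(suf, T) \
static inline int ncmp_##suf(T x, T y) { return (x > y) - (x < y); }

MK_NCMP_(si, int)
MK_NCMP_(ui, unsigned)
MK_NCMP_(sli, long)
MK_NCMP_(uli, unsigned long)
MK_NCMP_(slli, long long)
MK_NCMP_(ulli, unsigned long long)
MK_NCMP_(f, float)
MK_NCMP_(d, double)
MK_NCMP_(ld, long double)

/* The controlling expression is never evaluated; only its type matters. */
#define ncmp(x, y) _Generic((x) < (y) ? (x) : (y), \
    int: ncmp_si,                      \
    unsigned: ncmp_ui,                 \
    long: ncmp_sli,                    \
    unsigned long: ncmp_uli,           \
    long long: ncmp_slli,              \
    unsigned long long: ncmp_ulli,     \
    float: ncmp_f,                     \
    double: ncmp_d,                    \
    long double: ncmp_ld               \
    )((x), (y))

static int calls = 0;

/* Hypothetical helper with a side effect: counts how often it runs. */
static int next_value(void) { return ++calls; }
```

With these definitions, `ncmp(1.5, 2.5)` selects `ncmp_d` and `ncmp(2, 1.5)` does too, since the `int` operand is converted to `double`. A call such as `ncmp(next_value(), 1)` runs `next_value` only once. The mixed-sign pitfall raised in the comments remains, though: `ncmp(-1, 0u)` selects `ncmp_ui`, converts -1 to a large unsigned value, and yields 1.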
