How to get negative remainder with remainder operator on size_t?

Question

Consider the following code sample:

#include <iostream>
#include <string>

int main()
{
    std::string str("someString"); // length 10
    int num = -11;
    std::cout << num % str.length() << std::endl;
}

Running this code on http://cpp.sh, I get 5 as a result, while I was expecting it to be -1.

I know that this happens because the type of str.length() is size_t which is an implementation dependent unsigned, and because of the implicit type conversions that happen with binary operators that cause num to be converted from a signed int to an unsigned size_t (more here); this causes the negative value to become a positive one and messes up the result of the operation.

One could think of addressing the problem with an explicit cast to int:

num % (int)str.length()

This might work but it's not guaranteed, for instance in the case of a string with length larger than the maximum value of int. One could reduce the risk using a larger type, like long long, but what if size_t is unsigned long long? Same problem.

How would you address this problem in a portable and robust way?

It's a remainder operator. The behaviour depends on which version of the standard you're using. — user207421, Jul 01 '16 at 13:27
C code written before 1999 and C++ code written before 2011 may behave inconsistently in this case, since negative division/remainders were poorly specified in older standards. Make sure to use modern compilers/standards. — Lundin, Jul 01 '16 at 14:01
While size_t may be represented as long long, I doubt it will ever max out. Unless your platform supports exa-byte string length that is... Which I think is a reasonable restriction in any case. — Andreas, Jul 01 '16 at 17:45

score 3 · Accepted Answer · edited Jul 01 '16 at 13:58

3

Since C++11, you can just cast the result of length to std::string::difference_type.

To address "But what if the size is too big?":

That won't happen on 64 bit platforms and even if you are on a smaller one: When was the last time you actually had a string that took up more than half of total RAM? Unless you are doing really specific stuff (which you would know), using the difference_type is just fine; quit fighting ghosts.

Alternatively, just use int64_t, that's certainly big enough. (Though maybe looping over one on some 32 bit processors is slower than int32_t, I don't know. Won't matter for that single modulus operation though.)

(Fun fact: Even some prominent committee members consider littering the standard library with unsigned types a mistake, for reference see this panel at 9:50, 42:40, 1:02:50 )

Pre C++11, the sign of % with negative values was implementation defined, for well defined behavior, use std::div plus one of the casts described above.

edited Jul 01 '16 at 13:58

HolyBlackCat

78,603
9
131
207

answered Jul 01 '16 at 13:29

Baum mit Augen

49,044
25
144
182

Thanks for the insight about `difference_type`. If I understand correctly, by the standard, `size_t` is large enough to represent the length of the string, while `difference_type` is large enough to represent the largest possible difference between two iterators on the same string; since the maximum difference is the length of the string, then it should always be safe to cast `size_t` to `difference_type`. Correct? – elnigno Jul 03 '16 at 10:35
*"`difference_type` is large enough to represent the largest possible difference between two iterators on the same string;"* In reality this pretty much holds, but it's not guaranteed. On a 32 bit system, both `size_t` and the difference type will most likely be 32 bit integers, so if the string is like 2.5GB or bigger, the size won't fit in the signed one. – Baum mit Augen Jul 03 '16 at 13:00

Rakete1111 · Answer 2 · 2016-07-03T18:13:02.707

We know that

-a % b == -(a % b)

So you could write something like this:

template<typename T, typename T2>
constexpr T safeModulo(T a, T2 b)
{
    return (a >= 0 ? 1 : -1) * static_cast<T>(std::llabs(a) % b);
}

This won't overflow in 99.98% of the cases, because consider this

safeModulo(num, str.length());

If std::size_t is implemented as an unsigned long long, then T2 -> unsigned long long and T -> int.

As pointed out in the comments, using std::llabs instead of std::abs is important, because if a is the smallest possible value of int, removing the sign will overflow. Promoting a to a long long just before won't result in this problem, as long long has a larger range of values.

Now static_cast<int>(std::llabs(a) % b) will always result in a value that is smaller than a, so casting it to int will never overflow/underflow. Even if a gets promoted to an unsigned long long, it doesn't matter because a is already "unsigned" from std::llabs(a), and so the value is unchanged (i.e. didn't overflow/underflow).

Because of the property stated above, if a is negative, multiply the result with -1 and you get the correct result.

The only case where it results in undefined behavior is when a is std::numeric_limits<long long>::min(), as removing the sign overflows a, resulting in undefined behavior. There is probably another way to implement the function, I'll think about it.

Thanks, this makes sense but won't this lead to undefined behaviour if a is `INT_MIN`? http://en.cppreference.com/w/cpp/numeric/math/abs — elnigno, Jul 03 '16 at 10:31
@elnigno True, I completely forgot about that case. Promoting `a` to `long long` will result in defined behavior. Thanks :) — Rakete1111, Jul 03 '16 at 18:04

How to get negative remainder with remainder operator on size_t?

2 Answers2