3

Consider the following code sample:

#include <iostream>
#include <string>

int main()
{
    std::string str("someString"); // length 10
    int num = -11;
    std::cout << num % str.length() << std::endl;
}

Running this code on http://cpp.sh, I get 5 as a result, while I was expecting it to be -1.

I know that this happens because the type of str.length() is size_t which is an implementation dependent unsigned, and because of the implicit type conversions that happen with binary operators that cause num to be converted from a signed int to an unsigned size_t (more here); this causes the negative value to become a positive one and messes up the result of the operation.

One could think of addressing the problem with an explicit cast to int:

num % (int)str.length()

This might work but it's not guaranteed, for instance in the case of a string with length larger than the maximum value of int. One could reduce the risk using a larger type, like long long, but what if size_t is unsigned long long? Same problem.

How would you address this problem in a portable and robust way?

Community
  • 1
  • 1
elnigno
  • 1,751
  • 14
  • 37
  • 1
    It's a remainder operator. The behaviour depends on which version of the standard you're using. – user207421 Jul 01 '16 at 13:27
  • C code written before 1999 and C++ code written before 2011 may behave inconsistently in this case, since negative division/remainders were poorly specified in older standards. Make sure to use modern compilers/standards. – Lundin Jul 01 '16 at 14:01
  • While size_t may be represented as long long, I doubt it will ever max out. Unless your platform supports exa-byte string length that is... Which I think is a reasonable restriction in any case. – Andreas Jul 01 '16 at 17:45

2 Answers2

3

Since C++11, you can just cast the result of length to std::string::difference_type.

To address "But what if the size is too big?":

That won't happen on 64 bit platforms and even if you are on a smaller one: When was the last time you actually had a string that took up more than half of total RAM? Unless you are doing really specific stuff (which you would know), using the difference_type is just fine; quit fighting ghosts.

Alternatively, just use int64_t, that's certainly big enough. (Though maybe looping over one on some 32 bit processors is slower than int32_t, I don't know. Won't matter for that single modulus operation though.)

(Fun fact: Even some prominent committee members consider littering the standard library with unsigned types a mistake, for reference see this panel at 9:50, 42:40, 1:02:50 )

Pre C++11, the sign of % with negative values was implementation defined, for well defined behavior, use std::div plus one of the casts described above.

HolyBlackCat
  • 78,603
  • 9
  • 131
  • 207
Baum mit Augen
  • 49,044
  • 25
  • 144
  • 182
  • Thanks for the insight about `difference_type`. If I understand correctly, by the standard, `size_t` is large enough to represent the length of the string, while `difference_type` is large enough to represent the largest possible difference between two iterators on the same string; since the maximum difference is the length of the string, then it should always be safe to cast `size_t` to `difference_type`. Correct? – elnigno Jul 03 '16 at 10:35
  • *"`difference_type` is large enough to represent the largest possible difference between two iterators on the same string;"* In reality this pretty much holds, but it's not guaranteed. On a 32 bit system, both `size_t` and the difference type will most likely be 32 bit integers, so if the string is like 2.5GB or bigger, the size won't fit in the signed one. – Baum mit Augen Jul 03 '16 at 13:00
3

We know that

-a % b == -(a % b)

So you could write something like this:

template<typename T, typename T2>
constexpr T safeModulo(T a, T2 b)
{
    return (a >= 0 ? 1 : -1) * static_cast<T>(std::llabs(a) % b);
}

This won't overflow in 99.98% of the cases, because consider this

safeModulo(num, str.length());

If std::size_t is implemented as an unsigned long long, then T2 -> unsigned long long and T -> int.

As pointed out in the comments, using std::llabs instead of std::abs is important, because if a is the smallest possible value of int, removing the sign will overflow. Promoting a to a long long just before won't result in this problem, as long long has a larger range of values.

Now static_cast<int>(std::llabs(a) % b) will always result in a value that is smaller than a, so casting it to int will never overflow/underflow. Even if a gets promoted to an unsigned long long, it doesn't matter because a is already "unsigned" from std::llabs(a), and so the value is unchanged (i.e. didn't overflow/underflow).

Because of the property stated above, if a is negative, multiply the result with -1 and you get the correct result.


The only case where it results in undefined behavior is when a is std::numeric_limits<long long>::min(), as removing the sign overflows a, resulting in undefined behavior. There is probably another way to implement the function, I'll think about it.

Rakete1111
  • 47,013
  • 16
  • 123
  • 162
  • Thanks, this makes sense but won't this lead to undefined behaviour if a is `INT_MIN`? http://en.cppreference.com/w/cpp/numeric/math/abs – elnigno Jul 03 '16 at 10:31
  • @elnigno True, I completely forgot about that case. Promoting `a` to `long long` will result in defined behavior. Thanks :) – Rakete1111 Jul 03 '16 at 18:04