For loop running infinitely with string length in C++

Question

I am very surprised with this behavior of for loop :

program 1 :

#include<bits/stdc++.h>
using namespace std;
int main()
{
    string s1,s2;
    cin>>s1>>s2;
    for(int i=0;i<(s1.length()-s2.length()+1);i++)
    {
        cout<<"Hello\n";
    }
}

After giving input : s1 = "ab" , s2 = "abcdef"

This for loop of program 1 is running infinitely and printing "Hello" infinite times.

Whereas , Program 2 (below) is working fine for both same input of string s1 and s2.

program 2 :

#include<bits/stdc++.h>
using namespace std;
int main()
{
    string s1,s2;
    cin>>s1>>s2;
    int len = (s1.length()-s2.length()+1);
    for(int i=0;i<len;i++)
    {
        cout<<"Hello\n";
    }
}

Why is the for loop of program 1 running infinite times?

Hint: `std::string::length()` yields an `unsigned` value (`size_t` specifically). — πάντα ῥεῖ, Dec 15 '18 at 07:50
your hint seems fine..but can you please elaborate "unsigned meaning here" in few words? — Vishal Srivastav, Dec 15 '18 at 07:54
[1](https://stackoverflow.com/questions/2061245/how-to-subtract-two-unsigned-ints-with-wrap-around-or-overflow) [2](https://stackoverflow.com/questions/28779803/is-subtracting-larger-unsigned-value-from-smaller-in-c-undefined-behaviour) — πάντα ῥεῖ, Dec 15 '18 at 07:57
It doesn't run infinitely, it just runs for a long time, and you run out of patience. Let it go overnight and come back in the morning. Or maybe in a couple of days. — Pete Becker, Dec 15 '18 at 13:52

yeputons · Accepted Answer · 2018-12-15T14:26:48.730

In your example, s1.length() is evaluated to 2u (i.e. 2, but in an unsigned type), s2.length() is evaluated to 6u, s1.length() - s2.length() is most probably evaluated to 4294967292u (because there is no -4 in unsigned types), and s1.length() - s2.length() + 1 is evaluated to 4294967293u.

.length() return size_t in C++, which is unsigned value. Subtracting an unsigned value from another unsigned value yields an unsigned value, e.g. 1u - 2u may result in 4294967295.

When mixing signed and unsigned values (e.g. s.length() - 1 or i < s.length()), the signed value is converted to an unsigned, e.g. -1 > 1u is typically true because -1 is converted to 4294967295. Modern compilers will warn you about such type of comparisons if you enable warnings.

Knowing that, you may expect that your loop is running for 4 billion iterations, but that's not necessarily true, because i is a signed int, and if it's 32-bit (most probably), it cannot become bigger than 2147483647. And the moment your program increases it from 2147483647, signed overflow happens, which is undefined behavior in C++. So your loop may very well run infinitely.

I suspect you're doing competitive programming. My recommendation for competitive programming would be to always cast .length() to int whenever you want to calculation anything. You can create a macro like this:

#define sz(x) ((int)(x).size())

and then write sz(s) instead of s.length() everywhere to avoid such bugs.

However, that approach is highly frowned upon in any area of programming where code has to live longer than few hours. E.g. in the industry or open-source. For such cases, use explicit static_cast<int>(s.length())/static_cast<ssize_t>(s.length()) every time you need it. Or, even better, ask about during code review to get specific recommendations concerning your code, there are lots of possible caveats, see comments below for some examples.

I'd word *'that's not necessarily true'*. As signed overflow is UB, it *might* overflow to `0` (or anything else), much more likely, it overflows to `INT_MIN` and then restarts counting up to -1. If then promoted to (unsigned) size_t, it *will* be large enough to meet stop condition, even if size of int is smaller than size of size_t (but if so, there'll be a huge gap in between `(size_t)INT_MAX` and `(size_t)INT_MIN`...). So the *most likely* behaviour actually *is* running 4 billion times... — Aconcagua, Dec 15 '18 at 08:55
I do not agree on *always* casting to int! If I use sizes as are, I'd stay with the type they come with. Only if calculating *differences*, I'd cast - but then the *result* of subtraction; if values are large enough (possibly a matter of theory only, but still) there might be signed overflow otherwise (compare to `INT_MAX - (int)UINT_MAX`), and I'd cast to `ssize_t` to be sure not to narrow the type and possibly lose the most significant bits. — Aconcagua, Dec 15 '18 at 09:09
To be fully safe: `static_cast(s1.length - s2.length)`. It might appear unrealistic with 32-bit int (two vectors with around 2 billion entries each), but with 16-bit int (there are micro controllers out there that have!) and 32-bit size_t (don't know if any of forementioned ones actually has), the scenario of signed integer overflow gets more realistic. (Sidenote: cannot happen with ssize_t; if it did, we'd occupy more memory than addressable...) — Aconcagua, Dec 15 '18 at 09:55
Re: "the signed value is **casted** to an unsigned", the right word here is **converted**. A cast is something you write in your source code to tell the compiler to do a conversion. — Pete Becker, Dec 15 '18 at 13:54

score 1 · Answer 2 · answered Dec 15 '18 at 07:54

I haven't had a chance to test this yet so I can't say for sure, but I strongly suspect that this is related to the fact that string::length() returns a size_t, which is an unsigned type. Unsigned types wrap around to the maximum value if they become negative, so 2-6+1=-3 which becomes 2^32-3 when interpreted as 32 bit unsigned. This causes your loop to iterate billions of times, hence seemingly not terminating. Whereas in your second program you are explicitly converting to a signed int so the result is -3 as expected.

For loop running infinitely with string length in C++

2 Answers2