1

The following cython script, results in comparison between signed and unsigned integer expressions warning.

%%cython
# distutils: language = c++

from libcpp.vector cimport vector as cpp_vec
import numpy as np

cdef cpp_vec[long] a  = np.array([1,2,3], np.int)
cdef Py_ssize_t i

for i in range(a.size()):        # Using a.size() as parameter of range() causes the warning
    pass

Is this warning necessary? If so, why? Also, is it possible to silence these unsigned vs signed comparison warnings?

Also, why only the first for loop results in the warning?

%%cython
cdef:
    ssize_t i
    unsigned long m = 10
    unsigned int n = 10
    long o = 10

for i in range(m):
    pass
for i in range(n):
    pass
for i in range(o):
    pass
Manish Goel
  • 843
  • 1
  • 8
  • 21
  • What do you mean by „cython script results in warning“? Do you see the warning while generating c code, while compiling or while loading/running the resulting module? – ead May 10 '19 at 16:39
  • I would say during compiling. However, I am not exactly sure how to differentiate between generating C code and compile time warnings. There are no warnings while running the script. – Manish Goel May 10 '19 at 22:16

1 Answers1

1

This is a warning from your g++-compiler, and as every compiler-warning should be taken seriously.

The for-loop of your Cython code is transformed to the following/similar cpp-code:

 std::vector<long>::size_type __pyx_t_7;
 Py_ssize_t __pyx_t_8;
 __pyx_t_7 = __pyx_v_xxxx_a.size();
 for (__pyx_t_8 = 0; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {
     ....
 }

The problem is, that std::vector<long>::size_type is 64bit unsigned integer ona 64-bit system, and Py_ssize_t a 64bit signed integer.

C++ uses implicit conversion for built-in types, in order to be able to evaluate __pyx_t_8 < __pyx_t_7, the rules can be found for example in this SO-post.

There are multiple reason, why this warning makes sense - the rules are quite complicated and experience has shown, that programmers often handle them wrong. For example -1<1U evaluates to false (see live), but

signed char a =-1;
unsigned char b = 1;
std::cout<<(a<b)<<"\n";

prints 1, i.e. (a<b) evaluates to true (see live). Such quirks can easily leads to hard-to-find bugs.

Even more, historically, before the standard was established, different compilers handled those conversions differently - when switching to a different compiler, it was a nice thing to see all the places where the behavior could be changed - but this not really important nowdays.

There are different strategies to avoid this warning.

1. Go with the flow:

Isn't easier to make the i a size_t instead of Py_ssize_t, i.e cdef size_t i?

2. Cast and check:

You can explicitly cast and then check, whether the assumptions are OK, e.g.

...
cdef Py_ssize_t i

cdef Py_ssize_t n = <Py_ssize_t>(a.size())
if n<0 :
    raise ValueError("too big");

for i in range(n):
...

The above becomes fast quite cumbersome, so one usually extract this into a utility-function.

3. Just cast:

One could ask "how probable it is that a vector has more than 2^63 entries?" and just skip the checking:

...
for i in range(<Py_ssize_t>(a.size())):        
    print(i) 

and then... see the code fail 30 years later:)

4. Disable compiler warning:

One also could pass -Wno-sign-compare-option to the compiler via extra_compile_args in setup-file or -c=-Wno-sign-compare in IPython, which would turn off the warning in the whole code. It is probably safer to use the "Just cast"-solution, which mutes the warning only at this one place and not everywhere.


The warning is not issued if signed integer is of larger size than the unsigned integer: In this case the unsigned integer is converted to larger signed integer and fits into positive range of the larger type.

For example ssize_t has 64bit and unsigned int has 32bit, thus 32bit-unsigned bit is converted to 64bit-ssize_t before the comparison - the will be no overflow, because positive numbers up to 63 can be represented by `ssize_t´.

ead
  • 32,758
  • 6
  • 90
  • 153
  • Since, the warning is caused because of range(), I don't think strategy 1 would work. – Manish Goel May 13 '19 at 08:48
  • @ManishGoel `range(N)` will be translated by Cython to a simple for-loop (and has nothing to do with Python's range) and `N` is used as upper bound without changing its type. Just try it out. – ead May 13 '19 at 08:52
  • Yes, you are right. Using `ssize_t` works. But, `range (or the C `for` loop) seems to be having trouble with unsigned long. I have edited the question with the test case. – Manish Goel May 13 '19 at 10:01
  • @ManishGoel I have added an explanation for `ssize_t` vs `unsigned int` (because I think it makes the answer better), but you should read about chameleon questions. https://meta.stackexchange.com/questions/43478/exit-strategies-for-chameleon-questions – ead May 13 '19 at 11:26