2

Suppose I want to initialize an array of characters with the 'x' character using a loop. Should I use an int or a size_t for the index?

char *pch = (char *) malloc(100);
for (size_t s = 0; s < 100; s++)
    pch[s] = 'x';

Note: I know about memset and other 'tricks', my question is just about int VS size_t.

Bite Bytes
  • 2
    Possible duplicate of [What is size\_t in C?](https://stackoverflow.com/questions/2550774/what-is-size-t-in-c) – Philipp Jul 03 '17 at 16:56
  • 2
    Drop the cast and use `size_t`, but in this case consider `memset` – Adriano Repetti Jul 03 '17 at 16:57
  • My understanding -- and I could be completely wrong about this; I'd love to see a quote from the standard -- is that the offset argument to `[]` is taken as a `ptrdiff_t`, so you should use that (rather than a `size_t` or `int`). For this exact case, `memset` is preferred (of course). – Joshua Green Jul 03 '17 at 17:35
  • @JoshuaGreen `ptrdiff_t` is used to hold the value of the difference of two pointers – Bite Bytes Jul 03 '17 at 17:38
  • Apparently ([see C11 p6.5.2.1](http://port70.net/~nsz/c/c11/n1570.html#6.5.2.1)) you should use `int`. – pmg Jul 03 '17 at 18:26
  • 1
    @pmg C11 p6.5.2.1 does not promote using an index of `int`. – chux - Reinstate Monica Jul 03 '17 at 19:26
  • @pmg, `ptrdiff_t` is _an_ integer type. [According to cppreference.com](http://en.cppreference.com/w/c/types/ptrdiff_t), `ptrdiff_t` is used for array indexing. That may not be correct, of course; I've personally found it very difficult to track down definitive statements about this. – Joshua Green Jul 04 '17 at 01:46

3 Answers

5

Since the variable is indexing an array, using size_t is more appropriate as an array index can't (normally) be negative.

Also, since array indexes are often compared against the result of a strlen call or sizeof operator, both of which yield a size_t, using an int could generate warnings for signed/unsigned comparisons.
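
To illustrate why those warnings matter, here is a minimal sketch. The helper names are hypothetical, and it assumes a typical platform where size_t is at least as wide as int, so the int operand is converted to size_t before the comparison. The explicit cast only spells out what the implicit conversion would do.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: what `n < strlen(s)` computes when n is an int.
   The usual arithmetic conversions turn n into a size_t first, so a
   negative n becomes a huge positive value. */
static int int_less_than_length(int n, const char *s)
{
    return (size_t)n < strlen(s);
}

/* Hypothetical helper: counts occurrences of c in s. The index is a
   size_t, so the loop comparison has matching types on both sides. */
static size_t count_char(const char *s, char c)
{
    size_t len = strlen(s);
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == c)
            count++;
    }
    return count;
}
```

Under that assumption, int_less_than_length(-1, "hello") yields 0 even though -1 is mathematically smaller than 5, which is the kind of surprise the signed/unsigned warning points at.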

However, if you're initializing all bytes of a buffer to a given value, you should instead use memset:

memset(pch, 'x', 100);
dbush
  • So whenever I wanna navigate through memory elements I go for a `size_t`? – Bite Bytes Jul 03 '17 at 17:03
  • 1
    @BiteBytes If you're *indexing an array*, yes because an array shouldn't be indexed by a negative number. – dbush Jul 03 '17 at 17:05
  • So why is using `int` so common? Besides, look at @PSkocik's answer – Bite Bytes Jul 03 '17 at 17:08
  • @BiteBytes People tend to use `int` as a default type in many cases, even when it isn't necessarily the best type to use. If you know you're going to use a negative index, for example if you have a pointer to the middle of an array, then you can use a signed type like `int`, although those cases tend to be rare. – dbush Jul 03 '17 at 17:18
3

size_t is more appropriate for indexing arrays as it is guaranteed to have the correct range for all array sizes. Yet there are some issues to consider:

  • size_t is an unsigned type, so you must be careful to only compute positive values in tests.

    For example iterating downwards this way will not work:

    for (size_t i = size - 1; i >= 0; i--) {
         array[i] = 0;
    }
    

    This naive approach has 2 problems: if size is 0, size - 1 wraps around to SIZE_MAX, so the loop still runs and indexes out of bounds; and it loops forever because i >= 0 is always true for an unsigned type.

    A better approach is this:

    for (size_t i = size; i-- > 0;) {
         array[i] = 0;
    }
    
  • size_t may be larger than int, so the format %d in printf is inappropriate for index values of this type. The standard conversion specifier is %zu (%zd is for the corresponding signed type), but it is not supported on many older Windows systems. There you may need to cast the size_t value to (int) and use %d when the value is known to fit, or use %llu and cast the size_t value to unsigned long long.

  • Some people consider size_t inelegant, maybe because of the _ or just because they are not used to it.

For these and other reasons, it is quite common to see index variables defined as int. While this is OK for small arrays and short loops, it can cause hard-to-find bugs in code where these assumptions end up failing.
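
The downward loop and the printf point can be sketched together as follows. The helper names zero_backwards and format_index are hypothetical, introduced only for illustration, and the sketch assumes a C99 or later library that understands %zu.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: zeroes array[size-1] down to array[0] and returns
   how many elements were visited. The test `i-- > 0` compares the old
   value of i and then decrements, so the body sees size-1 ... 0 and the
   loop does nothing at all when size is 0. */
static size_t zero_backwards(int *array, size_t size)
{
    size_t visited = 0;

    for (size_t i = size; i-- > 0;) {
        array[i] = 0;
        visited++;
    }
    return visited;
}

/* Hypothetical helper: writes index into buf as decimal text using the
   size_t conversion specifier. Returns what snprintf returns. */
static int format_index(char *buf, size_t bufsize, size_t index)
{
    return snprintf(buf, bufsize, "%zu", index);
}
```

After the loop ends, i has wrapped around to SIZE_MAX, which is harmless here because i is scoped to the loop and never used again.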

chqrlie
2

The generated code is likely to be identical, especially for such a simple case. With more complex math, signed types (if you can safely use them) are somewhat more optimizable because the compiler is allowed to assume they never overflow. With signed types, you also won't get unpleasant surprises if you decide to compare against a negative index.

So if you sum it up:

        very_large_arrays negative_comparison_safe maybe_faster
int     no                yes                      yes 
size_t  yes               no                       no

it looks like int could be preferable with definitely-small (<2^15 or perhaps 2^31 if your architecture targets guarantee that) ranges unless you can think of another criterion where size_t wins.

The advantage of size_t is that it will definitely work for any array no matter the size as long as you aren't comparing against negative indices. (This may be much more important than negative-comparison-safety and potential speed gains from undefined overflow).

ssize_t (== signed size_t) combines the best of both, unless you need every last bit of size_t (you definitely don't on a 64 bit machine).
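
As a rough sketch of signed indexing: ssize_t comes from POSIX rather than standard C, so this example swaps in ptrdiff_t, the signed type that standard C provides in <stddef.h>. The helper sum_around is hypothetical, and it assumes the caller passes a pointer into the middle of an array with enough elements on both sides.

```c
#include <stddef.h>

/* Hypothetical helper: sums mid[-before] through mid[after] inclusive.
   A signed index lets the loop start below zero relative to mid, which
   an unsigned type such as size_t cannot express. The caller must make
   sure every accessed element lies inside the underlying array. */
static int sum_around(const int *mid, ptrdiff_t before, ptrdiff_t after)
{
    int total = 0;

    for (ptrdiff_t i = -before; i <= after; i++)
        total += mid[i];
    return total;
}
```

For an array int a[5] = {1, 2, 3, 4, 5}, the call sum_around(a + 2, 2, 2) covers all five elements.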

Petr Skocik
  • 3
    "unless you can think of another criterion where size_t wins." Mixing signed and unsigned math too often leads to problems and should be avoided. With such compiler warnings enabled, `some_int < strlen(p)` and `some_int < sizeof ch_arr` trigger warnings that need attention. Note `ssize_t` is not in standard C nor its library. – chux - Reinstate Monica Jul 03 '17 at 17:58