What is returned in this function?

Question

If I interpret this correctly, it takes two (long) integers as input, creates an array, and subtracts an integer from the array. But I thought that I could not subtract an integer from an array.

What does this function actually return?

int *ivector (long nl, long nh)
/* allocate an int vector with subscript range v[nl..nh] */
{
  int *retval;

  retval = malloc(sizeof(int)*(nh-nl+1));

  return retval - nl;
}

@mattiav27 The function has undefined behavior provided that nl is not equal to 0.:) — Vlad from Moscow, Oct 25 '19 at 11:48
You return a pointer to `retval[-nl]`. Which isn't really a valid pointer. — Some programmer dude, Oct 25 '19 at 11:48
This is a trick for simulating an array with a base index other than 0. Unfortunately it's not portable; it's well-defined only if the base you want to use is *negative* (and then only if it's a negative number whose absolute value is less than the size of the array). In particular, it's not portable to use this trick to simulate a 1-based array, which presumably is the usual wish. See also [question 6.17](http://c-faq.com/aryptr/non0based.html) in the [C FAQ list](http://c-faq.com/). — Steve Summit, Oct 25 '19 at 11:49
The comment says it all: _"vector with subscript range v[nl..nh]"_ but unfortunately, a pointer outside the bounds of the array is undefined in C, even when the indexing (dereferencing) later will be within the bounds of the array. — Paul Ogilvie, Oct 25 '19 at 11:58
Rephrasing my original comment: the `ivector` function as written is valid only if `nl <= 0`. — Steve Summit, Oct 25 '19 at 12:06
Also this is quite evil because even where it works, it gives you a `malloc`'d pointer that you can't `free`, not without knowing the internals of this function. — The Vee, Oct 25 '19 at 13:18
@kiranBiradar Learn the C Standard: "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." Pay attention to the word undefined. — Vlad from Moscow, Jan 31 '20 at 12:16

Steve Summit · Accepted Answer · 2019-10-25T13:25:37.663

Before exploring the behavior of this ivector() function, let's review some basic facts about arrays and pointers in C.

Consider the code

int a[10];
int i;
for(i = 0; i < 10; i++)
    a[i] = 100 + i;

This results in an array in memory which we can think of like this:

    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
 a: | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
       0     1     2     3     4     5     6     7     8     9

Suppose we now say

int *ip = a;

Due to the correspondence between arrays and pointers in C, this is equivalent to saying

int *ip = &a[0];

In any case, we end up with a pointer pointing at the first cell of a, like this:

    +-----+
ip: |  *  |
    +--|--+
       |
       v
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
 a: | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

Now, pointer arithmetic: when you add an integer n to a pointer, you "move" the pointer so that it points n elements further along the underlying array. Make sure you understand all the different ways in which this code prints the number 102:

int *ip2 = ip + 2;
printf("%d %d %d %d\n", *(ip + 2), ip[2], *ip2, ip2[0]);

(If you don't understand how all four expressions *(ip+2), ip[2], *ip2, and ip2[0] evaluate to the number 102, please read about this or ask. It's another facet of the "correspondence between arrays and pointers", and it's fundamental to our understanding of the ivector function.)

Pointer subtraction works, too: the call

printf("%d %d\n", *(ip2 - 1), ip2[-1]);

prints 101, two slightly different ways.
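
The identities above can be checked mechanically. The following is a minimal sketch, not part of the original code; the helper name check_pointer_identities is made up for illustration, and it assumes ip points at the first of at least three initialized ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper; ip must point at the first of at least 3 initialized ints. */
void check_pointer_identities(int *ip)
{
    int *ip2 = ip + 2;                 /* two elements past ip */
    ptrdiff_t distance = ip2 - ip;     /* difference counts elements, not bytes */

    assert(distance == 2);
    assert(&ip[2] == ip2);             /* ip[2] is shorthand for *(ip + 2) */
    assert(*(ip + 2) == ip[2]);
    assert(*ip2 == ip[2]);
    assert(ip2[0] == ip[2]);
    assert(*(ip2 - 1) == ip[1]);       /* stepping back one element */
    assert(ip2[-1] == ip[1]);
}
```

Calling it with the array a from above (check_pointer_identities(a)) should pass every assertion, since each line is just a different spelling of the same element access.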

Now, let's look at the ivector() function. It's trying to help us simulate arrays that don't necessarily start at 0. If we call

int *a2 = ivector(0, 9);
for(i = 0; i <= 9; i++) a2[i] = 100 + i;

we'll end up with an array almost exactly like we had before:

    +-----+
a2: |  *  |
    +--|--+
       |
       v
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The only difference is that the array itself has no name: it's an anonymous region of memory we got by calling malloc.

Now suppose we call

int *a3 = ivector(-5, 5);
for(i = -5; i <= 5; i++) a3[i] = 100 + i;

Now we end up with an 11-element "array" which we can think of as looking like this:

    +-----+
a3: |  *-----------------------------+
    +-----+                          |
                                     v
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    | 95  | 96  | 97  | 98  | 99  | 100 | 101 | 102 | 103 | 104 | 105 |
    +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      -5    -4    -3    -2    -1     0     1     2     3     4     5

Note that we can talk about a3[0], a3[3], a3[-2], etc., just as if this were a regular array with a lower bound of -5. The key to this is that subtraction at the end of ivector you were asking about:

return retval - nl;

This doesn't subtract anything from the values stored in the array; it's pointer arithmetic again, subtracting nl from the pointer value retval. For the call ivector(-5, 5), this translates to

return retval - -5;

which of course is equivalent to

return retval + 5;

so we get a pointer 5 elements into the allocated region.
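
One consequence, raised in the comments on the question, is that the returned pointer is not the one malloc handed out, so it can't be passed to free directly. Here is a rough sketch of how the offset could be undone. The function ivector_free is hypothetical (it is not in the original code), and the assert in ivector is a guard added only for this sketch so that index 0 always falls inside the block.

```c
#include <assert.h>
#include <stdlib.h>

/* The function from the question, with a sketch-only guard and a NULL check. */
int *ivector(long nl, long nh)
{
    int *retval;

    /* Guard added for this sketch: index 0 must lie inside nl..nh,
       otherwise retval - nl would point outside the allocation. */
    assert(nl <= 0 && nh >= 0);

    retval = malloc(sizeof(int) * (size_t)(nh - nl + 1));
    if (retval == NULL)
        return NULL;
    return retval - nl;    /* for nl = -5 this is retval + 5 */
}

/* Hypothetical counterpart: undo the offset, then free. */
void ivector_free(int *v, long nl)
{
    if (v != NULL)
        free(v + nl);      /* (retval - nl) + nl == retval */
}
```

With this pair, a3 = ivector(-5, 5) would later be released with ivector_free(a3, -5); the caller has to remember the lower bound, which is part of why the trick is awkward.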

Now suppose we call

int *a4 = ivector(1, 10);
for(i = 1; i <= 10; i++) a4[i] = 100 + i;

This is where it all breaks down. The intent is that we end up with a picture like this:

    +-----+
a4: |  *  |
    +--|--+
       |
       v
          +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
          | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 |
          +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
             1     2     3     4     5     6     7     8     9    10

But there's a pretty obvious problem: a4 doesn't actually point into the allocated array.

Based on the way pointer arithmetic works, and the way it's traditionally been implemented by straightforward compilers for straightforward computer architectures, you can convince yourself that this code "ought to" work anyway, and that you'd be able to access a4[1], a4[2], ... up to a4[10]. There'd be horrible problems if you tried to access a4[0], of course, but that's okay, you're not supposed to do that, because a4 is a 1-based array.

Unfortunately, this last fragment of code is not guaranteed to work. Pointer arithmetic is not defined if you compute a pointer that points "outside" of an array (either an actual array you declared, or an array-like block of memory you got by calling malloc); the only exception is a pointer to the position just past the last element. If you try to compute any other out-of-bounds pointer, the behavior is undefined, even if you never try to access the memory it points at. So most knowledgeable C programmers will advise you not to write code like ivector (or if you do, to call it only for nl <= 0... but of course that pretty much defeats the purpose).
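
If a 1-based (or otherwise positive-based) range is what you actually want, one portable option is to keep the real malloc pointer and do the subtraction on the index instead of on the pointer. The following is only a sketch of that idea; struct ivec and the functions ivec_alloc, ivec_at, and ivec_free are names invented here, not anything from the original code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type: stores the real malloc pointer next to the bounds. */
struct ivec {
    int *base;   /* exactly what malloc returned, or NULL */
    long nl;     /* lowest valid subscript */
    long nh;     /* highest valid subscript */
};

/* Allocate storage for subscripts nl..nh; base is NULL on failure. */
struct ivec ivec_alloc(long nl, long nh)
{
    struct ivec v = { NULL, nl, nh };

    if (nh >= nl)
        v.base = malloc(sizeof(int) * (size_t)(nh - nl + 1));
    return v;
}

/* Pointer to element i; nl is subtracted from the index, never from base. */
int *ivec_at(struct ivec v, long i)
{
    assert(v.base != NULL && i >= v.nl && i <= v.nh);
    return &v.base[i - v.nl];
}

/* Release the storage; base is the original malloc pointer, so free is safe. */
void ivec_free(struct ivec v)
{
    free(v.base);
}
```

The cost is that a4[i] becomes *ivec_at(a4, i), but no out-of-bounds pointer is ever formed, and freeing needs no knowledge of the offset.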
