2

Now I try to improve my knowledge of pointers reading "Understanding and Using C Pointers" by Richard Reese.

Here's one code example from this book concerning realloc() function.

char* getLine(void) {
    const size_t sizeIncrement = 10;
    char* buffer = malloc(sizeIncrement);
    char* currentPosition = buffer;
    size_t maximumLength = sizeIncrement;
    size_t length = 0;
    int character;

    if(currentPosition == NULL) { return NULL; }

    while(1) {
        character = fgetc(stdin);

        if(character == '\n') { break; }

        if(++length >= maximumLength) {
            char *newBuffer = realloc(buffer, maximumLength += sizeIncrement);

            if(newBuffer == NULL) {
                free(buffer);
                return NULL;
            }

            currentPosition = newBuffer + (currentPosition - buffer);
            buffer = newBuffer;
        }

        *currentPosition++ = character;
    }

    *currentPosition = '\0';
    return buffer;
}

The main idea is to read all symbols into the buffer until we meet \n.

We don't know the total number of symbols to read so it's reasonable to use realloc() function to expand buffer periodically.

So, to expand buffer we use:

char *newBuffer = realloc(buffer, maximumLength += sizeIncrement);

In this case realloc() returns newBuffer pointer to the expanded buffer.

After that, if realloc() was invoked successfully, currentPosition is recalculated as:

currentPosition = newBuffer + (currentPosition - buffer);

QUESTION:

Is it valid to recalculate currentPosition in such way?

As I know, after realloc() invocation buffer pointer is invalidated. (See, for example, this). Any access to the buffer pointer leads to the undefined behaviour. So... where am I wrong?

Edgar Rokjān
  • 17,245
  • 4
  • 40
  • 67
  • Rokyan Is it indeed the code as it is written in the book? If so then it is a bad code and I am sure the book aslo is bad. – Vlad from Moscow Jan 27 '16 at 20:23
  • Well it is invalidated *if you deference it*, but the code does not. It just computes the new pointer. The previous pointer still retains its value. – Weather Vane Jan 27 '16 at 20:27
  • If you really want to improve your knowledge don't call a pointer =>> Array – Michi Jan 27 '16 at 20:27
  • @VladfromMoscow If you want you can check. May be I missed some blank lines... – Edgar Rokjān Jan 27 '16 at 20:30
  • I believe that you have copied the code correctly, Edgar. Pointer arithmetic is still allowed on the value of `buffer` even after the `realloc`. It's *dereferencing* it that leads to the land of undefined behavior. – David Hoelzer Jan 27 '16 at 20:31
  • OT, usually you could double the size when reallocating, instead of a constant increment. – Bob__ Jan 27 '16 at 20:35
  • @Bob__ This recommendation doesn't relate to the problem... – Edgar Rokjān Jan 27 '16 at 20:37
  • @EdgarRokyan yes, it's Off Topic, it's just one of the things I found... not so good, in that code. – Bob__ Jan 27 '16 at 20:45
  • 1
    [Related thread](http://stackoverflow.com/questions/33584843/is-fetching-the-value-of-an-invalid-pointer-undefined-or-implementation-defined) – M.M Jan 27 '16 at 21:34
  • the posted code already has the 'length' variable, containing the number of bytes so all the messing around seems like a contrived example just to illustrate using pointers. – user3629249 Jan 28 '16 at 19:18
  • @user3629249 Mmm, yes, of course. The main goal is to show `realloc()` in usage, but not to write a piece of super optimized code... – Edgar Rokjān Jan 28 '16 at 21:08

4 Answers4

6

This code causes undefined behaviour:

currentPosition = newBuffer + (currentPosition - buffer);

After passing a pointer to realloc, that pointer variable (and all other pointers based on that pointer) become indeterminate, which is the same status that an uninitialized variable has.

Reference: C11 6.2.4/2:

[...] The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

Then, doing pointer arithmetic on an invalid pointer causes undefined behaviour, C11 6.5.6/8:

When an expression that has integer type is added to or subtracted from a pointer, [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined

The pointer operand doesn't point to an object at that time. The object it used to point to has already been freed.

In fact, evaluating the pointer at all may cause undefined behaviour, since an indeterminate value may be a trap representation. (Imagine a system where loading a value into an address register also performs a hardware check that the address belongs to this process). Refs: C11 3.19.2, 6.2.6.1/5:

If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined


The correct way to write the code would have been:

if(++length >= maximumLength)
{
    size_t currentOffset = currentPosition - buffer;

    char *newBuffer = realloc(......
    // ...

    currentPosition = newBuffer + currentOffset;
    buffer = newBuffer;
}

(Personally I would use the offset the whole way , instead of currentPosition, to avoid this problem entirely)

M.M
  • 138,810
  • 21
  • 208
  • 365
  • 2
    I think you're wrong there. You would be correct if the code *dereferenced* the pointer, but it does not. Since the value of `buffer` *cannot* change in the call (since it is call by value), the variable is guaranteed to have the original value of buffer. This allows for simple pointer arithmetic to determine what the correct offset is to add to the new `newBuffer` memory region. – David Hoelzer Jan 27 '16 at 20:35
  • 4
    @DavidHoelzer the C standard disagrees with you – M.M Jan 27 '16 at 20:35
  • Which section specifically? If that's true then the arithmetic must be done before `realloc`. It would also imply that somehow `buffer` itself is being modified during a call by value. – David Hoelzer Jan 27 '16 at 20:36
  • @DavidHoelzer added references – M.M Jan 27 '16 at 20:42
  • I guess the point is that while both `currentPosition` and `buffer` are invalidated their value didn't change (as far as i know) after the call to realloc, so their difference still can (but it shouldn't IMHO) represent the current offset. – Bob__ Jan 27 '16 at 20:55
  • @Bob__ "invalidated" means "value becomes indeterminate, and the pointer is invalid". Anything could happen really. – M.M Jan 27 '16 at 20:57
  • The arithmetic is perfectly valid, so how would the pointer be changed making it invalid? – Carey Gregory Jan 27 '16 at 20:57
  • 1
    @CareyGregory the arithmetic is not perfectly valid (see the second quote in my answer) – M.M Jan 27 '16 at 20:58
  • The only way it's not valid is if the value of the pointer changes. If that doesn't happen, the results of the arithmetic will be correct in all cases. I think you're misapplying a weakness in the wording of the standard. – Carey Gregory Jan 27 '16 at 20:59
  • @CareyGregory I think you're making assumptions based on experience with particular systems, assumptions which don't hold in the general case (and we do in fact have a standards document for just that reason). The wording in the standard is very clear here. Anyway, the value of the pointer does change (it goes from valid to invalid). – M.M Jan 27 '16 at 21:01
  • Has nothing to do with particular systems. I'm confident the value does not change on any known hardware platform. The fact that it no longer points to accessible memory does not mean its _value_ has changed. – Carey Gregory Jan 27 '16 at 21:02
  • @CareyGregory You overlook the fact that the specification was written by academics, and academics like to play with experimental hardware, hardware that is *not* known to you. So I agree with M.M's interpretation of the specification, while at the same time I agree with you that no sensible hardware would have a problem with the OP's code. – user3386109 Jan 27 '16 at 21:15
  • 3
    [Here](http://www.google.com/patents/US7966480) is a patent for hardware trapping in response to a register containing an invalid pointer value. [Here](http://www.lynx.com/using-the-microprocessor-mmu-for-software-protection-in-real-time-systems/) is a description of how a *MMU* can allow for the same pointer to be valid at one time, and invalid at a later time. Another useful feature that the C standard allows is for the compiler to set freed pointers to a known invalid representation (e.g. `0xCDCDCDCD`); so that if a program goes on to use a freed pointer then it is obvious what happened. – M.M Jan 27 '16 at 21:45
  • I hoped that this book would give me a deep understanding of pointers. In fact, I found a bug in the middle of this book... – Edgar Rokjān Jan 27 '16 at 21:47
  • @EdgarRokyan You have improved your understanding of pointers via this discussion though :) – M.M Jan 27 '16 at 21:48
  • The C standard guarantees that the value of `buffer` cannot and will not change regardless of what happens to the memory it points to. Although I can imagine a hardware design in which it would fail, I do not think that hardware exists or will exist within the lifespan of anyone's C code posted here. So, yes, I'll grant you that for academics the calculation violates the standard, but I don't think it will fail in anyone's lifetime here. ;-) – Carey Gregory Jan 27 '16 at 21:56
  • 2
    @CareyGregory the C standard doesn't guarantee that at all (see my first quote). For example the compiler could translate `free(p);` to a call to the `free` library function followed by setting `p` to null (or some other value that would be useful for debugging). – M.M Jan 27 '16 at 21:59
  • Explain to me how `free(p)` can change the value of `p` for the caller. – Carey Gregory Jan 27 '16 at 22:36
  • 1
    @CareyGregory I just did in my previous comment – M.M Jan 27 '16 at 22:49
  • I don't think that would ever happen since it would violate the pass-by-value standard. Anyway, I think we've worn this debate out. – Carey Gregory Jan 28 '16 at 00:29
  • Did you read Vlad's arguments against your answer? What do you think? – Edgar Rokjān Jan 28 '16 at 21:52
  • 1
    @EdgarRokyan yes, he is wrong. He assumes pointers are glorified integers of some sort, a common mistake. There's nothing more to discuss really, you can either read the C standard, or you can invent your own answers – M.M Jan 28 '16 at 21:57
4

Yes. Here's why.

currentPosition and buffer, as they are being used in the expression currentPosition = newBuffer + (currentPosition - buffer);, are simply being used for their arithmetic value. At no time after the realloc is buffer dereferenced.

When you call realloc you are correct that the pointer must no longer be relied on as a pointer into the memory region of the buffer. However, the actual address value in the pointer is not changed by the call.

David Hoelzer
  • 15,862
  • 4
  • 48
  • 67
1

My five cents.:)

For starters I think that the program shown in the book is bad.

It does not check whether fgetc returns EOF.

And it does not return NULL as usual when the end of the file is encountered and no data has been read.

This drawback does not allow to use the function for example the following way

while ( ( buffer = getLine() ) != NULL )
{
    //...
    free( buffer );
}

Also there are too many variables. The same logic can be performed with less variables.

So I would write the function the following way

#include <stdlib.h>
#include <stdio.h>

char * getLine( void ) 
{
    const size_t SIZE_INCREMENT = 10;

    char  *buffer = NULL;
    size_t length = 0;

    int c = fgetc( stdin );

    if ( c != EOF )
    {        
        while ( 1 )
        {
            if ( length % SIZE_INCREMENT == 0 )
            {
                char *tmp = realloc( buffer, length + SIZE_INCREMENT );
                if ( !tmp )
                {
                    free( buffer );
                    buffer = NULL;
                    break;
                }
                else
                {
                    buffer = tmp;
                }
            }

            if ( c == EOF || c == '\n' ) break;

            buffer[length++] = c;

            c = fgetc( stdin );
        }            

        if ( buffer ) buffer[length] = '\0';
    }

    return buffer;
}

int main( void )
{
    char *s;

    while ( ( s = getLine() ) != NULL )
    {        
        puts( s );
        free( s );
    }

    return 0;
}

If to enter for example

This is first line
This is second line

then the output will echo the input

This is first line
This is second line

As for the discussion whether this statement

currentPosition = newBuffer + (currentPosition - buffer);

is well-defined then in my opinion it is well-defined.

According to the C Standard (6.5.6 Additive operators)

6 The result of the binary - operator is the difference resulting from the subtraction of the second operand from the first.

Though the object pointed to by the pointers is not already alive nevertheless the pointers contain valid values relative to the pointer arithmetic that refered elements of the same array. So if to apply the subtruction the result will be a valid value of elements between these two pointers that the non-alive array had. The subtraction operator just does the arithmetic using integer values and taking into account the type of the pointers like

( second_pointer - first_pointer ) / sizeof( *first_pointer )
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • @EdgarRokyan The pointer passed to the function is passed by value. The original pointer keeps its value. Nothing can occur with it. – Vlad from Moscow Jan 27 '16 at 22:24
  • Now I totally agree with @M.M answer. After `realloc()` invocation `buffer` pointer doesn't point to the real object. In general we can't use it even in subtraction cause it's not guaranteed to be valid. Do you agree? – Edgar Rokjān Jan 27 '16 at 22:34
  • @EdgarRokyan See my answer one more. This binary operator performs subtraction of the operands. The operands in this case have valid values for the operation because the both pointers refered elements of the same array. Nothing was occurred with these values. They are valid values for the subtraction. – Vlad from Moscow Jan 27 '16 at 22:39
  • @EdgarRokyan Otherwise you should think that for example 4 - 2 can equal to any value or have some indeterminate value.:) – Vlad from Moscow Jan 27 '16 at 22:42
  • Vlad, I want to say that there's no array when we do subtraction. It's well explained in M.M's answer. But it looks like you don't agree with him... – Edgar Rokjān Jan 28 '16 at 21:06
  • @EdgarRokyan The array entirely is not required when the operator is used. The array is required just to get valid values for the pointers. After that the operator is performed without any access to the array. And there is said about this in the standard. – Vlad from Moscow Jan 28 '16 at 21:12
  • Ok, but what's about first reference to standard in M.M's answer? It says that the value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime. – Edgar Rokjān Jan 28 '16 at 21:15
  • @EdgarRokyan Yes it means. But when the object pointed to by the pointer is not alive the memory occupied by the object can be not acceptable by the program. – Vlad from Moscow Jan 28 '16 at 21:28
  • And also it means according to the statement which we discuss that pointer can be not acceptable too... – Edgar Rokjān Jan 28 '16 at 21:36
  • @EdgarRokyan Why is not the pointer acceptable? It is a local variable that is acceptable. You can for example reassign it. – Vlad from Moscow Jan 28 '16 at 21:37
  • I think we understand phrase `pointer becomes indeterminate` differently. I suppose that it means that pointer can point to any memory address, you say that it'll save previous address. Obviously in both cases it won't point to a valid object. – Edgar Rokjān Jan 28 '16 at 21:42
  • @EdgarRokyan A variable that is a pointer is the same as any other variable. It keeps the value you assigned to the variable. So nothing will happen with the value stored in the pointer whether it early pointed to an object or now it does not point to any object. – Vlad from Moscow Jan 28 '16 at 21:46
  • @EdgarRokyan I think so.:) – Vlad from Moscow Jan 28 '16 at 21:50
0

The code uses a pattern which, on many platforms, would be naturally supported by any implementation which does not go out of its way to do otherwise. Unfortunately, some implementations will occasionally break such code in the name of "optimization", even though the value of any practical optimizations obtained thereby would pale in comparison to the efficiency loss that would often result from requiring programmers to use alternative approaches.

supercat
  • 77,689
  • 9
  • 166
  • 211