C pointer arithmetics

Question

This is the code that I don't understand, it simply reverse a string.

#include <stdio.h>

void strrev(char *p)
{
  char *q = p;
  while(q && *q) ++q;
  for(--q; p < q; ++p, --q)
    *p = *p ^ *q,
    *q = *p ^ *q,
    *p = *p ^ *q;
}

int main(int argc, char **argv)
{
  do {
    printf("%s ",  argv[argc-1]); strrev(argv[argc-1]);
    printf("%s\n", argv[argc-1]);
  } while(--argc);

  return 0;
}

The only piece of code that I don't understand is this one: while(q && *q) ++q;, it is used to find the eos. Isn't it the same as while(*q) ++q;, since q is never going to be 0? How can the author of the code be sure that q or *q are going to be 0?

This code comes from this question: How do you reverse a string in place in C or C++?

It protects the function from deferring a NULL pointer argument. — Paulo Bu, Jan 21 '14 at 15:42
Note that this program fails to comply with the spec, since the "namespace" `str*` for functions is reserved. — unwind, Jan 21 '14 at 15:44
That code is simply appalling. You'd be much better off throwing it away. — David Heffernan, Jan 21 '14 at 15:46

Eric Lippert · Accepted Answer · 2014-01-21T16:14:54.600

David Heffernan's comment is correct. That code is appalling.

The point of the code you are asking about is to skip dereferencing q if it is null. Therefore the author of the code believes that q could be null. Under what circumstances can q be null? The most obvious is: if p is null.

So let's see what the code does when p is null.

void strrev(char *p) // Assumption: p is null
{
  char *q = p; // now q is null
  while(q && *q) ++q; // The loop is skipped, so q and p are both still null.
  for(--q;

So the first thing we do is decrement q, which is null. Likely this will wrap around and we will get as a result q containing the largest possible pointer.

    p < q;

Since null is smaller than everything except null, and q is no longer null, this is true. We enter the loop...

    ++p, --q)
    *p = *p ^ *q,

And promptly dereference null.

    *q = *p ^ *q,
    *p = *p ^ *q;
}

Incidentally, at Coverity we refer to this as a "forward null defect" -- that is, the pattern where a code path indicates that a value is expected to be potentially null, and then later on the same code path assumes that it is not null. It is extremely common.

OK, so this code is completely broken if given null as an argument. Are there other ways that it is broken? What happens if we give it an empty string?

void strrev(char *p) // Assumption: *p is 0
{
  char *q = p; // *q is 0
  while(q && *q) ++q; // The second condition is not met so the body is skipped.
  for(--q; // q now points *before valid memory*.
       p < q  // And we compare an invalid pointer to a valid one.

Do we have a guarantee in C that says that when you subtract one from a pointer to valid memory, and then compare that pointer to another pointer, that the comparison is sensible? Because this strikes me as incredibly dangerous. I do not know the C standard well enough to say whether this is undefined behaviour or not.

Moreover, this code uses the terrible "swap two chars with xor" trick. Why on earth would anyone do this? It generates larger, slower machine code and is harder to read, understand and maintain. If you want to swap two things, swap them.

It also uses the comma operator to put multiple statements in a single statement so as to avoid the horror of braces around the body of the for. What is the purpose of this oddity? The purpose of code isn't to show how many operators you know, it is first and foremost to communicate to the reader of the code.

The function also modifies its formal parameter, which makes it hard to debug.

+1 Hell of a code review. _appalling_ seems a compliment after I read this :) — Paulo Bu, Jan 21 '14 at 16:45
What C guarantees about comparison of pointers is that you can safely do so in a meaningful manner, so long as both are pointers to the same object, or pointers to members of the same aggregate object. As a special case, pointers to one past the last element of an array are considered part of the array for comparison purposes, but must never actually be dereferenced. A pointer to one before the beginning of an array is Undefined Behavior. — This isn't my real name, Jan 23 '14 at 03:59
For the exact details of what C has to say about adding, subtracting, and comparing pointers, see N1570, §6.5.6 Additive operators, paragraph 8 on pdf page 111 (labeled as page 93), §6.5.8 Relational operators, paragraphs 4 and 5 on pdf pages 113/114 (labeled as pages 95/96), and §6.5.9 Equality operators, paragraphs 5 and 6. — This isn't my real name, Jan 23 '14 at 04:06
@ElchononEdelson: That is what I suspected; thanks for the confirmation and references. — Eric Lippert, Jan 23 '14 at 06:01
*"Likely this will wrap around"* - Yes, but it's worth pointing out that it's undefined behavior. — klutt, Jan 25 '21 at 13:24
*"do we have a guarantee"* - Nope? UB https://stackoverflow.com/a/65649290/6699433 — klutt, Jan 25 '21 at 13:33

Wojtek Surowka · Answer 2 · 2015-05-05T09:07:58.107

5

The code

while(q && *q)

is a shorthand for

while(q != NULL && *q != '\0')

so you are testing if q (at the beginning equal to p) is not NULL. It means that the function called with NULL argument will not crash in this while loop. (But it will still crash in the second loop).

edited May 05 '15 at 09:07

answered Jan 21 '14 at 15:45

Wojtek Surowka

20,535
4
44
51

1

OP is asking whether he can use `while(*q)` or not. – haccks Jan 21 '14 at 15:47
2

No, the question was whether while (q && *q) and while(*q) are the same. – Wojtek Surowka Jan 21 '14 at 15:48
OK. Did this answer the question by saying `while(q && *q) is a shorthand for while(q != NULL && *q != '\0')`? – haccks Jan 21 '14 at 15:49

haccks · Answer 3 · 2014-01-21T17:48:43.967

Isn't it the same as while(*q) ++q;, since q is never going to be 0?

while(q && *q) is used to ensure that q is not NULL and *q is not NUL. Using while(*q) is also legal if q does not points to NULL.

void string_rev(char *p)
{
   char *q = p;

   if(q == NULL)
       return;

   while(*q) ++q;
   for(--q; p < q; ++p, --q)
       *p = *p ^ *q,
       *q = *p ^ *q,
       *p = *p ^ *q;

}

How can the author of the code be sure that q or *q are going to be 0?

while(q && *q), q could be a NULL pointer if p points to the NULL and loop terminates before entering the loop body. Otherwise through out the operation it can't be a NULL pointer. Now the loop termination depends on *q. Once q reaches the end of the string, *q becomes 0 and loop terminates.

C pointer arithmetics

3 Answers3

Linked