5

So I did the following test:

char* a = "test";
char* b = "test";
char* c = "test\0";

And now the questions:

1) Is it guaranteed that a==b? I know I'm comparing addresses. This is not meant to compare the strings, but to check whether identical string literals are stored in a single memory location.

2) Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?

3) Is an extra \0 appended at the end of c, even though it already contains one?

I didn't want to ask 3 different questions for this because they seem somehow related, sorry 'bout that.

Note: The tag is correct, I'm interested in C++. (although please specify if the behavior is different for C)

AMCoder
  • Should be `char const* a = ...`. – R. Martinho Fernandes May 24 '12 at 17:22
  • a and b have the same value, but that doesn't necessarily mean they are the _same_ string. – Hunter McMillen May 24 '12 at 17:24
  • @HunterMcMillen - actually that's exactly what it would mean. – Edward Strange May 24 '12 at 17:26
  • In C++ it does not matter whether the literals are the same (constant folding) or not, the code would not compile, as literals are of type `const char[]` and you cannot initialize a non-const `char*` from it. – David Rodríguez - dribeas May 24 '12 at 17:35
  • @CrazyEddie Same string to me means they occupy the same location in memory, and since b doesn't point to a; I don't see how that is possible – Hunter McMillen May 24 '12 at 17:40
  • @HunterMcMillen - if a and b have the same value then they point at the same location. Has nothing to do with whether b points at a or vice versa. If b did indeed point at a and they were the same value, that would be an odd condition indeed and you'd have to work to make it happen. – Edward Strange May 24 '12 at 19:42
  • @DavidRodriguez: yes, you can assign a `const char[]` string literal to a non-const `char*` pointer. The C++ standard specifically allows that exception in order to maintain backwards compatibility with C code. – Remy Lebeau May 25 '12 at 04:50

7 Answers

18

Is it guaranteed that a==b?

No. But it is allowed by §2.14.5/12:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.

And as you can see from that last sentence using char* instead of char const* is a recipe for trouble (and your compiler should be rejecting it; make sure you have warnings enabled and high conformance levels selected).
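A minimal sketch of the distinction, assuming the question's pointers are declared as `const char*`. The helper names `same_address` and `same_text` are hypothetical and not part of the question. Whether `same_address(a, b)` is true depends on the compiler and its settings, so no result is stated for it.

```cpp
#include <cassert>
#include <cstring>

// String literals are arrays of const char, so point at them with const char*.
const char* a = "test";
const char* b = "test";

// True only if both pointers hold the same address. For two identical
// literals the standard does not guarantee either answer.
bool same_address(const char* lhs, const char* rhs) {
    return lhs == rhs;
}

// True if the characters up to the first '\0' match. This does not depend
// on where the compiler stored the literals.
bool same_text(const char* lhs, const char* rhs) {
    return std::strcmp(lhs, rhs) == 0;
}
```

Code that needs a portable answer should rely on `same_text` and treat the result of `same_address(a, b)` as something that may change between compilers.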

Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?

No, they're not required to be referring to the same array of characters. One has five elements, the other six. An implementation could store the two in overlapping storage, but that's not required.

Is an extra \0 appended at the end of c, even though it already contains one?

Yes.
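The element counts can be checked at compile time with `sizeof`, which reports the size of the whole literal array including the terminator the compiler appends. This is only a sketch; the names `size_plain`, `size_extra` and `text_length` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// sizeof on a string literal yields the size of the complete array,
// including the implicit terminator.
constexpr std::size_t size_plain = sizeof("test");    // 't' 'e' 's' 't' '\0'
constexpr std::size_t size_extra = sizeof("test\0");  // 't' 'e' 's' 't' '\0' '\0'

static_assert(size_plain == 5, "four characters plus the implicit terminator");
static_assert(size_extra == 6, "explicit NUL plus the implicit terminator");

// strlen stops at the first '\0', so it cannot see the extra element.
std::size_t text_length(const char* s) {
    return std::strlen(s);
}
```

Both literals report a `text_length` of 4, which is why `strcmp` treats them as equal even though the arrays differ in size.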

R. Martinho Fernandes
  • It might also be worth explaining that `"test"` is a literal `char const[5]` in the context of why A & B *may* overlap but C will not. – AJG85 May 24 '12 at 17:37
  • Thanks. One more question. If I use `strcmp` to compare `a` and `c`, will it say they are equal? Also, is it allowed for `a==c`? – AMCoder May 24 '12 at 17:38
  • @AJG85 why won't `c` overlap? The standard explicitly allows it (the quote is in the answer and it doesn't limit to identical literals), and a program can't tell the difference without invoking undefined behaviour. – R. Martinho Fernandes May 24 '12 at 17:40
  • @AMCoder Yes, `strcmp` will always consider all three equal because it always treats the first null character as a terminator. And yes, I think `a==c` is allowed. – R. Martinho Fernandes May 24 '12 at 17:41
  • I suppose that's possible. `a==c` is false in my implementation but that's not a guarantee either. – AJG85 May 24 '12 at 17:59
6

1 - absolutely not. a might == b though if the compiler chooses to share the same static string.

2 - because they are NOT referring to the same string

3 - yes.

The behavior is no different between C and C++ here except that C++ compilers should reject the assignment to non-const char*.

Edward Strange
4

1) Is it guaranteed that a==b?

It is not. Note that you are comparing addresses and they could be pointing to different locations. Most smart compilers would fold this duplicate literal constant, so the pointers may compare equal, but again it's not guaranteed by the standard.

2) Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?

You are trying to compare pointers, and they point to different memory locations. Even if you were comparing the arrays the pointers refer to, they would still be unequal (see next question).

3) Is an extra \0 appended at the end of c, even though it already contains one?

Yes, there is.

K-ballo
3

First note that this should be const char* as that's what string literals decay to.

  1. Both create arrays initialized with 't' 'e' 's' 't' followed by a '\0' (length = 5). Comparing for equality will only tell you if they both hold the same pointer, not if they have the same contents (though equal pointers do imply equal contents).
  2. a isn't equal to c because the same rules apply: a = 't' 'e' 's' 't' '\0' and c = 't' 'e' 's' 't' '\0' '\0'.
  3. Yes, the compiler always does it, and you shouldn't explicitly do it if you're making a string like this. If, however, you created an array and manually populated it, you would need to ensure you add the \0.

Note that for my #3, const char[] = "Hello World" would also automatically get the \0 at the end. I was referring to manually filling the array, not having the compiler work it out.
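The two cases from #3 might look like the sketch below. The names `from_literal` and `fill_manually` are hypothetical, and the fixed buffer size of 5 is an assumption chosen to match "test".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Initialized from a literal: the compiler sizes the array and adds '\0'.
const char from_literal[] = "test";
static_assert(sizeof(from_literal) == 5, "four characters plus terminator");

// Filled by hand: nothing is appended, so the terminator must be written
// explicitly and the buffer must have room for it.
std::size_t fill_manually(char (&buffer)[5]) {
    buffer[0] = 't';
    buffer[1] = 'e';
    buffer[2] = 's';
    buffer[3] = 't';
    buffer[4] = '\0';  // without this, calling strlen on buffer is undefined behaviour
    return std::strlen(buffer);
}
```

After `fill_manually` runs, the buffer holds the same five elements as `from_literal`.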

John Humphreys
  • I don't think `a` and `c` are guaranteed unequal. The standard allows the implementation to store literals in overlapping storage, and programs can't tell the difference without invoking UB. – R. Martinho Fernandes May 24 '12 at 17:38
  • I actually agree with you having checked that, it was an interesting point I hadn't heard. My point would still be that the compiler would definitely place two null terminators in case 2 though which makes the operations unequal, even if reading the strings & checking their pointers would be equal. Also, if you recorded the length of the string you created for c, you could safely read past the strlen() by 1 and you couldn't for a (unless they're in the same location as you said, but you wouldn't be able to guarantee that). – John Humphreys May 24 '12 at 17:41
  • *Even if the compiler stored them in the same location*, `a[5]` (which tries to read the value at the same location as `c[5]`) would invoke undefined behaviour, because `a` points to an array of size 5. It doesn't matter that there happens to be another array lying around in the same place. – R. Martinho Fernandes May 24 '12 at 17:46
2

The problem here is you're mixing the concepts of pointer and textual equivalence.

When you say a == b or a == c you are asking if the pointers involved point to the same physical address. The test has nothing to do with the text the pointers refer to.

To get textual equivalence you should use strcmp.
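As a rough illustration, textual equivalence can be contrasted with a comparison of the complete arrays. The helper names `equal_as_text` and `equal_as_arrays` are assumptions introduced here, not standard functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Textual equivalence: compare characters up to the first '\0'.
bool equal_as_text(const char* lhs, const char* rhs) {
    return std::strcmp(lhs, rhs) == 0;
}

// Array equivalence: compare the complete arrays, terminators included.
// Taking the arguments by reference keeps the array sizes N and M visible.
template <std::size_t N, std::size_t M>
bool equal_as_arrays(const char (&lhs)[N], const char (&rhs)[M]) {
    return N == M && std::memcmp(lhs, rhs, N) == 0;
}
```

With the literals from the question, `equal_as_text("test", "test\0")` holds while `equal_as_arrays("test", "test\0")` does not, because the second array has six elements.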

JaredPar
  • Note that `strcmp` will stop at the *first* zero, so the strings `a` and `c` will compare equal even though one is longer than the other. – Mark Ransom May 24 '12 at 17:30
  • @MarkRansom it's equivalent length in the textual sense though which is what strcmp will consider. Perhaps I should expand my answer into the 3 realms which may be important here: pointer equivalence, textual equivalence and memory equivalence – JaredPar May 24 '12 at 17:32
  • -1 sorry but I have a pointer comparison there because I want to compare pointers. That's what the question is about. – AMCoder May 24 '12 at 17:36
0

If you are doing pointer comparisons, then a != b, b != c, and c != a, unless the compiler is smart enough to notice that your first two strings are the same.

If you do a strcmp(str, str) then all your strings will come back as matches.

I am not sure if the compiler will add an additional null termination to c, but I would guess that it would.

jlunavtgrad
0

As has been said a few times in other answers, you are comparing pointers. However, I would add that strcmp(b,c) should report the strings as equal (it returns 0), because it stops checking at the first \0.

Andrew Buss