
Disclaimer: This question asks how "str literal" + "str literal" works

For how 'a' + 'b' or '9' - '0' = 9 ('character' + 'character') works :


Question:

To everyone who's more familiar with C, thanks for reading

(compiled with clang, standard=C11)

Example:

(trying to print __FILE__ without its ".c" extension)

printf("%s\n", __FILE__); prints filename.c

printf("%.*s\n", (int)(".c" - __FILE__), __FILE__); prints filename

1. How does C typecast string/string literals to int? Are whitespaces ignored?

  • What does the value of an (int)"string" represent?

Another example:

(int)("word" - "rd") = 6273
(int)("rd" - "word") = -6273
(int)("word" - "  rd") = -5
(int)("  rd" - "word") = -5

2. Why does (int)(".c" - __FILE__) even work?

3. Is the printf function above actually working?

4. Is there a string equivalent to 'a' + 1 = 'b' ?

Thanks in advance guys!



irrelevant guessing:

1. Why does (int)(".c" - __FILE__) even work?

guessing its

  some value of (first?) pointer to ".c" 
- some value of (first?) pointer to __FILE__ string literal 

2. What does the value of (int)"string" actually represent?

  • Why does (int)(".c" - __FILE__) even work?

idk but here's another example:

printf("%i", (int)".c");
printf("%i", (int)__FILE__);
printf("%i", (int)(".c" - __FILE__));
printf("%i", (int)(__FILE__ - ".c"));
printf("%.*i", (int)(".c" - __FILE__), (int)__FILE__);
printf("%.*i", (int)(".c" - __FILE__), (int)("c" - __FILE__));

output
---------------------
(int) ".c"= 4357629
(int) __FILE__= 4357620
(int) (".c" - __FILE__) : 9
(int) (__FILE__ - ".c"): -9
(int with precision specified) __FILE__ : 004357620
(int with precision specified) (".c" - __FILE__): 000000009 
$

3. Is printf actually working?

Assuming it does, probably:

printf("%.*s",(int)(".c" - __FILE__), __FILE__)
    width = (int)(".c" - __FILE__)   
    specifier/str = __FILE__

printf prints out __FILE__ as a string of width (".c" - __FILE__) (two characters less)

cdpp
  • A string is an array of characters. In most contexts, when an array is used as an lvalue it's converted to a pointer to the first element. Converting a pointer to `int` is implementation-defined, but typically it just returns the pointer's address as an integer. – Barmar Nov 18 '17 at 08:26
  • Oh, so `(int)"string"` is saying the same thing as `(base 10) &(str[0])`? – cdpp Nov 18 '17 at 08:39
  • Undefined behaviour everywhere. The pointer arithmetic is not even defined for objects that are not in the same array. In fact, the runtime can abort due to range checking violation – Antti Haapala -- Слава Україні Nov 18 '17 at 09:14
  • Also, in the question, you say the question is for "how `"str" + "str"` works" - no it isn't, `"str" + "str"` doesn't work at all (just try). – Antti Haapala -- Слава Україні Nov 18 '17 at 09:17

2 Answers

1

A string (with double quotes), e.g. "abc", is converted to a pointer when used as an expression. If you add an integer to a pointer, you get a new pointer. If you subtract two compatible pointers, you get an integer.

A character (with single quotes), e.g. 'x', is just an integer. You can add them, subtract them, etc. just like any other integer.

__FILE__ expands to a string, so ".c" - __FILE__ is the integer that results from subtracting the two pointers.

If you cast a pointer to an integer, you get an integer.

Keep in mind that some of the expressions involving pointers may not be well-defined, but the data types are.

Tom Karzes
  • Thanks for the quick reply! Really love how simple your answer is – cdpp Nov 18 '17 at 08:46
  • Just to clarify, `new pointer = integer + pointer` is basically `type* b = type* a[0 + i]` ? – cdpp Nov 18 '17 at 09:01
  • Succinct but does not explain why `printf("%.*s\n", (int)(".c" - __FILE__), __FILE__);` prints `"filename"` though. – Clifford Nov 18 '17 at 09:10
  • "*Strings ... are just pointers*" No, no, no. They are arrays. Arrays are not pointers! – alk Nov 18 '17 at 09:39
  • @alk You misquoted me. The exact quote was "strings ... are just pointers *when used in an expression*" And, yes, array names, too, are just pointers *when used in an expression*. – Tom Karzes Nov 18 '17 at 10:04
  • @Clifford I didn't specifically address that case, but the pointer subtraction is just an integer (the cast isn't even necessary). Presumably its value was large enough that all of `__FILE__` was printed (or, if negative, it's treated as if no precision was given). – Tom Karzes Nov 18 '17 at 10:10
  • `sizeof array` is an expression, isn't it? – alk Nov 18 '17 at 10:11
  • `char a[42] = {0}; char b[42] = a;` the latter is an expression as well I assume. – alk Nov 18 '17 at 10:12
  • @alk In that context, `array` is *not* an expression. It's a variable. It would be more precise to say "when used *as* an expression". I'll fix the wording. – Tom Karzes Nov 18 '17 at 10:13
  • @alk Your last example is invalid. It produces an "invalid initializer" error. – Tom Karzes Nov 18 '17 at 10:16
  • @TomKarzes : The precision was given by the first argument replacing the * precision placeholder. My point is that the address of ".c" is _inside_ of "filename.c" not an independent string. That is what you did not explain. It does _not_ print the whole filename; read carefully; that is the point of the question I think. – Clifford Nov 18 '17 at 11:38
  • @Clifford Oh, you're right, I didn't read that part of the question - I was basically just explaining the basics of pointer arithmetic with strings. It's surprising that the `".c"` was reusing part of the filename string. I just tested it with `gcc`, and it did the same thing, but only if I specified `-O`. – Tom Karzes Nov 18 '17 at 12:02
  • @TomKarzes: This is by intention. It clearly shows an array is not a pointer as shown by this example it cannot be initialised by a pointer. I, BTW, put this example before I read your other comment: https://stackoverflow.com/questions/47364005/how-does-str-str-in-c-work-how-are-they-stored/47364147?noredirect=1#comment81682503_47364147 – alk Nov 18 '17 at 17:32
1

There are three things happening in your examples.

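Firstly, C's pointer arithmetic rules allow two pointers to be subtracted, yielding the distance between them measured in elements (for char pointers, that is bytes). Strictly, the result is defined only when both pointers point into the same array object (or one past its end). So for example: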

char test[2] ;
char* t1 = &test[0] ;
char* t2 = &test[1] ;
ptrdiff_t d = t2 - t1  ; // d == 1

Here ptrdiff_t is the signed integer type that pointer subtraction produces. Casting the result to int can in principle truncate it, since a 32-bit int covers only about ±2 GiB, but for addresses as close together as these the error is unlikely.

The second thing happening is that a string literal such as "word" when used in an expression is a pointer to the string content.

And the third thing happening is that your linker has performed duplicate string elimination. It has exhaustively searched your code for string literals that are identical and replaced them with a single pointer. This part of your observation is implementation dependent and may not hold for all toolchains, or even the same toolchain with different compiler/linker settings.

The built-in macro __FILE__ expands to a string literal containing the name of the source file in which it appears. In the example:

(int)(".c" - __FILE__)

__FILE__ == "filename.c" and the linker finds the duplicate ".c" within that (it must be at the end because the nul terminator must match). So the difference between the two pointer values is 8 (the length of "filename"). So the statement:

printf("%.*s\n", (int)(".c" - __FILE__), __FILE__);

prints the first 8 characters of the string "filename.c" which is "filename".

Something more complicated is happening with:

(int)("word" - "rd") = 6273
(int)("rd" - "word") = -6273
(int)("word" - "  rd") = -5
(int)("  rd" - "word") = -5 

In the first and second cases you might, going by the first __FILE__ example, expect -2 and 2 respectively. That could well have been the result, except that in this case the linker may have matched "rd" with the end of the "  rd" string rather than with the end of "word". How literals are merged is up to the implementation and should not be assumed to be predictable. The results are likely to change if, for example, you removed the third and fourth expressions so that the "  rd" literal no longer existed. Strings from entirely different link modules may also be referenced.

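The point is that you cannot rely on this behaviour: whether and how string literals are merged is not specified by the language and depends on the implementation. Pointer subtraction and the conversion of a string literal to a pointer are well defined in themselves, but the subtraction only has a defined result when both pointers refer to the same array, which is exactly what the merging may or may not provide. It is interesting as an examination of linker behaviour, but is not useful as a programming technique.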

Clifford
  • "*a string literal ... is a pointer to the string content*" I know you probably know, but this simplifies it too much. So no, Strings are arrays. Array are not pointers. – alk Nov 18 '17 at 09:45
  • @Clifford 1. `...two ptrs may be subtracted to yield the difference in address btwn the two ptrs.` Thaaank you. 2. `..._duplicate string elimination_...replaced them with a single ptr... and is implementation dependent`+ `(it must be at the end because the nul terminator must match)` Thanks for including these bits. Wonder if this _dup string elimination_ is an example of an optimization feature? 3. `It is not useful as a programming technique.` Agreed. Like you said, impractical but still pretty interesting. Thanks for demystifying! – cdpp Nov 18 '17 at 10:04
  • @alk : The value used in the expression evaluation is a pointer. Arrays are not first-class types in C and cannot be used in expressions - they degrade to pointers for evaluation. I think I was clear in the answer without getting into unnecessary detail. – Clifford Nov 18 '17 at 11:27
  • @cdpp : Yes string elimination is a _linker optimisation_. In most toolchains it can be controlled by linker command switches. – Clifford Nov 18 '17 at 11:29