1

This might be expected but I'm just curious as to how/why this happens.

When I try to use a locally declared `char *` (for example `char *foo = "\xFF\xFF...";`) as an integer, the program segfaults. But if I allocate the buffer with malloc instead, accessing it works perfectly well. Why does this happen?

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  unsigned char *buf = malloc(16);
  memcpy(buf, "\x00\x00\x00\x00\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF", 16);
  //unsigned char *buf = "\x00\x00\x00\x00\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"; // seg faults if you use this instead

  uint64_t *k  = (uint64_t *) buf;
  uint64_t *k2 = (uint64_t *) (buf + 8);
  uint64_t i  = 1000000000;

  printf("-k =%" PRIu64 "\n", *k);
  printf("-k2=%" PRIu64 "\n", *k2);

  printf("Iter * %" PRIu64  "\n", i);
  for (uint64_t c = 0; c < i; ++c)
    {
      *k  += 1;
      *k2 -= 1;
    }

  printf("-k =%" PRIu64 "\n", *k);
  printf("-k2=%" PRIu64 "\n", *k2);

  return 0;

}

Output:

easytiger $ gcc -std=c99 tar.c -Wall -O2 ; time ./a.out
-k =0
-k2=18446744073709551615
Iter * 1000000000
-k =1000000000
-k2=18446744072709551615
dbush
easytiger
  • 4
    What on earth are you trying to do....?!?! And why is your C question tagged [tag:c++]? – Lightness Races in Orbit Jul 14 '15 at 20:50
  • Even if the string literal were writable (which it is not), the code may fail on certain architectures because of alignment. – jxh Jul 14 '15 at 20:51
  • A mistake. And I was just seeing what would happen. It isn't for any code I'm writing. – easytiger Jul 14 '15 at 20:56
  • 2
    Just use `unsigned char buf[] = "\x00\x00\x00\x00\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF";` ... no need to `malloc()` these few bytes if they're only needed locally. –  Jul 14 '15 at 20:59

3 Answers

5

String literals are immutable. You may not modify the data stored there. Ever.

Even in C, current practice is to make this clear and diagnosable by adding const to the pointer type (`const char *`), so that the compiler flags any attempted write.

C++ actually requires it.

Lightness Races in Orbit
  • +1 for this, which when combined with your comment to dbush below, is the most thorough and nuanced answer here, rather than oversimplifying. – underscore_d Jul 14 '15 at 20:54
  • I think this answer gives a more accurate overview of the mechanism behind this: http://stackoverflow.com/questions/2589949/c-string-literals-where-do-they-go (namely .rodata) – easytiger Jul 16 '15 at 12:31
  • @easytiger: I was very deliberate in _not_ talking about implementation details / mechanism of some specific compiler. This question is about C, a high-level programming language that abstracts away such details by definition. When you start obsessing over implementation details like that answer does, you make assumptions and sweeping statements that cannot be guaranteed to hold, particularly in the face of so-called "optimisations" (where "optimisations" just means "the compiler doing its job in an optimal manner that the author failed to consider"). :) – Lightness Races in Orbit Jul 16 '15 at 12:35
3

There is no guarantee that string literals will be stored in a writable memory page. This means that the `*k += 1` and `*k2 -= 1` operations in the for loop will likely try to write to read-only memory. Memory successfully allocated by malloc, on the other hand, is always writable.

tsandy
3

For a definition of this form:

unsigned char *buf = "some string";

buf points to a static string which is stored in a read-only portion of memory. When you try to write to it, you get a segfault.

When you use malloc instead, the memory that buf points to is writable.

dbush
  • 4
    It may not be stored in read-only memory. And whether it is or not doesn't even really matter: the language prohibits you from mutating a string literal's constituent characters, if you want your program to have well-defined semantics. Rationalising about it beyond that is pointless. – Lightness Races in Orbit Jul 14 '15 at 20:52
  • Thanks, it's that simple then. Lightness Races in Orbit makes a good point too. I guess the malloc version is not treated at all like a string literal. – easytiger Jul 14 '15 at 20:53
  • 1
    Because it _isn't_ a literal. (The fact that it is `memcpy`d from one is not relevant.) – underscore_d Jul 14 '15 at 20:55
  • Yepper. You can even start talking about the odds of the literal being optimised away and whatever and blah blah blah but there's no need. :) – Lightness Races in Orbit Jul 14 '15 at 20:57
  • @LightnessRacesinOrbit The language doesn't prohibit it but declares it "undefined behavior". That's why the code compiles -- the reason it segfaults at runtime is technically INDEED that it is placed in read-only memory. This of course is ok because a program could do ANYTHING on UB. –  Jul 14 '15 at 20:57
  • @FelixPalmen: I was careful with my words, and you should be careful with reading them. I said the language prohibits doing this "if you want your program to have well-defined semantics". I'm just bored of saying "it's UB" over and over again ;) – Lightness Races in Orbit Jul 14 '15 at 20:58
  • 1
    Well, that's nitpicking of course, but talking about semantics, I'd expect prohibitions to be enforced ;) –  Jul 14 '15 at 21:02
  • Undefined behaviour != a prohibition. That's why it has its own term. Note LRiO used "prohibit" informally, with the qualifier _if_ you want well-defined results. You can still do it, but don't expect predictable behaviour across compilers. In many cases, compilers don't strictly protect against UB because accommodating a tonne of edge cases would be an inefficient use of dev time, user compile time, and possibly runtime. – underscore_d Jul 14 '15 at 21:35