2

OK, so: we all know that generally the use of const_cast<>() anywhere is so bad it’s practically a programming war crime. So this is a hypothetical question about how bad it might be, exactly, in a specific case.

To wit: I ran across some code that did something like this:

std::string temporary = "/tmp/directory-XXXXX";
const char* dtemp = ::mkdtemp(const_cast<char*>(temporary.c_str()));
/// `temporary` is unused hereafter

… now, I have run across numerous descriptions about how to get writeable access to the underlying buffer of a std::string instance (q.v. https://stackoverflow.com/a/15863513/298171 for example) – all of them have the caveat that yes, these methods aren’t guaranteed to work by any C++ standard, but in practice they all do.

With this in mind, I am just curious on how using const_cast<char*>(string.c_str()) compares to other known methods (e.g. the aforementioned &string[0], &c)… I ask because the code in which I found this method in use seems to work fine in practice, and I thought I’d see what the experts thought before I attempt the inevitable const_cast<>()-free rewrite.

Community
  • 1
  • 1
fish2000
  • 4,289
  • 2
  • 37
  • 76
  • Use `std::vector` instead of `std::string`. You have the writable buffer without all of the casting. – PaulMcKenzie Dec 07 '15 at 05:29
  • Not what I am asking – I can think of a bunch of ways to do it that don’t have the problem. I am curious about the nature of what makes the problem problematic. – fish2000 Dec 07 '15 at 05:30
  • 1
    Btw: A correct implementation would be this: `char temporary[] = "/tmp/directory-XXXXX"; char* dtemp = mkdtemp(temporary);` Since this creates an array on the stack, which is initialized with a copy of the string literal, you have all right to modify that copy, including passing it to `mkdtemp()`. True, no shiny C++ stuff used here, but it's shorter, simpler, and more correct than the version you ran across :-) – cmaster - reinstate monica Dec 07 '15 at 21:25
  • @cmaster absolutely yes – while that would be saner _and_ less circuitous, the snippet I posted is a serious oversimplification of the actual production code, which naturally is messy enough to make such a straightforward edit unstraightforwardly doable. But you are certainly correct about doing a static assignment to a writeable stack-based temporary as ideal fodder for `mkdtemp()` usage, indeed. – fish2000 Dec 08 '15 at 17:18

1 Answers1

3
  • const cannot be enforced at hardware level because in practice, in non-hypothetical environment, you can set read-only attribute only to a full 4K memory page and there are huge pages on the way, which drastically reduce CPU's lookup misses in the TLB.

  • const doesn't affect code generation like __restrict from C99 does. In fact, const, roughly speaking, means "poison all write attempts to this data, I'd like to protect my invariants here"

Since std::string is a mutable string, its underlying buffer cannot be allocated in read-only memory. So const_cast<> shouldn't cause program crash here unless you're going to change some bytes outside of underlying buffer's bounds or trying to delete, free() or realloc() something. However, altering of chars in the buffer may be classified as invariant violation. Because you don't use std::string instance after that and simply throw it away this shouldn't provoke program crash unless some particular std::string implementation decide to check its invariants' integrity before destruction and force a crash if some of these are broken. Because such check couldn't be done in less than O(N) time and std::string is a performance critical class, it is unlikely to be done by anyone.

Another issue may come from Copy-on-Write strategy. So, by modifying the buffer directly you may break some other std::string's instance which shares the buffer with your string. But few years ago majority of C++ experts came to conclusion that COW is too fragile and too slow especially in multi-threaded environments, so modern C++ libraries shouldn't use it and instead adhere to using move construction where possible and avoiding heap traffic for small length strings where applicable.

Minor Threat
  • 2,025
  • 1
  • 18
  • 32
  • Huh wow yeah – that does explain why it works in this case (passing it to `mkdtemp()` which replaces characters but doesn’t reallocate anything, as far as I know). – fish2000 Dec 07 '15 at 05:59
  • …or changing a character to `'\0'`. – Emil Laine Dec 07 '15 at 06:02
  • 3
    Well, I think this is a bit misleading. It's not permitted to cast away the `const` *and then write to the buffer through that pointer*. You can only read. To get write access to the string contents, use `&s[0]`. – M.M Dec 07 '15 at 06:06
  • @M.M Right, that was my assumption – but like I said, it seems to work OK in that snippet, so is that a compiler fluke (I am using the latest `clang++` FYI) or something else I am not aware of that is allowing both compilation and execution to go ahead here? – fish2000 Dec 07 '15 at 06:09
  • 3
    @fish2000 it causes undefined behaviour, which may include appearing to work as intended. Or to put it another way, compilers may or may not support it (and you don't get any warning if they don't) – M.M Dec 07 '15 at 06:19
  • @M.M Ahaaaa yes the dreaded UB – thank you for clarifying that. – fish2000 Dec 07 '15 at 06:21
  • 1
    a) `const` cannot be enforced at hardware level b) `const` doesn't affect code generation like `__restrict` from C99. In fact, `const`, roughly speaking, means "poison all write attempts to this stuff, this hopefully protects my invariants". – Minor Threat Dec 07 '15 at 06:24
  • @MinorThreat … so does that mean that `const` is, at root, a compiler directive and nothing else? – fish2000 Dec 07 '15 at 06:49
  • @MinorThreat Nice! I truly appreciate COW-related details. Would upvote again if I could, thanks Dr. Threat. – fish2000 Dec 07 '15 at 14:51
  • N.B. What @MinorThreat is talking about w/r/t shared buffers will be more relevant once we get `std::experimental::string_view` finalized – q.v. http://en.cppreference.com/w/cpp/experimental/basic_string_view – fish2000 Dec 07 '15 at 14:54
  • 1
    About checking invariants in the destructor: `std::string::size()` is defined to be of `O(1)` complexity, so all implementations need to keep either the size or a pointer to the end of the string around. That information is enough to do a sanity check of the null-byte at the end of the string in constant time. So an implementation *could* force a crash if the terminating byte is overwritten. – cmaster - reinstate monica Dec 07 '15 at 21:16
  • @cmaster I do believe @MinorThreat was talking hypothetically about a putative `std::string` implementation crashing on destruction, should its invariant internals appear altered in some detectable fashion – however our respective actual `std::string` internals work, they benefit no one if they can’t be destructed in a way that is a) unflinchingly parsimoniously performant, and b) guaranteed `noexcept` … two qualities essential for `std::string` to be ubiquitously, generally, usefully safe across the board. Checking the trailing `NUL` doesn’t check integrity and makes the destructor dangerous. – fish2000 Dec 08 '15 at 17:35
  • 1
    @fish2000 I see it this way: The trailing `NUL` byte is put there implicitly by the implementation, and cannot be removed by legal user code. Hence, if the `NUL` byte is overwritten, the program has *already* entered UB. Aborting the process is a perfectly permissible course of action in such a case, and the check is cheap. True, that's pretty much of a language-lawyer point of view, but I've come to the conclusion that it's better to listen to the language-lawyers when it comes to UB. There are too many cases where code that used to work well broke miserably because a compiler proved UB. – cmaster - reinstate monica Dec 08 '15 at 22:11
  • 1
    In general you should strive to write code that "works by design", not "works by accident". Casting away `const` here or going into undefined behavior land puts you squarely in the "works by accident" mode. Later, the compiler is updated and changes the response to undefined behavior (they can do that) and you recompile your code and suddenly it is crashing somewhere weird on occasion. Just avoid undefined behavior and make your code "work by design" instead of by accident. – legalize Dec 10 '15 at 21:25
  • 1
    @legalize: Of course we're supposed to do everything possible to avoid UB. But @fish2000 asked what exactly could happen. `const` doesn't say that the data can't be altered at all. It say that you're not supplied with a proper interface to alter these data in this particular scope. But there are different scopes and compiler can't track them all, in general. – Minor Threat Dec 10 '15 at 21:56
  • 1
    @MinorThreat Because we're entering the realm of undefined behavior, the answer of what *could* happen is literally anything the compiler feels like doing, including [invoking `hack`, or the towers of hanoi game](http://feross.org/gcc-ownage/). – legalize Dec 10 '15 at 22:23
  • 1
    @legalize: I saw some cases when Microsoft did rely on UB. ATL/WTL thunk code on the stack, type conversions via `union`s, `void(*)(LPRECT)` -> void(*)(const CRect&), etc etc. – Minor Threat Dec 10 '15 at 22:43