§21.4.5 [string.access]

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

The second part implies, to me at least, that this "object of type charT" may reside outside of the sequence stored in the std::string object. Example implementation of a conforming operator[]:

reference operator[](size_type pos){
  // non-const so it can bind to 'reference'; named null_char because 'default' is a keyword
  static charT null_char = charT();
  if(pos == size())
    return null_char;
  return buf[pos];
}

Now, c_str() and data() are specified in terms of operator[]:

§21.4.7 [string.accessors]

const charT* c_str() const noexcept;
const charT* data() const noexcept;

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

This would make the above operator[] implementation non-conformant, as p + size() != &operator[](size()). However, with a bit of trickery, you can circumvent this problem:

reference operator[](size_type pos){
  // non-const so it can bind to 'reference'; named null_char because 'default' is a keyword
  static charT null_char = charT();
  if(pos == size() && !evil_context) // assume 'volatile bool evil_context;'
    return null_char;
  return buf[pos];
}

struct evil_context_guard{
  volatile bool& ctx;
  evil_context_guard(volatile bool& b)
    : ctx(b) {}
  ~evil_context_guard(){ ctx = false; } // reset through the stored reference
};

const charT* c_str() const noexcept{
  evil_context_guard g(evil_context = true);
  // now, during the call to 'c_str()', the requirement above holds
  // 'p + i == &operator[](i) for each i in [0,size()]'
  const charT* p = &buf[0];
  assert(p+size() == &operator[](size()));
  return p;
}

Now, the obvious question is...

Is the above code really conformant or did I overlook something?

Xeo
  • One thing I notice is that if you were to actually write the check out for a string object `str`: `const char* p = str.c_str(); size_t i = str.size(); assert(p + i == &str[i]);` the assertion will fail with your code. The standard doesn't seem to specify a specific context where the invariant must hold, so I'd be careful about assuming it only needs to hold before `c_str()` returns. – pmdj Aug 04 '12 at 15:49
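The check described in this comment can be tried against a toy type. The following is only a minimal sketch: `toy_string` and `end_matches` are hypothetical names invented for illustration, the class is fixed to `char` with a small inline buffer, and it mimics the question's first `operator[]` (a static sentinel at `pos == size()`) rather than any real library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy type, not a real library class: operator[] hands out a
// static sentinel for pos == size(), as in the question's first version.
class toy_string {
    char buf_[16];
    std::size_t size_;
public:
    explicit toy_string(const char* s) : size_(std::strlen(s)) {
        assert(size_ < sizeof buf_);
        std::memcpy(buf_, s, size_ + 1);  // copy including the terminator
    }
    std::size_t size() const { return size_; }
    const char& operator[](std::size_t pos) const {
        static const char sentinel = char();
        if (pos == size_) return sentinel;  // object outside the buffer
        return buf_[pos];
    }
    const char* c_str() const { return buf_; }
};

// True if the caller-visible requirement holds at i == size().
bool end_matches(const toy_string& s) {
    const char* p = s.c_str();
    return p + s.size() == &s[s.size()];
}
```

For a `toy_string s("abc")`, `s[s.size()]` still yields a null character, but `end_matches(s)` is false, because the sentinel lives outside the buffer that `c_str()` points into.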

2 Answers

Ignoring the given code, considering only the question, I think that

  • unfortunately, the answer seems to be “yes”, and
  • that is certainly not the intent of the standard.

Hence, it appears to be a defect.

Checking the list of known library defects, this issue apparently has not yet been reported.

So, as I stated in chat, I recommend posting it to [comp.std.c++] to settle whether it really is a defect and, if so, to get it onto the defects list and fixed.

Cheers and hth. - Alf
  • I have a vague suspicion that this is intentional. This wording allows for empty strings to be represented using a single `static char` somewhere. Without this wording, the null byte would have to be unique to every string, which would require either (1) empty strings being represented using an in-object null byte, or (2) a dynamically allocated buffer. I'm not sure why this would be preferable over (1), but it seems an odd coincidence that this wording makes it possible, so I suspect it's intentional. – jalf Aug 04 '12 at 16:37
  • @jalf: SSO is all about having small strings without dynamic memory allocation. It would be perfect for this null byte, so I don't see how it would be an issue. Would you mind explaining? – Matthieu M. Aug 04 '12 at 17:27
  • @ildjarn: No, obviously not, but it does demonstrate that there is no reason for null string not to be able to allocate an in-class null character. – Matthieu M. Aug 05 '12 at 09:36
  • @Matthieu : It also doesn't disprove that the standard may be allowing for implementations that don't provide SSO and would benefit from a static sentinel value, so I'm not sure what your point is. – ildjarn Aug 05 '12 at 10:19
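The shared-sentinel layout discussed in these comments could look roughly like the sketch below. Everything here is an assumption made for illustration: `shared_null_string` is a hypothetical toy class fixed to `char`, without SSO, and no real standard library is claimed to work this way. An empty string owns no buffer and points at one static null character, which is only safe because callers may not modify that object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy type: an empty string owns no buffer and points at one
// shared static null character instead.
class shared_null_string {
    static const char shared_null_;   // one terminator for all empty strings
    char* heap_;                      // nullptr when the string is empty
    std::size_t size_;
public:
    shared_null_string() : heap_(nullptr), size_(0) {}
    explicit shared_null_string(const char* s)
        : heap_(nullptr), size_(std::strlen(s)) {
        if (size_ != 0) {
            heap_ = new char[size_ + 1];
            std::memcpy(heap_, s, size_ + 1);  // copy including the terminator
        }
    }
    shared_null_string(const shared_null_string&) = delete;
    shared_null_string& operator=(const shared_null_string&) = delete;
    ~shared_null_string() { delete[] heap_; }

    std::size_t size() const { return size_; }
    const char* c_str() const { return heap_ ? heap_ : &shared_null_; }
    // Defined through c_str(), so both accessors always agree on addresses.
    const char& operator[](std::size_t pos) const { return c_str()[pos]; }
};

const char shared_null_string::shared_null_ = '\0';
```

In this sketch two default-constructed strings return the same `c_str()` pointer, and `c_str() + size() == &operator[](size())` holds for empty and non-empty strings alike, since `operator[]` is defined through `c_str()`.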

I don't see how it could be conformant. User code can never observe the promised returned value. The assert in the code is misleading because it is in the wrong place: the function has not returned yet. Returns: requirements apply to the value returned from the function, not to some value within its implementation (it should be obvious why that is a nonsensical idea).

The assertion should be here:

auto p = s.c_str();
assert(p + s.size() == &s[s.size()]);

I believe the wording that treats s[s.size()] specially is simply meant to forbid you from blowing up the null terminator.
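As a rough illustration of checking the requirement from the caller's side, here is a small sketch against `std::string`, assuming a C++11 or later standard library. The helper name `accessors_agree` is made up for this example; it walks every index in [0, size()] and compares the address from `c_str()` with the address from `operator[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Checks, from user code, that c_str() and operator[] agree on every
// index in [0, size()], including the terminator position.
bool accessors_agree(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }
    return s[s.size()] == '\0';  // read only; never write through s[s.size()]
}
```

An implementation that only satisfied the equality while `c_str()` was still executing would fail this check, which is the point made above about where the assertion belongs.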

R. Martinho Fernandes
  • To stop me from blowing up the null-terminator, they could've just said "don't modify the value referenced by `s[s.size()]`" and wouldn't need to allow this specific value to exist outside of the sequence. – Xeo Aug 05 '12 at 20:55