I'm a bit puzzled by how modifying a std::string beyond its size is handled. In an example I tried, I was able to modify the string beyond its size using operator[] (and I'm aware that the standard doesn't stop you from doing this). However, when I print the string using cout it prints the original string, but when I print what's returned by c_str(), it prints the modified version. How does it keep track of both sizes (3 and 5)?

#include <string>
#include <iostream>

using namespace std;

int main(void) {
    std::string a = "abc";
    cout << "str before     : " << a << endl;
    const char * charPtr = a.c_str ();
    cout << "c_str before   : " << charPtr << endl;
    cout << "str size / capacity : " << a.size () << ", " << a.capacity () << endl;
    a[3] = 'd';
    a[4] = 'e';
    cout << "str after      : " << a << endl;
    const char * charPtr2 = a.c_str ();
    cout << "c_str after    : " << charPtr2 << endl;
    cout << "str size / capacity : " << a.size () << ", " << a.capacity () << endl;
    return 0;
}

output :
str before : abc
c_str before : abc
str size / capacity : 3, 3
str after : abc
c_str after : abcde
str size / capacity : 3, 3

Jitu

1 Answer

Although you already got a correct comment saying the behaviour is undefined, there is something worthy of an actual answer too.

A C++ string object can contain any sequence of characters you like. A C-style string is terminated by the first '\0'. Consequently, a C++ string object must store the size somewhere other than by searching for the '\0': it may contain embedded '\0' characters.

#include <string>
#include <iostream>

int main() {
  std::string s = "abc";
  s += '\0';
  s += "def";
  std::cout << s << std::endl;
  std::cout << s.c_str() << std::endl;
}

Running this, and piping the output through cat -v to make control characters visible, I see:

abc^@def
abc

This explains what you're seeing: you're overwriting the '\0' terminator, but you're not overwriting the size, which is stored separately.

As kec pointed out, you could just as easily have seen garbage; you were lucky that a zero byte happened to follow your extra characters.

  • 1
    Probably worth also pointing out that he/she might, by luck, be exploiting some unused capacity, and that, also by luck, that unused memory had a 0 byte in just the right place to terminate the invalidly modified internal buffer. – kec May 04 '14 at 15:43
  • Thanks, added a note (somewhat less extensive than your comment, but you are entirely correct) –  May 04 '14 at 15:52
  • I think kec accurately pointed out that the unused memory locations probably already had \0 in them, which resulted in c_str() (by luck) terminating the string where it did. – Jitu May 04 '14 at 15:59