151

For C, we use char[] to represent strings.

For C++, I see examples using both std::string and char arrays.

#include <iostream>
#include <string>
using namespace std;

int main () {
  string name;

  cout << "What's your name? ";
  getline(cin, name);
  cout << "Hello " << name << ".\n";

  return 0;
}

#include <iostream>
using namespace std;

int main () {
  char name[256];

  cout << "What's your name? ";
  cin.getline(name, 256);
  cout << "Hello " << name << ".\n";

  return 0;
}

(Both examples adapted from http://www.cplusplus.com.)

What is the difference between these two types in C++? (In terms of performance, API integration, pros/cons, ...)

Mateen Ulhaq
ramosg

7 Answers

221

A char array is just that - an array of characters:

  • If allocated on the stack (as in your example), it always occupies its full declared size, e.g. 256 bytes, no matter how long the text it contains is.
  • If allocated on the heap (using malloc() or new char[]), you are responsible for releasing the memory afterwards, and you always pay the overhead of a heap allocation.
  • If you copy a text of more than 255 characters (plus the terminating \0) into a 256-byte array, the behavior is undefined: it might crash, produce ugly assertion messages or cause unexplainable (mis-)behavior somewhere else in your program.
  • To determine the text's length, the array has to be scanned, character by character, for a \0 character.
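
The points about fixed size and length scanning can be illustrated with a minimal sketch. The helper names textLength and bufferSize are made up for this example; textLength does by hand roughly what std::strlen does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts characters up to (not including) the '\0'.
// The cost grows with the length of the text, because every char is visited.
std::size_t textLength(const char* text) {
    assert(text != nullptr);
    std::size_t n = 0;
    while (text[n] != '\0') {
        ++n;
    }
    return n;
}

// Hypothetical helper: a stack buffer occupies its full declared size,
// even though only 4 bytes ("Bob" plus '\0') are actually in use.
std::size_t bufferSize() {
    char name[256] = "Bob";
    return sizeof(name);
}
```

Here textLength("Bob") yields 3, while bufferSize() yields 256 regardless of the text stored.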

A string is a class that contains a char array, but automatically manages it for you. Most string implementations have a small built-in buffer (typically around 16 characters, depending on the implementation) so that short strings don't touch the heap at all, and use the heap only for longer strings.

You can access a string's char array like this:

std::string myString = "Hello World";
const char *myStringChars = myString.c_str();
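
As a rough sketch of why c_str() matters in practice, here is a std::string being handed to a C function that only understands null-terminated char arrays. The helper name parseNumber is hypothetical; std::strtol is the standard C function.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a std::string to a number via the C API.
long parseNumber(const std::string& text) {
    // The pointer returned by c_str() stays valid only until 'text' is
    // modified or destroyed, so it is used immediately and not stored.
    const char* raw = text.c_str();
    // c_str() always ends with a terminator, so C functions know where to stop.
    assert(raw[text.size()] == '\0');
    return std::strtol(raw, nullptr, 10);
}
```

For example, parseNumber("42") returns 42.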

C++ strings can contain embedded \0 characters, know their length without counting, are faster than heap-allocated char arrays for short texts and protect you from buffer overruns. Plus they're more readable and easier to use.


However, C++ strings are not (very) suitable for use across DLL boundaries, because any user of such a DLL function would have to use the exact same compiler and C++ runtime implementation, or risk their string class behaving differently.

Normally, a string class would also release its heap memory on the calling heap, so it will only be able to free memory again if both sides use a shared (.dll or .so) version of the runtime.

In short: use C++ strings in all your internal functions and methods. If you ever write a .dll or .so, use C strings in your public (dll/so-exposed) functions.
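
A minimal sketch of that advice, assuming a made-up library function foo_greeting and internal helper makeGreeting (neither comes from a real library). Platform-specific export attributes such as __declspec(dllexport) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Internal code: free to use std::string.
static std::string makeGreeting(const std::string& name) {
    return "Hello " + name + ".";
}

// Hypothetical exported function: only plain C types cross the boundary.
// The caller owns the buffer, so no memory is allocated on one side and
// freed on the other. Returns the length the full greeting needs
// (excluding the terminator), so the caller can detect truncation.
extern "C" std::size_t foo_greeting(const char* name, char* buffer,
                                    std::size_t bufferSize) {
    assert(name != nullptr);
    const std::string greeting = makeGreeting(name);
    if (buffer != nullptr && bufferSize > 0) {
        const std::size_t count =
            greeting.size() < bufferSize - 1 ? greeting.size() : bufferSize - 1;
        std::memcpy(buffer, greeting.data(), count);
        buffer[count] = '\0';   // always terminate, even when truncated
    }
    return greeting.size();
}
```

A caller would pass its own char array and its size, then compare the return value against the buffer size to see whether the text was cut off.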

Mihir Luthra
Cygon
  • 10
    In addition, strings have a bunch of helper-functions that can be really neat. – Håkon Aug 17 '09 at 11:18
  • 1
    I don't believe the bit about DLL boundaries. Under very special circumstances it could potentially break (one DLL is statically linking against a different version of the runtime than used by other DLLs, and worse things would probably happen first in these situations), but in the general case where everybody is using the shared version of the standard runtime (the default) this will not happen. – Martin York Aug 17 '09 at 18:00
  • 2
    Example: You distribute VC2008SP1-compiled binaries of a public library called libfoo, which has a std::string & in its public API. Now someone downloads your libfoo.dll and does a debug build. Their std::string could very well have some additional debug fields in it, causing the offset of the pointer for dynamic strings to move. – Cygon Aug 21 '09 at 20:25
  • 2
    Example 2: In 2010, someone downloads your libfoo.dll and uses it in his VC2010-built application. His code loads MSVCP100.dll and your libfoo.dll still loads MSVCP90.dll -> you get two heaps -> memory cannot be freed, assertion errors in debug mode if libfoo modifies the string reference and hands an std::string with a new pointer back. – Cygon Aug 21 '09 at 20:27
  • 3
    I'm just going to stick with "In short: use C++ strings in all your internal functions and methods." Trying to understand your examples made my brain pop. – Stephen Oct 29 '10 at 15:51
  • Fragmentation is unavoidable, be it for small strings or big ones. Besides, getting 16 characters from the heap would not allocate only 16 bytes from the heap. There is also a heap boundary (of a few bytes) to be considered for each allocation. Moreover, heap allocation for even a single byte is done by allocating 32 bytes (depending on the processor) to follow the segmentation. – Mithun B Jan 23 '19 at 09:02
  • 1
    Those (implementation-dependent) 16 characters are built into the string class, not allocated on the heap. So a *short* string inside a method doesn't allocate memory at all, and a *short* string as part of a heap-allocated object will be embedded inside that object's memory, not allocate its own buffer on the heap. This is called "small buffer optimization." – Cygon Feb 10 '19 at 09:43
20

Arkaitz is correct that string is a managed type. What this means for you is that you never have to worry about how long the string is, nor do you have to worry about freeing or reallocating the memory of the string.

On the other hand, the char[] notation in the case above has restricted the character buffer to exactly 256 characters, including the terminating \0. If you tried to write more than 256 characters into that buffer, at best you will overwrite other memory that your program "owns". At worst, you will try to overwrite memory that you do not own, and your OS will kill your program on the spot.
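
If a fixed buffer is unavoidable, one way to stay inside it is to let std::snprintf do the truncation. This is only a sketch; the helper name copyBounded is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: copies 'text' into a fixed buffer without ever
// writing past its end. Returns true if the whole text fit.
bool copyBounded(char* buffer, std::size_t bufferSize, const std::string& text) {
    assert(buffer != nullptr && bufferSize > 0);
    // snprintf writes at most bufferSize - 1 characters plus the terminator
    // and returns the length the full text would have needed.
    const int needed = std::snprintf(buffer, bufferSize, "%s", text.c_str());
    return needed >= 0 && static_cast<std::size_t>(needed) < bufferSize;
}
```

With an 8-byte buffer, "short" fits completely, while "much too long" is cut down to "much to" and the function reports false.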

Bottom line? Strings are a lot more programmer friendly, char[]s are a lot more efficient for the computer.

Mark Rushakoff
  • 7
    At worst, other people will overwrite memory and run malicious code on your computer. See also [buffer overflow](http://cwe.mitre.org/data/definitions/120.html). – David Johnstone Mar 18 '10 at 01:12
7

Well, the string type is a completely managed class for character strings, while char[] is still what it was in C: a byte array representing a character string.

In terms of API, the C++ standard library is implemented in terms of strings and not char[], but there are still lots of functions from libc that take char[], so you may need to use it for those. Apart from that, I would always use std::string.

In terms of efficiency, a raw buffer of unmanaged memory will of course almost always be faster for lots of things, but take comparing strings into account, for example: std::string always has the size available to check first, while with char[] you need to compare character by character.
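
The comparison point can be sketched as follows. The function names are made up, and sameString is a hand-written illustration of a size-first check; whether a given standard library implements std::string's operator== this way is an implementation detail.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// C strings: strcmp walks both arrays until they differ or reach '\0'.
// (Comparing two char pointers with == would only compare addresses.)
bool sameCString(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}

// Sketch of a size-first comparison on std::string.
bool sameString(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;             // lengths are stored, so this check is O(1)
    }
    return a.compare(b) == 0;     // only equal-length strings are scanned
}
```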

Arkaitz Jimenez
5

I personally do not see any reason why one would like to use char* or char[] except for compatibility with old code. std::string is no slower than using a c-string, except that it will handle re-allocation for you. You can set its size when you create it, and thus avoid re-allocation if you want. Its indexing operator ([]) provides constant-time access (and is in every sense of the word the exact same thing as using a c-string indexer). Using the at method gives you bounds-checked safety as well, something you don't get with c-strings unless you write it yourself. Your compiler will most often optimize away the indexer call in release mode. It is easy to get things wrong with c-strings: delete vs delete[], exception safety, even how to reallocate a c-string.
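
A short sketch of two of these points: reserving capacity up front, and the difference between [] and at(). The helper names repeated and atRejects are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds a string of 'count' copies of 'c'.
// reserve() requests the capacity once, so push_back need not reallocate.
std::string repeated(char c, std::size_t count) {
    std::string result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(c);
    }
    assert(result.size() == count);
    return result;
}

// Hypothetical helper: reports whether at() refuses the given index.
// operator[] performs no such check; at() throws std::out_of_range.
bool atRejects(const std::string& text, std::size_t index) {
    try {
        (void)text.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```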

And when you have to deal with advanced concepts like copy-on-write (COW) strings, or non-COW strings for multithreaded (MT) code, you will need std::string.

If you are worried about copies, as long as you use references, and const references wherever you can, you will not have any overhead due to copies, and it's the same thing as you would be doing with the c-string.
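
As a small illustration of passing by const reference (the function names are invented for this sketch), the parameter below is just another name for the caller's string, so no characters are copied:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Takes the string by const reference: the function reads the caller's
// object directly instead of working on a copy.
std::size_t countSpaces(const std::string& text) {
    std::size_t spaces = 0;
    for (char c : text) {
        if (c == ' ') {
            ++spaces;
        }
    }
    return spaces;
}

// Hypothetical check: true if the reference parameter refers to the very
// same object whose address the caller passed alongside it.
bool isSameObject(const std::string& parameter, const std::string* original) {
    assert(original != nullptr);
    return &parameter == original;
}
```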

Abhay
  • +1 Though you did not consider implementation issues like DLL compatibility, you got COW. –  Aug 17 '09 at 12:35
  • What if I know that my char array is 12 bytes? If I instantiate a string for that, it might not be really efficient, right? – David 天宇 Wong Aug 16 '16 at 00:17
  • @David: If you have extremely perf sensitive code then yes. You might consider std::string ctor call as an overhead in addition to initialization of std::string members. But remember premature optimization has made a lot of code bases unnecessarily C-styled, so be careful. – Abhay Aug 17 '16 at 14:17
2

One difference is null termination (\0).

In C and C++, functions that take a char* or char[] receive only a pointer to a single char and walk along memory until they reach a byte with the value 0 (often called the null terminator).

C++ strings store their length, so they can contain embedded \0 characters and know their length without counting.

#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

using namespace std;

// std::string stores its length, so an embedded '\0' is just another character.
void NullTerminatedString(string str) {
    const size_t nullPos = 3;
    str[nullPos] = '\0';       // only this one character becomes '\0'; the length stays 11
    cout << str << endl << endl << endl;
}

// For a char array, the first '\0' marks the end of the text.
void NullTerminatedChar(char *str) {
    const size_t nullPos = 3;
    str[nullPos] = 0;     // from specific, all the character are removed
    cout << str << endl;
}

int main() {
    string str = "Feels Happy";
    printf("string = %s\n", str.c_str());
    printf("strlen = %zu\n", strlen(str.c_str()));   // %zu is the format for size_t
    printf("size = %zu\n", str.size());
    printf("sizeof = %zu\n", sizeof(str));   // size of the std::string object; compiler dependent
    NullTerminatedString(str);

    char str1[12] = "Feels Happy";
    printf("char[] = %s\n", str1);
    printf("strlen = %zu\n", strlen(str1));
    printf("sizeof = %zu\n", sizeof(str1));  // size of the whole array, terminator included
    NullTerminatedChar(str1);
    return 0;
}

Output:

string = Feels Happy
strlen = 11
size = 11
sizeof = 32
Fee s Happy


char[] = Feels Happy
strlen = 11
sizeof = 12
Fee

Eswaran Pandi
  • "from specific, all the character are removed": no, they're not "removed"; printing a char pointer only prints up to the null terminator (since that's the only way a char* knows the end). The string class knows the full size itself, so it just uses that. If you know the size of your char*, you could print/use all the chars yourself too. – Puddle Jan 27 '19 at 02:21
1

Think of (char *) as string.begin(). The essential difference is that (char *) is an iterator and std::string is a container. If you stick to basic strings a (char *) will give you what std::string::iterator does. You could use (char *) when you want the benefit of an iterator and also compatibility with C, but that's the exception and not the rule. As always, be careful of iterator invalidation. When people say (char *) isn't safe this is what they mean. It's as safe as any other C++ iterator.
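
To make the iterator analogy concrete, here is a sketch in which the same standard algorithm runs over a char* range and over a std::string range. The function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Counts occurrences of 'c' in any iterator range; a char pointer
// qualifies as an iterator just like std::string::const_iterator does.
template <typename Iterator>
std::size_t countChar(Iterator first, Iterator last, char c) {
    return static_cast<std::size_t>(std::count(first, last, c));
}

// char* version: the end "iterator" has to be found by scanning for '\0'.
std::size_t countInCString(const char* text, char c) {
    assert(text != nullptr);
    return countChar(text, text + std::strlen(text), c);
}

// std::string version: the container supplies begin() and end() itself.
std::size_t countInString(const std::string& text, char c) {
    return countChar(text.begin(), text.end(), c);
}
```

As with any iterator, the char* range is only valid while the underlying array is alive and unchanged.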

Samuel Danielson
1

Strings have helper functions and manage char arrays automatically. You can concatenate strings directly, whereas for a char array you would need to copy it to a new, larger array. Strings can also change their length at runtime. A char array is harder to manage than a string, and certain functions may only accept a string as input, requiring you to convert the array to a string. It's better to use strings; they were made so that you don't have to use arrays. If arrays were objectively better, we wouldn't have strings.
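
A minimal sketch of the concatenation difference, with hypothetical helper names. The std::string version grows on its own; the char-array version needs a caller-supplied buffer and has to check that the result fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string: the result allocates whatever space it needs.
std::string joinStrings(const std::string& a, const std::string& b) {
    return a + b;
}

// char arrays: the caller must provide a buffer that is large enough.
// Returns false (and leaves 'out' untouched) if the result would not fit.
bool joinCStrings(char* out, std::size_t outSize, const char* a, const char* b) {
    assert(out != nullptr && a != nullptr && b != nullptr);
    const std::size_t lenA = std::strlen(a);
    const std::size_t lenB = std::strlen(b);
    if (lenA + lenB + 1 > outSize) {
        return false;
    }
    std::memcpy(out, a, lenA);
    std::memcpy(out + lenA, b, lenB + 1);   // + 1 copies the terminator too
    return true;
}
```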