37

The book C++ Primer says

For most applications, in addition to being safer, it is also more efficient to use library strings rather than C-style strings

Safety is understood. Why is the C++ strings library more efficient? After all, underneath it all, aren't strings still represented as character arrays?

To clarify, does the author talk about programmer efficiency (understood) or processing efficiency?

Peter Mortensen
James Leonard

8 Answers

29

C-strings are usually faster, because they do not call malloc/new. But there are cases where std::string is faster. Function strlen() is O(N), but std::string::size() is O(1).

Also, when you search for a substring in a C string, you need to check for '\0' on every iteration; with std::string you don't. In a naive substring search algorithm it doesn't matter much, because instead of checking for '\0' you need to check i < s.size(). But modern high-performance substring search algorithms traverse strings in multibyte steps, and the need to check every byte for '\0' slows them down. This is why GLIBC memmem is twice as fast as strstr. I did a lot of benchmarking of substring algorithms.

This is true not only for substring search algorithms. Many other string processing algorithms are slower for zero-terminated strings.
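As a rough illustration of the terminator check (this is not the benchmark code mentioned above), the sketch below searches for a single byte in two ways. The names `find_terminated` and `find_bounded` are hypothetical. The first has to test every byte against '\0' as well as against the target; the second is given the length, so it can hand the whole range to `std::memchr`, which a library is free to scan several bytes at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: search a zero-terminated string.
// Every step must test for both the terminator and the target byte.
const char* find_terminated(const char* s, char target) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s) {
        if (*s == target) return s;
    }
    return nullptr;
}

// Hypothetical helper: search a string whose length is already known.
// The length bounds the scan, so no per-byte terminator test is needed.
const char* find_bounded(const std::string& s, char target) {
    return static_cast<const char*>(std::memchr(s.data(), target, s.size()));
}
```

Both functions return a pointer to the first match or a null pointer. How much faster the bounded version is depends on the library and the input, so it is worth measuring rather than assuming.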

Leonid Volnitsky
  • 21
    In which way do C strings not need to call `malloc/new`? Whenever you want a dynamically sized string you need dynamically allocated memory, this holds for C strings as well as `std::string`s. And besides that, I think searching for a substring is an algorithm where there shouldn't be a difference between C strings and `std::string`s. And again one of the slightly wrong 50% chosen, but well Ok, it's James's choice which answer to accept. – Christian Rau Aug 25 '12 at 18:13
  • 3
    @ChristianRau - you need to reread my post. Also, did you have a chance to benchmark c-string and std::string substring functions? I did. – Leonid Volnitsky Aug 25 '12 at 18:18
  • 2
    No, I did no profiling; it's just that when thinking about searching for a substring I don't necessarily see the need for computing the length each time, though I might be wrong there. Of course I know that the O(1) length is the main advantage (that's what the other correct answers write, too). But even after rereading, I cannot see how C strings *"do not call malloc/new"*, which is just rubbish and my main point of criticism of your answer. – Christian Rau Aug 25 '12 at 18:22
  • 4
    You can put a C-string (or almost any other type) into dynamically allocated memory, but you don't have to. If you dynamically allocate a C-string, you are the one doing it, not the C-string's member functions. A C-string (like int, double or any POD type) does not itself call malloc/new. – Leonid Volnitsky Aug 25 '12 at 18:43
  • 13
    Well, of course a pointer doesn't allocate itself. But c-strings do not **usually** *"not call malloc/new"*, since using a simple compile time array is not the *usual* way to work with strings, especially in light of any string processing algorithms. Of course you can always throw a `char[1024]` at it and call it a day, but well. You're right that in some cases a small char array is indeed appropriate and doesn't require any dynamic allocation, but I would call those cases far from usual, especially when doing elaborate string processing. – Christian Rau Aug 25 '12 at 18:57
  • 1
    While it is true that with c-strings you need to check for null-terminating character while looping the char array (`while(c != '\0')`), this is not any slower than std::string checking its own boundary condition (`while(i < size)`). – Gigi Aug 25 '12 at 19:56
  • 1
    @Gigi - this might be true only for a naive algorithm (but really isn't) which compares a byte at a time. Modern algorithms use multibyte steps. You can benchmark GLIBC `memmem` and `strstr` - `memmem` will be two times faster. – Leonid Volnitsky Aug 25 '12 at 20:06
  • @Gigi it is because in most implementations the std::string object maintains an integer size field. Of course this is a memory/performance tradeoff: store an extra 32 bits or loop through the whole string. O(n) is close enough to O(1) that it can be mostly ignored. – ewanm89 Aug 25 '12 at 21:06
  • Ok, this answer gets better and better. I now realize what you meant with your *"check for '\0' every cycle"* statement and it makes sense (but the downvote is tied to the *"usually"* in the first sentence, anyway). – Christian Rau Aug 26 '12 at 08:47
  • @Gigi He claims to have benchmarked it, finding `std::string` is faster, and his reason makes sense. `std::string` can operate over many bytes at once with a single instruction. C-style strings must check that a character is not `'\0'` to avoid trying to read memory that the C-string doesn't use or, worse, to avoid reading memory that doesn't exist at all. – user904963 Dec 22 '21 at 23:51
24

Why is C++ strings library more efficient? After all, underneath it all, aren't strings still represented as character arrays?

Because code that uses char* or char[] is more likely to be inefficient if not written carefully. For example, have you seen a loop like this:

char *get_data();

char const *s = get_data(); 

for(size_t i = 0 ; i < strlen(s) ; ++i) //Is this loop efficient? No.
{
   //do something
}

Is that efficient? No. The time-complexity of strlen() is O(N), and furthermore, it is computed in each iteration, in the above code.

Now you may say, "I can make it efficient if I call strlen() just once." Of course you can. But you have to do all that sort of optimization yourself, and consciously. If you miss something, you waste CPU cycles. With std::string, many such optimizations are done by the class itself. So you can write this:

std::string get_data();

std::string const & s = get_data(); //avoid copy if you don't need  it

for(size_t i = 0 ; i < s.size() ; ++i) //Is this loop efficient? Yes.
{
   //do something
}

Is that efficient? Yes. The time-complexity of size() is O(1). No need to optimize it manually which often makes code look ugly and hard to read. The resulting code with std::string is almost always neat and clean in comparison to char*.

Also note that std::string not only makes your code efficient in terms of CPU cycles, but it also increases programmer efficiency!
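For comparison, here is a minimal sketch of the hand-optimized C-style loop next to the std::string loop. The function names `count_spaces_c` and `count_spaces_cpp`, and the choice of counting spaces as the loop body, are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hand-optimized C-style version: strlen() is called once, outside the loop.
std::size_t count_spaces_c(const char* s) {
    assert(s != nullptr);
    const std::size_t n = std::strlen(s);  // O(N), but paid only once
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] == ' ') ++count;
    }
    return count;
}

// std::string version: size() is O(1), so no manual hoisting is needed.
std::size_t count_spaces_cpp(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == ' ') ++count;
    }
    return count;
}
```

Both versions do the same amount of work per character; the difference is that the C version only stays efficient as long as whoever maintains it remembers to keep `strlen()` out of the loop condition.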

Nawaz
  • 5
    The usual way to iterate over a C string is: `for (int i = 0; s[i]; ++i)`, which is efficient. The argument should be about the string class storing the size of the string, making it available in O(1). – Daniel Fleischman Aug 25 '12 at 18:51
  • 3
    @DanielFleischman: You misread my answer. I gave that example by first asking *"have you seen loop like this?"*, and then added `O(N) vs O(1)`, explaining other pointers in the rest of the answer. (So I don't think the downvote is justified). – Nawaz Aug 25 '12 at 18:53
  • 4
    @Nawaz: In gcc's implementation of the C standard library, `strlen` is tagged with an attribute that instructs the compiler that the result of the function call depends only on the arguments (no side effects other than the returned value). This allows the compiler to factor out the `strlen` from the loop. That being said, that is *quality of implementation*, and that optimization is not available on other compiler/library implementations. – David Rodríguez - dribeas Aug 25 '12 at 20:35
  • 3
    @DavidRodríguez-dribeas this depends on the quality of the compiler, not something one should rely on for such optimization – ratchet freak Aug 26 '12 at 00:50
  • 1
    @JimBalter: *"No one does this."* ... You're underestimating the number of programmers who call `strlen(s)` **multiple** times in a *single* function!!! – Nawaz Dec 02 '17 at 14:34
  • @JimBalter: That is *not* what you previously said. You said *"No one does this"* and now you're calling yourself *"honest"*. ;-) – Nawaz Dec 03 '17 at 05:09
  • @JimBalter: and I said *"the code which uses `char*` or `char[]` is **more likely** to be inefficient **if not written carefully**"*. What is wrong with that, when you admit yourself that a billion programmers call `strlen(s)` multiple times in a single function, possibly in loops as well? – Nawaz Dec 03 '17 at 05:13
  • @JimBalter: Goodbye *"No one does this"*. ;-) – Nawaz Dec 03 '17 at 08:06
  • 1
    @Jim Balter: In .NET ([C#](https://en.wikipedia.org/wiki/C_Sharp_%28programming_language%29) and [VB.NET](https://en.wikipedia.org/wiki/Visual_Basic_.NET)), the *official recommended* way is to call length() in the loop condition. There may or may not be some guarantee that it is efficient. Someone going from C# to C may or may not make the mistake. – Peter Mortensen Feb 06 '23 at 01:36
  • 1
    Here is [an example](https://stackoverflow.com/questions/17663186/initializing-a-two-dimensional-stdvector/59906780#59906780) from C++, using size() in the loop conditions. It may or may not be efficient. – Peter Mortensen Feb 06 '23 at 02:27
  • length() and size() aren't relevant here since they are O(1) and compile to a single fetch whereas strlen() is O(n). This is actually mentioned in the answer, so it's hard to see the point of these comments. Also this conversation is past its expiration date. – Jim Balter Feb 07 '23 at 04:47
9

A std::string knows its length, which makes many operations quicker.

For example, given:

const char* c1 = "Hello, world!";
const char* c2 = "Hello, world plus dog!";
std::string s1 = c1;
std::string s2 = c2;

strlen(c1) is slower than s1.length(). For comparisons, strcmp(c1, c2) has to compare several characters to determine the strings are not equal, but s1 == s2 can tell the lengths are not the same and return false immediately.
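A sketch of why a known length helps an equality test. `equal_with_length` is a hypothetical helper, not how any particular standard library implements `operator==`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: compare two buffers whose lengths are already known.
bool equal_with_length(const char* a, std::size_t a_len,
                       const char* b, std::size_t b_len) {
    assert(a != nullptr && b != nullptr);
    if (a_len != b_len) return false;      // different lengths: no bytes examined
    return std::memcmp(a, b, a_len) == 0;  // same length: compare the contents
}
```

With the two strings above, the lengths differ, so the function returns false without looking at a single character, whereas `strcmp` has to walk the shared prefix first.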

Other operations also benefit from knowing the length in advance, e.g. strcat(buf, c1) has to find the null terminator in buf to find where to append data, but s1 += s2 knows the length of s1 already and can append the new characters at the right place immediately.

When it comes to memory management, std::string typically allocates more space than it immediately needs each time it grows, which means many future append operations don't need to reallocate.
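One way to observe this growth behaviour is to watch `capacity()` while appending. The sketch below is only a measurement aid with a made-up name, `count_reallocations`; the exact growth policy is up to the implementation, so the number it returns will differ between standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append `n` characters one at a time and count how often capacity() changes.
// Each change means the string had to obtain a new, larger buffer.
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    assert(s.size() == n);
    return reallocations;
}
```

On common implementations, where capacity grows geometrically, calling this with a large `n` should report far fewer buffer changes than appends.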

Jonathan Wakely
7

There are some cases in which a std::string might beat a char[]. For example, C-style strings typically don't have an explicit length passed around—instead, the NUL terminator implicitly defines the length.

This means that a loop which continually strcats onto a char[] is actually performing O(n²) work, because each strcat has to process the entire string in order to determine the insertion point. In contrast, the only work that a std::string needs to perform to concatenate onto the end of a string is to copy the new characters (and possibly reallocate storage—but for the comparison to be fair, you have to know the maximum size beforehand and reserve() it).
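The two approaches can be sketched side by side. The function names are hypothetical, and the C version assumes the caller has already made the destination buffer large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Repeated strcat: every call rescans `buf` from the start to find the
// terminator, so building the result costs O(n^2) overall.
// Precondition (assumed): `buf` is large enough for the final result.
void join_with_strcat(char* buf, const std::vector<const char*>& parts) {
    assert(buf != nullptr);
    buf[0] = '\0';
    for (const char* part : parts) {
        std::strcat(buf, part);
    }
}

// std::string already knows where its end is, so each append only copies
// the new characters. reserve() sizes the buffer once, up front.
std::string join_with_string(const std::vector<const char*>& parts) {
    std::size_t total = 0;
    for (const char* part : parts) total += std::strlen(part);

    std::string result;
    result.reserve(total);
    for (const char* part : parts) result += part;
    return result;
}
```

A C programmer can get the same linear behaviour by keeping a pointer to the current end of the buffer, but that bookkeeping then has to be written and maintained by hand.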

John Calsbeek
  • I'm sure most C developers are wise enough to store the sizes themselves. It'll be about the same speed, but it will also be more error prone and harder to understand due to there being more code - more variables, more function arguments, etc. – user904963 Dec 23 '21 at 00:11
  • Yep. strlen() was famously used as an example in illustrating the [Shlemiel the painter’s algorithm](https://www.joelonsoftware.com/2001/12/11/back-to-basics/) (it comes in many forms and is very easy to "apply" inadvertently) – Peter Mortensen Feb 06 '23 at 01:25
3

Strings are objects that contain character arrays along with their size and other functionality. It is better to use strings from a string library because they save you from allocating and deallocating memory and from watching out for memory leaks and other pointer hazards. But because strings are objects, they take extra space in memory.

C strings are simply character arrays. They are appropriate when you are working in real time, or when you do not know exactly how much memory you have at hand. If you use C strings, you have to take care of allocating memory, copying data into it via strcpy or character by character, deallocating it after use, and so on.

So better use strings from a string library if you want to avoid a bunch of headaches.

Strings increase programmer efficiency but may reduce processing efficiency (though not necessarily). The reverse holds for C strings.

Peter Mortensen
Coding Mash
3

Well, one obvious and simple way they can be more efficient in practice (regarding runtime) is that they store the string's length along with the data (or at least their size method has to be O(1), which amounts to practically the same thing).

So whenever you would need to find the NUL character in a C string (and thus walk the whole string once) you can just get the size in constant time. And this happens quite a lot, e.g. when copying or concatenating strings and thus allocating a new one beforehand, whose size you need to know.

I don't know if this is what the author meant or if it makes a huge difference in practice, but it is still a valid point.
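The concatenation case mentioned above can be sketched as follows. `concat_c` and `concat_cpp` are hypothetical names; the C version returns memory that the caller must release with `free()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// C-style: both inputs are walked once by strlen() just to size the
// allocation, then copied. The caller must free() the result.
char* concat_c(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    const std::size_t a_len = std::strlen(a);
    const std::size_t b_len = std::strlen(b);
    char* out = static_cast<char*>(std::malloc(a_len + b_len + 1));
    if (out == nullptr) return nullptr;
    std::memcpy(out, a, a_len);
    std::memcpy(out + a_len, b, b_len + 1);  // +1 copies the terminator too
    return out;
}

// std::string: both sizes are available in constant time.
std::string concat_cpp(const std::string& a, const std::string& b) {
    std::string out;
    out.reserve(a.size() + b.size());
    out += a;
    out += b;
    return out;
}
```

In everyday code `a + b` does the same job as `concat_cpp`; the explicit `reserve()` is only there to make the constant-time size lookup visible.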

Christian Rau
1

Here is a short point of view.

First of all, C++ strings are objects, so it is more consistent to use them in an object-oriented language.

Then, the standard library comes with a lot of useful functions for strings, iterators, etc. All this stuff is stuff you won't have to code again, so you gain time and you're sure that this code is (almost) bugless.

Finally, C strings are pointers that are kind of difficult to understand when you're new, and they bring complexity. Since references are preferred over pointers in C++, it makes more sense to use std::string instead of C strings.

Peter Mortensen
Intrepidd
  • 3
    C++ is not an object oriented language. – Jonathan Wakely Aug 25 '12 at 17:58
  • 8
    @JonathanWakely It is not *only* an object-oriented language. However, it is definitely possible to perform object-oriented programming in it, which in my opinion is the only reasonable definition of "object-oriented language." – John Calsbeek Aug 25 '12 at 17:59
  • 2
    Object oriented programming IS the basic difference between C and C++. Originally C++ was named 'C with classes'. – Coding Mash Aug 25 '12 at 18:06
  • 3
    @CodingMash, there are many other differences, such as templates, which have nothing to do with OO. C++ is a multi-paradigm language, not an OO language. You can write OO code in C (with enough effort) but that doesn't make it an "OO language" – Jonathan Wakely Aug 25 '12 at 18:08
  • 2
    I just said the basic difference. You are right, there are many others such as templates, namespaces, etc. I was merely commenting on 'C++ is not an object oriented language'. It may not be a PURELY OOP language; it has the procedural paradigm as well (no offence meant). – Coding Mash Aug 25 '12 at 18:13
1

The difficulty with C-style strings is that one really can't do much with them unless one knows about the data structures in which they are contained. For example, when using "strcpy", one must know that the destination buffer is writable, and has enough space to accommodate everything up to the first zero byte in the source (of course, in all too many cases, one doesn't really know that for certain...). Very few library routines provide any support for allocating space on demand, and I think all those that do work by allocating it unconditionally (so if one had a buffer with space for 1000 bytes, and one wants to copy a 900-byte string, code using those methods would have to abandon the 1000-byte buffer and then create a new 900-byte buffer, even though it might be better to simply reuse the 1000-byte buffer).

Working with an object-oriented string type would in many cases not be as efficient as working with standard C strings and figuring out the optimal ways to allocate and reuse things. On the other hand, code which is written to optimally allocate and reuse strings may be very brittle, and slight changes to requirements could require making lots of tricky little tweaks to the code; failing to tweak the code perfectly would likely result in bugs which may be obvious and severe, or subtle but even more severe. The most practical way to avoid brittleness in code which uses standard C strings is to design it very conservatively: document maximum input-data sizes, truncate anything which is too big, and use big buffers for everything. Workable, but not terribly efficient.
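A minimal sketch of the conservative "truncate anything too big" style described above. `copy_truncated` is a hypothetical helper, and the caller is assumed to pass the true size of the destination buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `src` into `dst`, truncating if it does not fit.
// `dst_size` is the full size of the destination, including room for '\0'.
// Returns true if the whole string fit, false if it was truncated.
bool copy_truncated(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    const std::size_t src_len = std::strlen(src);
    const std::size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';  // always leave the destination terminated
    return n == src_len;
}
```

Unlike a bare `strcpy`, this cannot write past the end of the buffer, but it does silently lose data unless the caller checks the return value.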

By contrast, if one uses the object-oriented string types, the allocation patterns they use will likely not be optimal, but will likely be better than the 'allocate everything big' approach. They thus combine much of the run-time efficiency of the hand-optimized-code approach with safety that's better than the 'allocate everything big' approach.
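The buffer-reuse scenario from the first paragraph (a 1000-byte buffer receiving a 900-byte string) can be probed with a small experiment. `reuses_buffer` is a made-up name, and the result reflects what the standard library in use happens to do: as far as I know the common implementations keep the existing buffer here, but treat that as an assumption to check rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reserve `reserved` characters, then overwrite the contents with `new_len`
// characters. Returns true if the capacity was left unchanged, which
// suggests the existing buffer was reused rather than replaced.
bool reuses_buffer(std::size_t reserved, std::size_t new_len) {
    std::string buf;
    buf.reserve(reserved);
    const std::size_t capacity_before = buf.capacity();
    buf.assign(new_len, 'x');
    assert(buf.size() == new_len);
    return buf.capacity() == capacity_before;
}
```

Calling `reuses_buffer(1000, 900)` models the case described above.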

supercat