21

Is there anyway to read a known number of bytes, directly into an std::string, without creating a temporary buffer to do so?

eg currently I can do it by

boost::uint16_t len;
is.read((char*)&len, 2);
char *tmpStr = new char[len];
is.read(tmpStr, len);
std::string str(tmpStr, len);
delete[] tmpStr;
Fire Lancer
  • 29,364
  • 31
  • 116
  • 182
  • Have you thought about using a `vector` instead of `string`? If your data is more "raw" than "string-like", it may work better for you, and there's less confusion over direct access. (Vectors are required to store contiguously, so use `&v[0]`.) –  Nov 29 '09 at 19:14
  • For the most part it is string data, just embedded within large binary files. Also I only want to change the loading routines, not the 1000's of lines of code that then use the data once loaded, which a change from std::string would require. – Fire Lancer Nov 29 '09 at 19:18
  • Then I'd check your specific string implementation, and then use GMan's answer, make sure you check the stream after `is.read` too. –  Nov 29 '09 at 19:21
  • I see this is old but I just have to comment for future readers: please don't trust the bytes in a stream for length values since they can be corrupt or be manipulated by malicious users. In this case 64KB is probably not too awful but you never know. I think some of the GIF and JPG code execution bugs were made possible by code like this. – Zan Lynx Dec 17 '13 at 22:21
  • You have any example of such an execution bug? Even in the case of say a 64bit value and I forgot to put in a max size restriction, in cases like this its newly allocated memory, making the worst case an allocation failure? – Fire Lancer Jan 20 '14 at 01:19
  • 1
    Sure. Make a stream with a 4 byte length field for a string. Use an `int`. Read the 4 bytes in from the stream, add 1 and allocate that much for the string plus the null terminator. What if the evil stream used `0xFFFF` for length? That's read as -1. You add 1 to get 0, malloc 0, and things just go downhill from there, depending on details. – Zan Lynx Jan 20 '14 at 03:04

6 Answers6

11

std::string has a resize function you could use, or a constructor that'll do the same:

boost::uint16_t len;
is.read((char*)&len, 2);

std::string str(len, '\0');
is.read(&str[0], len);

This is untested, and I don't know if strings are mandated to have contiguous storage.

GManNickG
  • 494,350
  • 52
  • 494
  • 543
  • Strings are defined to be vectors. Same contiguity. – bmargulies Nov 29 '09 at 19:16
  • 6
    They are not defined to be vectors, but 21.3.4/1 does imply contiguous storage. However there's confusion and defect reports about that specific section, and I'm not sure what the current consensus is, nor how portable depending on that interpretation is. –  Nov 29 '09 at 19:17
  • 2
    @Roger. I disagree that 21.3.4/1 implies contiguous storage. It is the presence of c_str() and data() that imply it, but only because an efficient implementation would require contiguous storage to implement them. I believe the next version of the standard also disambiguate the situation. – Martin York Nov 29 '09 at 23:23
  • 1
    It's a known defect in the standard that will be corrected in C++0x. I do not know of any implementations that do not use contiguous storage though. You could always put in an assertion to check that it's contiguous though. – JoeG Jan 12 '10 at 12:19
  • 4
    In C++11, 21.4.1/5 it says "The char-like objects in a basic_string object shall be stored contiguously". – pascal Sep 25 '14 at 15:50
  • But, here since we are adding data to a string without using its intended methods, won't string.length() return wrong data? – ihsan Dec 12 '22 at 10:23
  • That's taken care of by [`basic_string::resize(count)`](https://en.cppreference.com/w/cpp/string/basic_string/resize) and [`basic_string::basic_string(count, ch, alloc)`](https://en.cppreference.com/w/cpp/string/basic_string/basic_string), @ihsan. Using one of these to set the string's size will both reserve space for at least `count` characters, and set the string's length to `count` characters, effectively creating an internal buffer guaranteed to be large enough to hold `count` characters. – Justin Time - Reinstate Monica Jun 26 '23 at 23:23
6

You could use a combination of copy_n and an insert_iterator

void test_1816319()
{
    static char const* fname = "test_1816319.bin";
    std::ofstream ofs(fname, std::ios::binary);
    ofs.write("\x2\x0", 2);
    ofs.write("ab", 2);
    ofs.close();

    std::ifstream ifs(fname, std::ios::binary);
    std::string s;
    size_t n = 0;
    ifs.read((char*)&n, 2);
    std::istream_iterator<char> isi(ifs), isiend;
    std::copy_n(isi, n, std::insert_iterator<std::string>(s, s.begin()));
    ifs.close();
    _unlink(fname);

    std::cout << s << std::endl;
}

no copying, no hacks, no possibility of overrun, no undefined behaviour.

dex black
  • 637
  • 6
  • 8
  • If you're doing what I think you're doing then go read this [link](http://www.boost.org/doc/libs/1_46_0/libs/serialization/doc/index.html) and the code that goes with it. – dex black Feb 28 '11 at 11:37
  • Not the case here, but is the `copy_n` safe if the end of the file or an error encountered? – Liviu Dec 26 '13 at 17:15
  • I built some code using your method: [code review](http://codereview.stackexchange.com/questions/38148/updating-a-file-through-c-streams). Thanks! – Liviu Dec 27 '13 at 13:12
2

You could use something like getline:

#include <iostream>
#include <string>
using namespace std;

int main () {
  string str;
  getline (cin,str,' ');
}
rmn
  • 2,386
  • 1
  • 14
  • 21
  • 1
    This is a good suggestion for other problems, but not for this one: unformatted input of a specific number of bytes. –  Nov 29 '09 at 19:09
  • This doesn't answer the question as it doesn't read a specified number of bytes. Even if it did, getline has to look at every byte it reads for the delimiter which is expensive and unnecessary when the number of bytes is specified. This answer should be removed. – xaxxon Oct 15 '17 at 08:15
2

I would use a vector as the buffer.

boost::uint16_t len;
is.read((char*)&len, 2); // Note if this file was saved from a different architecture 
                         // then endianness of these two bytes may be reversed.

std::vector buffer(len);  // uninitialized.
is.read(&buffer[0], len);

std::string  str(buffer.begin(),buffer.end());

Though you will probably get away with using a string as the buffer (as described by GMan). It is not guaranteed by the standard that a strings members are in consecutive locations (so check your current implementation and put a big comment that it needs checking when porting to another compiler/platform).

Martin York
  • 257,169
  • 86
  • 333
  • 562
  • " It is not guaranteed by the standard that a strings members are in consecutive locations" <== it apparently has been since '11 – xaxxon Oct 15 '17 at 08:17
  • @xaxxon: True. But the code above does not require string to store elements in consecutive locations. Now if you are referring to vector (and just mentioned string accidentally) this code does make that assumption. But as you noted since C++11 this has been guaranteed. Also before the C++ standard was updated in 2011 there was a survey of all major implementations (around 2007) and all of them implemented vectors as contiguous blocks (which made updating the standard easy). – Martin York Oct 16 '17 at 18:24
0

An easy way would be:

std::istream& data
const size_t dataSize(static_cast<size_t>(data.rdbuf()->in_avail()));
std::string content;
content.reserve( dataSize);
data.read(&content[0], dataSize);
0

Are you just optimizing code length or trying to save yourself a copy here? What's wrong with the temporary buffer?

I'd argue that you're actually circumventing the protections of the string trying to write directly do it like that. If you're worried about performance of the copy to a std::string because you've identified that it's in some way affecting the performance of your application, I'd work directly with the char*.

EDIT: Doing more looking... initializing std::string from char* without copy

In the second answer, it's stated pretty flatly that you can't achieve what you're looking to achieve (ie. populate a std::string without an iteration over the char* to copy.)

Take a look at your load routine (post it here perhaps?) and minimize allocations: new and delete certainly aren't free so you can at least save some time if you don't have to re-create the buffer constantly. I always find it helpful erase it by memset'ing the buffer to 0 or null terminating the first index of the array each iteration but you may quickly eliminate that code in the interests of performance once you're confident in your algorithm.

Community
  • 1
  • 1
antik
  • 5,282
  • 1
  • 34
  • 47
  • The performance of std::string is fine, the problem is loading the data into them from a binary file, which is currently taking an unacceptably long period of time. Profiling showed that 70% of that load time is reading strings, with just 30% being other binary data or small bits of processing, so speeding up the string reading seems the obvious solution to speeding the whole thing up by a major margin. So I by no means want to replace std::string in the rest of the program which would mean changing 1000's of lines, rather than just changing the string loading routine. – Fire Lancer Nov 29 '09 at 19:13
  • How big an issue is the alloc, dealloc of the char* every iteration? What if you simply kept a char* of sufficient size (checking for each iteration, obviously) around and just created new strings from that single char*? – antik Nov 29 '09 at 19:31