8

I have a function that constructs a std::string from a const char* with two numbers, passed as parameters, appended to the end of it.

std::string makeName(const char* name, uint16_t num1, uint16_t num2) {

    std::string new_name(name);
    new_name.reserve(new_name.length()+5);

    new_name += ":";
    new_name += boost::lexical_cast<std::string>(num1);
    new_name += ":";
    new_name += boost::lexical_cast<std::string>(num2);

    return new_name;
}

This function gets called thousands of times to create unique names for small objects allocated on the heap.

Object* object1= new Object(makeName("Object", i, j)); // i and j are simply loop indices

I have discovered using valgrind's massif tool that the calls to makeName allocates a lot of memory since it gets called so many times.

87.96% (1,628,746,377B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

->29.61% (548,226,178B) 0xAE383B7: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)

| ->26.39% (488,635,166B) 0xAE38F79: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)

| | ->26.39% (488,633,246B) 0xAE39012: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)

| | | ->15.51% (287,292,096B) 0x119A80FD: makeName(char const*, unsigned short, unsigned short) (Object.cpp:110)

| | | | ->15.51% (287,292,096B) in 42 places, all below massif's threshold (01.00%)

My question is, how can I minimize these allocations to help reduce the overall total amount of memory my program uses?

EDIT: I also want to note that as a program requirement I cannot use c++11 features.

burtmacklin16
  • 715
  • 2
  • 13
  • 31
  • 2
    Try using a stringstream. `sstream ss; ss << name << " : " << num1 << " : " << num2; return ss.str();` – NathanOliver Feb 05 '15 at 15:17
  • Does the "name" have to be a `std::string` ? If your number of objects is within numeric limits, how about assigning a unique `int` to it? – a_pradhan Feb 05 '15 at 15:18
  • @NathanOliver A good first step: but if the code in question is a serious bottleneck, going to `stringstream` isn't the answer. – Yakk - Adam Nevraumont Feb 05 '15 at 15:21
  • @akashPradhan - that is a good suggestion, however the object's name is typically going to be something other than "Object" - it is going to be a descriptive name of the object – burtmacklin16 Feb 05 '15 at 15:22
  • @NathanOliver - I previously used `std::stringstream` but upon some investigation found that it is doing more allocations than `std::string` would do. http://stackoverflow.com/questions/14741144/is-stdstringstream-better-to-accumulate-than-stdstring – burtmacklin16 Feb 05 '15 at 15:23
  • If you can live with potentially wasting some space: move name creation into the object, use a plain `char` array, and `sprintf` it. – molbdnilo Feb 05 '15 at 15:24
  • 1
    Oh, and how many objects are we talking about? The name `"Object:22:979"` will take up 14 bytes of memory, plus 12-24 bytes for the pointers to track it, and another 4-16 bytes of allocation overhead by the heap. If that is large compared to your object, and you have many objects... that is overhead. And if the objects are otherwise small, the high percentage might be because that is what you asked for? – Yakk - Adam Nevraumont Feb 05 '15 at 15:24
  • [boost.algorithim.join?](http://www.boost.org/doc/libs/1_57_0/doc/html/string_algo/reference.html#header.boost.algorithm.string.join_hpp) – Mgetz Feb 05 '15 at 15:25
  • @Yakk - there could potentially be 100,000 objects or more. At this point I am trying to minimize the overhead as much as possible. – burtmacklin16 Feb 05 '15 at 15:33
  • Thanks for the information on the string stream. Ill have to keep that in mind with my projects. – NathanOliver Feb 05 '15 at 15:33

3 Answers3

3

Only a DIY custom conversion beats using sprintf in such case.

So I would use sprintf and MEASURE.

Only if that wasn't good enough would I implement my own integer-to-string (knowing from numerous cases that it will certainly be somewhat faster, but not enough to justify starting with that).


Example. Instead of the current high level code

std::string makeName(const char* name, uint16_t num1, uint16_t num2) {

    std::string new_name(name);
    new_name.reserve(new_name.length()+5);

    new_name += ":";
    new_name += boost::lexical_cast<std::string>(num1);
    new_name += ":";
    new_name += boost::lexical_cast<std::string>(num2);

    return new_name;
}

just do

auto makeName( const char* const name, const uint16_t num1, const uint16_t num2 )
    -> std::string
{
    std::string result( strlen( name ) + 25, '\0' );    // 25 is large enough.
    int const n = sprintf( &result[0], "%s:%d:%d", name, num1, num2 );
    result.resize( n );
    return result;
}

Disclaimer: code not touched by compiler's hands.

Cheers and hth. - Alf
  • 142,714
  • 15
  • 209
  • 331
  • 2
    Depending on the compiler, [Boost.Spirit.Karma handily beats `sprintf`](http://tinodidriksen.com/2010/02/07/cpp-convert-int-to-string-speed/). There’s also the FastFormat library, but I’ve never used that. – Konrad Rudolph Feb 05 '15 at 15:36
  • 2
    @KonradRudolph: Thanks for reference to FastFormat. So I learned a new thing also today. Not yet dead! :) – Cheers and hth. - Alf Feb 05 '15 at 15:40
  • Is std::string guaranteed contiguous now? I thought &v[0] was only safe with vector – Kenny Ostrom Feb 05 '15 at 15:41
  • 1
    @KennyOstrom: `std::string`'s buffer has been guaranteed contiguous since the Lillehammer meeting in 2005, I think it was. That's about a decade now. ;-) – Cheers and hth. - Alf Feb 05 '15 at 15:41
  • `cppformat` is very good too, would love to see it in the standard. You may want to use `std::string::data()` to access the buffer directly. – edmz Feb 05 '15 at 15:42
  • @Cheersandhth.-Alf the only gotcha is that `&v[0]` is not guaranteed to be null terminated according to the standard the way `.c_str()` is. Practically it always will be... but treating the former as a c style string is at your own risk – Mgetz Feb 05 '15 at 15:43
  • 1
    @Mgetz: There are a number of issues with your statement and it really needs a longer discussion (e.g. there were changes from C++03 to C++11). But here it suffices to note that the code above does not rely on null-termination of the ordinary buffer contents. The initialization to zero is just because it goes against good programming practice to introduce needless inefficiency, even micro-inefficiency, but except for that the buffer could just as well have been initialized to e.g. `#` characters. – Cheers and hth. - Alf Feb 05 '15 at 16:16
  • @Cheersandhth.-Alf I'm well aware of the caveats – Mgetz Feb 05 '15 at 16:27
2

"My question is, how can I minimize these allocations"

It occurs to me that you have a reason for those names. Can you compute them at the time of need, rather than always generating the name in the constructor? That would be the best improvement - don't do it at all until needed.

If you happen to already have a virtual base, and the class type determines the string, that makes it really easy. Otherwise, an enumerated type could replace the string, and you have a lookup table.

Object* object1= new Object(i, j));
std::string Object::getName(){ compute here }

If this doesn't help, then you actually do need the string for each and every object, and you can only get a small speedup by optimizing that function. I noticed that you construct the string at one size then grow it afterwards. Also you could work with a char buffer then assign it to the member string (in the constructor).

Kenny Ostrom
  • 5,639
  • 2
  • 21
  • 30
  • Yes, unfortunately I cannot compute the names at the time of need, I have to store them in the object so that I can reference the object by its unique name. – burtmacklin16 Feb 05 '15 at 15:43
1

Yes, your code is doing a lot of allocations. Analyzing the allocations:

std::string new_name(name); // 1
new_name.reserve(new_name.length()+5); // 2

new_name += ":";
new_name += boost::lexical_cast<std::string>(num1); // possibly 4 (boost + operator+=)
new_name += ":"; // possibly 5
new_name += boost::lexical_cast<std::string>(num2); // possibly 7

'possibly' because it depends on the characters needed by the numbers (the higher, the more).

If you're really concerned about memory allocations, asprintf(not standard though) or your version (based on the return value of s(n)printf) is likely to be the best choice:

std::string makeName(const char* name, uint16_t num1, uint16_t num2)
{
     char *ptr = nullptr; 
     int size  = asprintf( &ptr, "%s:%u:%u", name, num1, num2);
  return std::string(ptr, size); // to avoid copying the chars
}  

Note: As @Cheersandhth.-Alf pointed out, in case std::string failed to allocate memory, ptr would be ptr is leaked. The best way to solve this would involve using std::unique_ptr but I'll leave you to work it out to your needs.

if you don't want to use asprintf, you can get a similar behavior with std::snprintf

std::string makeName(const char* name, uint16_t num1, uint16_t num2)
{
    int length = std::snprintf(nullptr, 0, "%s:%u:%u", name, num1, num2);

    if (length > 0 )
    {
        std::string new_name(length + 1, '\0');
        std::snprintf(&new_name[0], new_name.length(), "%s:%u:%u", name, num1, num2);

       return new_name;
    } 
    // else handle failure
} 

The difference with your version (I didn't use boost::lexical_cast but std::to_string) is very big: ran 500 times the first version uses 72,890 bytes while the second only 23,890! (measured with valgrind memcheck)

Cheers and hth. - Alf
  • 142,714
  • 15
  • 209
  • 331
edmz
  • 8,220
  • 2
  • 26
  • 45
  • **–1** The code shown does not work and would not work if "corrected". The basic idea of this code, storing a pointer to a new buffer in `&(s.data())`, is not even meaningful. Add to that a non-standard function. – Cheers and hth. - Alf Feb 05 '15 at 16:23
  • @Cheersandhth.-Alf You're right, I was realizing that too. Fixing. – edmz Feb 05 '15 at 16:42
  • Nice update. I'll let my downvote stand for now since the middle example leaks memory and the third example needlessly incurs inefficiency (doing the work twice) in a misguided attempt to shave off a few bytes of dynamically allocated memory. The latter is not an error that IMO would justify a downvote, and the memory leak is borderline, but in sum I think this means that this answer is more misguiding than guiding, hence my decision about that. – Cheers and hth. - Alf Feb 05 '15 at 17:51
  • By the way, the probably easiest way to deal with the memory leak is to use a `std::unique_ptr` with a custom destruction function. The custom destruction function should use `free` to deallocate the memory. – Cheers and hth. - Alf Feb 05 '15 at 17:54
  • 1
    Well, since the `uint`s can have up to 6 characters you could only calculate the string length + 12. But, AFAIK, that's the best way to allocate the right space for `s(n)printf`. I'll put a note about the memory leak to let the OP reason about it so he can fulfill his requirements (he can't use C++11). – edmz Feb 05 '15 at 18:47
  • 1
    Note that there's no way to specify a buffer to `std::string`, except moving a buffer from a temporary. The constructor taking a pointer does not take ownership of the argument buffer. It copies. – Cheers and hth. - Alf Feb 05 '15 at 19:04