1

I'm embedding Ruby 2.2 into a C++ application and try to optimise a function of the form

VALUE toRubyString( const MyObject &o )
{
     const char *utf8 = get_string_rep_of( o );
     return rb_enc_str_new( utf8, strlen( utf8 ), rb_utf8_encoding() );
}

It turned out that toRubyString dominates the runtime profiling output in certain use cases. In those cases, the function gets called very often but only with a few different MyObject values. Hence, my idea is to cache the VALUE values in a std::map<MyObject, VALUE> or the like such that I can reuse them, along the lines of

std::map<MyObject, VALUE> cache;

VALUE toRubyString( const MyObject &o )
{
     std::map<MyObject, VALUE>::const_iterator it = cache.find( o );
     if ( it != cache.end() ) {
         return it->second;
     }
     const char *utf8 = get_string_rep_of( o );
     VALUE v = rb_enc_str_new( utf8, strlen( utf8 ), rb_utf8_encoding() );
     cache[o] = v;
     return v;
}

Alas, I noticed that with this modification, the Ruby interpreter eventually crashes, and the crashes disappear if I omit the return it->second; line (i.e. when the code refrains from reusing cached entries).

I suspect this is related to the garbage collector since it only happens after a couple thousand calls to the function, but even a

rb_gc_mark(v);

call (before adding the VALUE to the cache) didn't help. Does anybody have some idea as to what I might be missing here?

Frerich Raabe
  • 90,689
  • 19
  • 115
  • 207
  • The key question is ownership. Who is responsible to clean up the resources you're using -- `v` and `utf8`? Does `v` depend on `utf8` persisting after the end of this function? (Two other minor notes: since you're concerned about speed, consider using `std::unordered_map` instead of `std::map` -- O(1) vs. O(log N) lookup time. I note also that you put `MyObject` as a value rather than a pointer/ref in your map. That will make a copy of it. This may be fine, but it could be expensive or make the map not match, depending on how your ordering for `std::map` is calculated.) – metal Nov 17 '17 at 13:26
  • @metal I think you're spot on that this is some ownership issue (which is why I suspect the Ruby GC is relevant here); the ownership of the data pointed to by `utf8` remains with `get_string_rep_of`. `rb_enc_str_new` doesn't take ownership but instead creates a copy of the data. As for the `VALUE` values - I'm not sure really how Ruby manages ownerships here, there doesn't seem to be explicit reference counting in any case. – Frerich Raabe Nov 17 '17 at 13:30
  • Another alternative would be to cache the value in Ruby, i.e. make toRubyString a memoizing method. While probably slightly worse in performance, it seems to be more reliable. However, in any case the question is what is the hash key (this applies also to the C++ solution). If you use the object id (as you seem to prefer), you wouldn't notice if the string representation of the object changes. – user1934428 Nov 17 '17 at 13:54

1 Answers1

1

rb_global_variable(&cache[o]) might work.

Apparently, inserting into a map should not invalidate pointers to existing elements, so it's technically okay to use the address of the map value like that. And then rb_global_variable will make it so the cached value is completely excluded from GC.

The reason why rb_gc_mark doesn't work is because that only prevents the object from being cleaned up on the next collection. You usually use it when defining a custom class's mark function so that the GC can mark internally-referenced objects appropriately.

Max
  • 21,123
  • 5
  • 49
  • 71
  • Thanks for the pointer. Using a raw pointer seems fragile since I cannot control if/when the map will reallocate (and move) memory internally. I guess I could try a preallocated `std::vector` and then try using a `std::map::const_iterator>`. – Frerich Raabe Nov 20 '17 at 10:08
  • That's why I linked to that question. According to the standard, a map should not reallocate older keys when you insert a new one. Only if you are removing/updating entries in the cache do you need to worry about that issue. – Max Nov 20 '17 at 20:36