Is there a way to create a RAWSXP vector that is backed by an existing C char* pointer?

Below I show my current working version, which has to allocate a new vector and copy the bytes, and a second, imagined version that relies on a function that doesn't exist.

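    #include <stdlib.h>      // free
    #include <string.h>      // memcpy
    #include <Rinternals.h>  // SEXP, Rf_allocVector, RAW

    // My current slow solution that uses lots of memory
    SEXP getData() {
      // Response has two fields: size and data
      Response resp = expensive_call();
    
      // Allocate an R-managed raw vector and copy the whole buffer into it
      SEXP respVec = Rf_allocVector(RAWSXP, resp.size);
      Rbyte* ptr = RAW(respVec);
      memcpy(ptr, resp.data, resp.size);
    
      // Free the original buffer now that R holds its own copy
      free(resp.data);
    
      return respVec;
    }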
    
    // My imagined solution (Rf_allocVectorViaPtr is a made-up function)
    SEXP getDataFast() {
      // has size, and data
      Response resp = expensive_call();
    
      // reuse the ptr
      SEXP respVec = Rf_allocVectorViaPtr(RAWSXP, resp.data, resp.size);
    
      return respVec;
    }

I also noticed Rf_allocVector3, which seems to give control over how the vector's memory is allocated, but I couldn't get it to work. This is my first time writing an R extension, so I imagine I'm doing something stupid. I'm trying to avoid the copy because the data will be around a GB (very large, though sparse, matrices).

esiegel
  • You could also look into external pointers (Section 5.13 of Writing R Extensions) and tell R that you look after the allocated memory yourself. You could then provide accessors for summaries or parts. – Dirk Eddelbuettel Dec 25 '20 at 03:09
  • That is indeed a good suggestion, but in my case a downstream function within a third-party library expects a connection or a raw vector. – esiegel Dec 25 '20 at 15:34

1 Answer

Copying 1 GB typically takes less than a second. If your call is expensive, the copy may be only a marginal cost, so profile it to see whether it's really a bottleneck.
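To check whether the copy matters on your machine, a rough timing sketch like the one below may help. It is written in C++, and `time_copy_seconds` is a hypothetical helper, not part of R's API. It times a plain `memcpy` between two heap buffers rather than a copy into a RAWSXP, so treat the result as a ballpark figure only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: time a single memcpy of n_bytes between two heap
// buffers and return the elapsed wall-clock time in seconds.
inline double time_copy_seconds(std::size_t n_bytes) {
    if (n_bytes == 0) return 0.0;  // nothing to copy

    // Both buffers are filled up front so their pages are already touched
    // and page faults are not counted as part of the copy.
    std::vector<unsigned char> src(n_bytes, 0xAB);
    std::vector<unsigned char> dst(n_bytes, 0x00);

    const auto start = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), n_bytes);
    const auto stop = std::chrono::steady_clock::now();

    // Read the last byte back so the copy is observable to the compiler.
    volatile unsigned char sink = dst[n_bytes - 1];
    assert(sink == 0xAB);
    (void)sink;

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_copy_seconds(std::size_t(1) << 30)` would time a 1 GiB copy, provided roughly 2 GiB of RAM is free for the two buffers.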

The way you are trying to do things is probably not possible, because how would R know how to garbage collect the data?

But assuming you are using STL containers, one neat trick I've recently seen is to use the second template argument of STL containers -- the allocator.

    template<
        class T,
        class Allocator = std::allocator<T>
    > class vector;

The general outline of the strategy is like this:

  1. Create a custom allocator using R memory that meets all the allocator requirements (essentially you just need allocate and deallocate).
  2. Every time you need to return data to R from an STL container, make sure you initialize the container with your custom allocator.
  3. On returning the data, pull out the underlying R data created by your R-memory allocator, with no copy.

This approach gives you all the flexibility of STL containers while using only memory R is aware of.
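The allocator mechanics behind steps 1 and 2 can be sketched roughly as follows. This is only an illustration and not an R-backed allocator: `TrackingAllocator`, `g_live_bytes`, `ByteVector`, and `make_payload` are hypothetical names, and `std::malloc` is used as a stand-in where a real extension would have to obtain memory that R knows about. Step 3 is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Bytes currently handed out by the allocator (for illustration only).
static std::size_t g_live_bytes = 0;

// Minimal C++11 allocator. ASSUMPTION: std::malloc/std::free stand in for
// R-managed memory; a real extension would get its storage from R instead.
template <class T>
struct TrackingAllocator {
    using value_type = T;

    TrackingAllocator() = default;
    template <class U>
    TrackingAllocator(const TrackingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        assert(g_live_bytes >= n * sizeof(T));
        g_live_bytes -= n * sizeof(T);
        std::free(p);
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <class T, class U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept {
    return false;
}

// A byte container whose storage always comes from the custom allocator.
using ByteVector = std::vector<unsigned char, TrackingAllocator<unsigned char>>;

// Build n bytes holding the repeating pattern 0, 1, ..., 255.
inline ByteVector make_payload(std::size_t n) {
    ByteVector v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<unsigned char>(i & 0xFF));
    }
    return v;
}
```

Every byte the vector holds passes through `allocate`, so that is the one place where R memory would have to be substituted. How the allocator obtains and protects that memory, and how the R object is recovered when the data is returned, is left open here.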

thc
  • Through profiling I've seen execution slow down considerably when I run out of memory and start swapping, which is why I'm hoping to avoid the duplication. I confess I'm a bit confused as to why R's garbage collector doesn't free the memory. I am not using C++, but I have noticed the function ```SEXP Rf_allocVector3(SEXPTYPE, R_xlen_t, R_allocator_t*);```, which should allow custom allocation of a vector. The documentation in the source indicates that the allocator must provide space for both the data and the SEXP header. I could probably use `realloc` and hope it reuses the existing block, though that isn't guaranteed. – esiegel Dec 25 '20 at 02:51