Wrapping non-memory-contiguous c/c++ data as numpy array

Question

I have a C++ class that provides an interface to data for a number of "particles" (the context is a physics simulation). The data for each particle are stored in a struct, and the class has an array of pointers to the structs. I really don't want to mess with this storage scheme because:

- The data are stored on disk in a binary format that is not of my own devising, and writing a new function to read the files into some other storage structure would not be straightforward.
- I have a wealth of other C/C++ code designed around the same data storage scheme that will be unusable or require a major overhaul if the storage structure is changed.

Now, I want to use Python to do some visualization. The ideal scenario is having access to my data as numpy arrays so I can use a variety of numpy functions (histograms, sorts, binning, statistics, etc.). I have a working solution using SWIG to wrap my class into Python. The drawback is that I need to make partial copies of the data (from the structs buried in the C++ class into numpy arrays). As my work with these simulations progresses, I'm pushing against the limits imposed by my hardware, which means I want to push the number of particles up to where the data occupy a large fraction of the available memory. So making copies is to be avoided at all costs.

Is there a way to map a numpy array onto this mess of data? Some poking around seems to point to a "no" answer, but what if I relax my "no copy" requirement a bit and allow a bit of wiggle room to create an extra array of pointers? I'll sketch out what I'm thinking:

struct particle_data
{
  double x[3];
  double vx[3];
  //more data
};

class Snap
{
  struct particle_data *P; //this gets allocated, so data is accessed as P[i].x[j] and so on
  //a bunch of other functions, flags, etc.
};

What I'm thinking is that I can create an array of pointers, e.g.

double **x0;
//of course allocate some memory for the array here...
for(int i=0; i<max; i++)
{
  x0[i] = &P[i].x[0];
}

And hopefully somehow get this to play nicely in python as a numpy array of doubles. If I'm especially lucky it will be possible to avoid making similar arrays x1 and x2 since x0[i]+1 = x1[i] and x0[i]+2 = x2[i].
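The pointer-array idea above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the `id` and `mass` members stand in for the unspecified "//more data" fields, and `make_x_pointers` and `position` are hypothetical helper names, not part of the original class. It shows that a single pointer array is enough to reach all three position components, and that reads and writes go straight to the original structs without copying particle data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the question's struct. The last two members are
// hypothetical placeholders for the "//more data" fields.
struct particle_data {
    double x[3];
    double vx[3];
    int id;       // hypothetical
    float mass;   // hypothetical
};

// Build one pointer per particle, each pointing at that particle's x[0].
// No particle data is copied; only n pointers are allocated.
std::vector<double*> make_x_pointers(particle_data* P, std::size_t n) {
    std::vector<double*> x0(n);
    for (std::size_t i = 0; i < n; ++i) {
        x0[i] = &P[i].x[0];
    }
    return x0;
}

// Access component j (0, 1 or 2) of particle i's position through the
// pointer array. x[0..2] are adjacent inside the struct, so x0[i] + j
// addresses x[j] and separate x1/x2 pointer arrays are not needed.
inline double& position(const std::vector<double*>& x0, std::size_t i, int j) {
    assert(j >= 0 && j < 3);
    return *(x0[i] + j);
}
```

The extra memory is one pointer per particle. Note that this is still an array of pointers rather than an array of doubles, so on its own it does not give numpy a layout it can index directly.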

I have no idea if this is possible or how to set it up, though. In a perfect world I can stick with SWIG, but I have a hunch that this will involve writing some wrappers myself, if it's possible.

Found this related question, but still stuck. http://stackoverflow.com/questions/4355524/getting-data-from-ctypes-array-into-numpy — Kyle, Mar 20 '14 at 21:11
as a comment, the structure `particle_data` could be allocated by existing `c/c++` code, and then exposed through `cython` to the `python` environment as a `numpy` array of doubles of size `6`, without making copies. That said, I don't know if you can go any further without copying the data because the structures are not contiguous. — gg349, Mar 20 '14 at 21:51
@flebool Hmm, useful tip, but won't work out here because the '//more data' in particle_data contains a variety of ints, floats, doubles... — Kyle, Mar 20 '14 at 22:00

score 1 · Accepted Answer · answered Mar 21 '14 at 03:16

I would approach this by first determining how you could expose your C++ object as an array. If you can find a way, then exposing that to Python via SWIG will be easy. If you can't do it even at the C++ level (there may not be a way to do what you want), then you can't go further. However, it seems like you think there should be an array structure that can represent your data, so push in that direction in C++ first. Once that is solved, most of the access-from-Python battle is won.
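One way to explore an array representation at the C++ level is sketched below. It assumes, as in the question's sketch, that `P` is a single allocated block of structs, so the same field of consecutive particles sits a fixed `sizeof(particle_data)` bytes apart. The class `StridedDoubles`, the helper `position_component`, and the `id` and `mass` members are hypothetical names introduced for illustration. The view owns nothing and copies nothing; it only stores a starting address, a byte stride, and a count.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the question's struct. The last two members are
// hypothetical placeholders for the "//more data" fields.
struct particle_data {
    double x[3];
    double vx[3];
    int id;       // hypothetical
    float mass;   // hypothetical
};

// Non-owning view of one double-valued field across an array of structs.
class StridedDoubles {
public:
    StridedDoubles(void* base, std::size_t offset_bytes,
                   std::size_t stride_bytes, std::size_t count)
        : first_(static_cast<unsigned char*>(base) + offset_bytes),
          stride_(stride_bytes),
          count_(count) {}

    // Element i lives stride_ bytes after element i - 1.
    double& operator[](std::size_t i) const {
        assert(i < count_);
        return *reinterpret_cast<double*>(first_ + i * stride_);
    }

    std::size_t size() const { return count_; }
    std::size_t stride_bytes() const { return stride_; }

private:
    unsigned char* first_;  // address of the field in the first struct
    std::size_t stride_;    // distance in bytes between consecutive structs
    std::size_t count_;     // number of structs
};

// View of position component j (0, 1 or 2) for all n particles.
inline StridedDoubles position_component(particle_data* P, std::size_t n, int j) {
    assert(j >= 0 && j < 3);
    return StridedDoubles(
        P,
        offsetof(particle_data, x) + static_cast<std::size_t>(j) * sizeof(double),
        sizeof(particle_data),
        n);
}
```

NumPy arrays are themselves described by a base address, a shape, and per-axis byte strides, so the offset and stride computed here are the quantities a wrapper would need to pass along. Whether and how that can be done through SWIG is not shown here. If the structs were instead allocated individually and reached through an array of pointers, there would be no fixed stride and this sketch would not apply.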
