So I'm currently refactoring a giant function:
int giant_function(size_t n, size_t m, /*... other parameters */) {
int x[n]{};
float y[n]{};
int z[m]{};
/* ... more array definitions */
And when I find a group of related definitions with discrete functionality, grouping them into a class definition:
class V0 {
std::unique_ptr<int[]> x;
std::unique_ptr<float[]> y;
std::unique_ptr<int[]> z;
public:
V0(size_t n, size_t m)
: x{new int[n]{}}
, y{new float[n]{}}
, z{new int[m]{}}
{}
// methods...
}
The refactored version is inevitably more readable, but one thing I find less-than-satisfying is the increase in the number of allocations.
Allocating all those (potentially very large) arrays on the stack is arguably a problem waiting to happen in the unrefactored version, but there's no reason that we couldn't get by with just one larger allocation:
class V1 {
int* x;
float* y;
int* z;
public:
V1(size_t n, size_t m) {
char *buf = new char[n*sizeof(int)+n*sizeof(float)+m*sizeof(int)];
x = (int*) buf;
buf += n*sizeof(int);
y = (float*) buf;
buf += n*sizeof(float);
z = (int*) buf;
}
// methods...
~V0() { delete[] ((char *) x); }
}
Not only does this approach involve a lot of manual (read:error-prone) bookkeeping, but its greater sin is that it's not composable.
If I want to have a V1
value and a W1
value on the stack, then that's
one allocation each for their behind-the-scenes resources. Even simpler, I'd want the ability to allocate a V1
and the resources it points to in a single allocation, and I can't do that with this approach.
What this initially led me to was a two-pass approach - one pass to calculate how much space was needed, then make one giant allocation, then another pass to parcel out the allocation and initialize the data structures.
class V2 {
int* x;
float* y;
int* z;
public:
static size_t size(size_t n, size_t m) {
return sizeof(V2) + n*sizeof(int) + n*sizeof(float) + m*sizeof(int);
}
V2(size_t n, size_t m, char** buf) {
x = (int*) *buf;
*buf += n*sizeof(int);
y = (float*) *buf;
*buf += n*sizeof(float);
z = (int*) *buf;
*buf += m*sizeof(int);
}
}
// ...
size_t total = ... + V2::size(n,m) + ...
char* buf = new char[total];
// ...
void* here = buf;
buf += sizeof(V2);
V2* v2 = new (here) V2{n, m, &buf};
However that approach had a lot of repetition at a distance, which is asking for trouble in the long run. Returning a factory got rid of that:
class V3 {
int* const x;
float* const y;
int* const z;
V3(int* x, float* y, int* z) : x{x}, y{y}, z{z} {}
public:
class V3Factory {
size_t const n;
size_t const m;
public:
Factory(size_t n, size_t m) : n{n}, m{m};
size_t size() {
return sizeof(V3) + sizeof(int)*n + sizeof(float)*n + sizeof(int)*m;
}
V3* build(char** buf) {
void * here = *buf;
*buf += sizeof(V3);
x = (int*) *buf;
*buf += n*sizeof(int);
y = (float*) *buf;
*buf += n*sizeof(float);
z = (int*) *buf;
*buf += m*sizeof(int);
return new (here) V3{x,y,z};
}
}
}
// ...
V3::Factory v3factory{n,m};
// ...
size_t total = ... + v3factory.size() + ...
char* buf = new char[total];
// ..
V3* v3 = v3factory.build(&buf);
Still some repetition, but the params only get input once. And still a lot of manual bookkeeping. It'd be nice if I could build this factory out of smaller factories...
And then my haskell brain hit me. I was implementing an Applicative Functor. This could totally be nicer!
All I needed to do was write some tooling to automatically sum sizes and run build functions side-by-side:
namespace plan {
template <typename A, typename B>
struct Apply {
A const a;
B const b;
Apply(A const a, B const b) : a{a}, b{b} {};
template<typename ... Args>
auto build(char* buf, Args ... args) const {
return a.build(buf, b.build(buf + a.size()), args...);
}
size_t size() const {
return a.size() + b.size();
}
Apply(Apply<A,B> const & plan) : a{plan.a}, b{plan.b} {}
Apply(Apply<A,B> const && plan) : a{plan.a}, b{plan.b} {}
template<typename U, typename ... Vs>
auto operator()(U const u, Vs const ... vs) const {
return Apply<decltype(*this),U>{*this,u}(vs...);
}
auto operator()() const {
return *this;
}
};
template<typename T>
struct Lift {
template<typename ... Args>
T* build(char* buf, Args ... args) const {
return new (buf) T{args...};
}
size_t size() const {
return sizeof(T);
}
Lift() {}
Lift(Lift<T> const &) {}
Lift(Lift<T> const &&) {}
template<typename U, typename ... Vs>
auto operator()(U const u, Vs const ... vs) const {
return Apply<decltype(*this),U>{*this,u}(vs...);
}
auto operator()() const {
return *this;
}
};
template<typename T>
struct Array {
size_t const length;
Array(size_t length) : length{length} {}
T* build(char* buf) const {
return new (buf) T[length]{};
}
size_t size() const {
return sizeof(T[length]);
}
};
template <typename P>
auto heap_allocate(P plan) {
return plan.build(new char[plan.size()]);
}
}
Now I could state my class quite simply:
class V4 {
int* const x;
float* const y;
int* const z;
public:
V4(int* x, float* y, int* z) : x{x}, y{y}, z{z} {}
static auto plan(size_t n, size_t m) {
return plan::Lift<V4>{}(
plan::Array<int>{n},
plan::Array<float>{n},
plan::Array<int>{m}
);
}
};
And use it in a single pass:
V4* v4;
W4* w4;
std::tie{ ..., v4, w4, .... } = *plan::heap_allocate(
plan::Lift<std::tie>{}(
// ...
V4::plan(n,m),
W4::plan(m,p,2*m+1),
// ...
)
);
It's not perfect (among other issues, I need to add code to track destructors, and have heap_allocate
return a std::unique_ptr
that calls all of them), but before I went further down the rabbit hole, I thought I should check for pre-existing art.
For all I know, modern compilers may be smart enough to recognize that the memory in V0
always gets allocated/deallocated together and batch the allocation for me.
If not, is there a preexisting implementation of this idea (or a variation thereof) for batching allocations with an applicative functor?