
Suppose you write a matrix class with some operations:

class matrix
{
public:
    double operator()(size_t i, size_t j) const;
    ...
};

matrix operator*(const matrix &lhs, const matrix &rhs);
...

It makes sense to defer the evaluation of some matrix expressions: m0 * m1 * m2 * m3 * m4 (a chain of four operator* calls) can benefit from the dynamic-programming matrix-chain-multiplication algorithm; the very common m0 * m1^T maps to a single, very efficient dgemm call (which can multiply by a transpose without materializing it); and so forth.
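As a reminder of why the chain order matters, here is the textbook O(n^3) dynamic program for the matrix-chain-ordering problem; chain_cost is an invented name for illustration, sketching what a deferred operator* chain could feed its collected dimensions into:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// dims has n + 1 entries; matrix i has shape dims[i] x dims[i + 1].
// Returns the minimum number of scalar multiplications over all
// parenthesizations of the n-matrix chain.
long long chain_cost(const std::vector<long long> &dims) {
    const std::size_t n = dims.size() - 1;
    std::vector<std::vector<long long>> m(n, std::vector<long long>(n, 0));
    for (std::size_t len = 2; len <= n; ++len)         // chain length
        for (std::size_t i = 0; i + len <= n; ++i) {   // chain start
            const std::size_t j = i + len - 1;         // chain end
            m[i][j] = std::numeric_limits<long long>::max();
            for (std::size_t k = i; k < j; ++k)        // split point
                m[i][j] = std::min(m[i][j],
                    m[i][k] + m[k + 1][j] +
                        dims[i] * dims[k + 1] * dims[j + 1]);
        }
    return m[0][n - 1];
}
```

For three matrices of shapes 10x100, 100x5, and 5x50, multiplying left-to-right costs 7500 scalar multiplications versus 75000 for the other order, so the payoff from deferring can be an order of magnitude.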

It consequently pays to defer the actual computation until the result is needed. This changes the above to:

class matrix
{
private:
    /*
    * Pointer to an abstract base class - either an actual matrix, 
    *    or an expression tree. */
    std::shared_ptr<matrix_imp> m_imp;

public:
    // Forces compaction of any pending expression tree.
    double operator()(size_t i, size_t j) const;
    ...
};

/* Lazy; creates a matrix with an expression tree using the
*    internals of lhs and rhs. */
matrix operator*(const matrix &lhs, const matrix &rhs);
...

Each matrix holds a pointer to a base class object that can range from a real matrix to a complicated expression tree. Each operation tries to form a matrix using the laziest change to the internal implementation. Some operations have no choice but to actually evaluate things, compact the expression tree, and set the internal implementation to an actual matrix.
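To make that design concrete, here is a compilable miniature of the scheme; all names besides matrix and operator() (matrix_imp, dense_imp, product_imp) are invented for illustration, and operator() is non-const here for brevity where the original would presumably use a mutable member:

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct dense_imp;

// Abstract node: either a concrete matrix or an expression-tree node.
struct matrix_imp {
    virtual ~matrix_imp() = default;
    virtual std::shared_ptr<dense_imp> compact() = 0; // force evaluation
};

// A concrete, fully evaluated matrix.
struct dense_imp : matrix_imp, std::enable_shared_from_this<dense_imp> {
    std::size_t r, c;
    std::vector<double> a; // row-major storage
    dense_imp(std::size_t r_, std::size_t c_) : r(r_), c(c_), a(r_ * c_) {}
    std::shared_ptr<dense_imp> compact() override { return shared_from_this(); }
};

// A deferred product: stores the operands, multiplies only on compaction.
struct product_imp : matrix_imp {
    std::shared_ptr<matrix_imp> lhs, rhs;
    product_imp(std::shared_ptr<matrix_imp> l, std::shared_ptr<matrix_imp> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    std::shared_ptr<dense_imp> compact() override {
        auto l = lhs->compact();
        auto r = rhs->compact();
        auto out = std::make_shared<dense_imp>(l->r, r->c);
        for (std::size_t i = 0; i != l->r; ++i)
            for (std::size_t k = 0; k != l->c; ++k)
                for (std::size_t j = 0; j != r->c; ++j)
                    out->a[i * r->c + j] +=
                        l->a[i * l->c + k] * r->a[k * r->c + j];
        return out;
    }
};

class matrix {
    std::shared_ptr<matrix_imp> m_imp;
public:
    explicit matrix(std::shared_ptr<matrix_imp> imp) : m_imp(std::move(imp)) {}
    // Element access forces compaction and caches the dense result.
    double operator()(std::size_t i, std::size_t j) {
        auto d = m_imp->compact();
        m_imp = d; // drop the expression tree, keep only the result
        return d->a[i * d->c + j];
    }
    // Lazy: builds an expression node from the operands' internals.
    friend matrix operator*(const matrix &l, const matrix &r) {
        return matrix(std::make_shared<product_imp>(l.m_imp, r.m_imp));
    }
};
```

Note how operator* allocates only a small node; the cubic work happens once, inside compact(), the first time an element is read.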


The problem was that, in practice, this caused huge memory overhead in very common cases. Say you read from a file a long-and-narrow matrix x of shape p x q, p >> q, store x^T x in a variable, and discard x. With lazy evaluation, the expression tree keeps x alive, so the retained memory is O(pq) rather than O(q^2). Load many of these in a loop, and it's a serious problem. (Of course, compaction can be forced after each load by the client code calling operator(), but requiring this without algorithmic justification is ugly and error prone.)

Initially, I thought the move ctor would be a good point for automatic compaction: it's exactly the point where a temporary becomes a named object, and it's the named objects that cause growing memory consumption, so

matrix(matrix &&other); // <- Force compaction only here

would appear to solve everything, e.g.,

auto res = // <- temp becoming named
    a * // temp
    b * // temp
    c + // temp
    2 * // temp
    d;

but can it be counted on? E.g., consider

matrix load_xtx(const string &f_name)
{
    matrix x = ...
    return x.t() * x; 
}

auto xtx = load_xtx("foo.hdf5"); // (*)

is the compiler forbidden from doing at (*) something similar to what it does with the NRVO, namely constructing the result in place, thereby eliding the move (and with it the compaction)? Even if it is, might a compiler optimize the move away in other cases?
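That worry can be checked with a toy type that counts move-constructions (probe and g_moves are made-up names for illustration). Assuming the compiler runs in C++17 mode or later, returning a prvalue as load_xtx does means guaranteed copy elision: the result is constructed directly in the caller's object, and a compaction hook placed in the move ctor never runs:

```cpp
#include <cassert>

static int g_moves = 0; // counts move-constructions

struct probe {
    probe() = default;
    probe(probe &&) noexcept { ++g_moves; } // stand-in for "compact here"
};

probe make_probe() {
    // A prvalue return, like `return x.t() * x;` above: since C++17 this
    // is guaranteed copy elision, so no move ctor is ever invoked.
    return probe{};
}
```

Even before C++17, compilers were permitted (and in practice eager) to elide such moves, so a move-ctor hook was never something that could be counted on.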

Ami Tavory
  • Wouldn't it be simpler to perform the actual calculation when the matrix is accessed, not merely stored? – Quentin Jun 02 '15 at 11:53
  • @Quentin yes, but if I understand you correctly, this is exactly what caused the huge memory overhead. Say you have a loop loading thousands of x matrices, transforming each to xtx - all before the first access (not made up scenario at all) - this type of stuff used to trash the system. – Ami Tavory Jun 02 '15 at 12:00
  • Oh, I see. – Quentin Jun 02 '15 at 12:11
  • Maybe this [old question](http://stackoverflow.com/questions/414243/lazy-evaluation-in-c) can help. – Alberto M Jun 02 '15 at 12:30
  • @AlbertoM Very interesting, thanks! Will need to think about the stuff. – Ami Tavory Jun 02 '15 at 12:36
  • Are you sure that only named objects cause application memory usage to increase? I'm pretty sure if I allocate 2 gb and free it the next line, the memory allocated by the application will be 2gb. No? – Mustafa Ozturk Jun 02 '15 at 13:01
  • @MustafaOzturk I'm not so sure that's correct. *However*, I'm pretty sure that if the app in a loop of 1000, allocates and immediately deallocates 2GB, then the overall consumption won't be 1000 * 2GB, and that's good enough in this case. – Ami Tavory Jun 02 '15 at 14:02
  • @AlbertoM On reading the thread you referenced, I have to say - nice catch! It definitely can solve it. If you'd phrase an answer of some sorts, I'd be more than happy to accept it. Many thanks in any case. – Ami Tavory Jun 02 '15 at 14:03
  • @AmiTavory Thanks for the compliments -- the biggest credit goes however to the 2009 answerer. I will anyway post a full answer ASAP – Alberto M Jun 02 '15 at 14:14

1 Answer


Since the "internal pointer" method cannot provide all the flexibility needed for deferred evaluation, the typical solution used by C++ numerical libraries is to define specialized classes implementing lazy-evaluation mechanisms. The old SO question Lazy evaluation in C++ and its best answers show the basics of such a design, with some sample code.

Although I am not an expert, I think good examples of this architecture are the numerical libraries Eigen (here are some details about its implementation) and Blitz++, which relies heavily on templates (I did not find updated documentation on the web illustrating its internals, but this article describes part of its engine and also gives a broader overview of the "expression template" technique).
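For a flavor of the technique, here is a minimal expression-template sketch (on vectors rather than matrices, with invented names expr, vec, vec_sum): each operator returns a small typed node, and the whole tree is evaluated in a single fused loop only when it is assigned to a concrete vec:

```cpp
#include <cstddef>
#include <vector>

// CRTP base: anything deriving from expr<E> is a vector expression.
template <class E>
struct expr {
    const E &self() const { return static_cast<const E &>(*this); }
};

// A concrete vector; constructing it from an expression evaluates the
// whole tree element by element, with no intermediate vectors.
struct vec : expr<vec> {
    std::vector<double> d;
    vec() = default;
    template <class E>
    vec(const expr<E> &e) : d(e.self().size()) {
        for (std::size_t i = 0; i != d.size(); ++i)
            d[i] = e.self()[i];
    }
    double operator[](std::size_t i) const { return d[i]; }
    std::size_t size() const { return d.size(); }
};

// A deferred element-wise sum of two expressions.
template <class L, class R>
struct vec_sum : expr<vec_sum<L, R>> {
    const L &l;
    const R &r;
    vec_sum(const L &l_, const R &r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <class L, class R>
vec_sum<L, R> operator+(const expr<L> &l, const expr<R> &r) {
    return vec_sum<L, R>(l.self(), r.self());
}
```

Because the expression's type encodes its whole structure, the evaluation point (and the memory-consumption question from the original post) is decided at compile time rather than through a shared_ptr to a runtime tree.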

Alberto M