0

I got here trying to transpose a matrix in O(1). So right now I have something like this:

#include <vector>

template <class T>
class Matrix {  //intended for math matrix
public:
    Matrix(int Rows, int Cols){matrix = vector<vector<T> >(Rows, vector<T>(Cols,0)); transpose = false; cols = Cols; rows = Rows;}
    Matrix(const Matrix<T> &M){matrix = M.matrix; transpose = M.transpose; cols = M.cols; rows = M.rows;}
    ~Matrix(){}

    void t(){
        transpose = !transpose; 
        swap(cols,rows);
    }

    T& operator()(int row, int col){
        if(transpose) 
            return matrix[col][row]; 
        else 
            return matrix[row][col];
    }

private:
    vector<vector<T> > matrix;
    bool transpose;
    int cols;
    int rows;
};

In that code I have what I want: t() is O(1) and operator() is also O(1). But operator() is used a lot and I want to take away the if.
So, to verify if I can improve performance I want to have something like this:

#include <vector>

template <class T>
class Matrix {  //intended for math matrix
public:
    Matrix(int Rows, int Cols){matrix = vector<vector<T> >(Rows, vector<T>(Cols,0)); transpose = false; cols = Cols; rows = Rows;}
    Matrix(const Matrix<T> &M){matrix = M.matrix; transpose = M.transpose; cols = M.cols; rows = M.rows;}
    ~Matrix(){}

    T& originalShape(int row, int col){return matrix[row][col];}
    T& transposedMatrix(int row, int col){return matrix[col][row];}
    void t(){
        transpose = !transpose;
        swap(cols,rows);

        if(transpose)
            &operator() = &transposedMatrix;
        else
            &operator() = &originalShape;
    }
    T& operator()(int row, int col){return matrix[row][col];}

private:
    vector<vector<T> > matrix;
    bool transpose;
    int cols;
    int rows;
};

Of course, that doesn't work. And I didn't find anything useful for this case.

More info about performance impact of t() and operator():

I read that some libraries both use t() in O(1) and O(rows*cols), according to what is going to be used the matrix for. But doing t() in O(1) seems to be a good first step. If then I call a method I know it would access row by row, then I could do the copy transpose at that moment.

And for taking the if: the idea is to put all the weight of the operation in t() and to have the fastest operator(), because t() would be call once in a while and operator() a lot. I also want to know how to do this because it might become helpful in another situation.

The question

Maybe i lack enough English to make this clear: the objective of the question is to find a good way to change the behavior of operator(). Not to improve Matrix() (advice is much appreciated though), nor to discard any way of changing the behavior of operator() just because it might not be better than the if. Ultimately, I will analyze, code and post the one that has the best performance between what i have and what i get as answers. But to know what has better performance i need those codes/algorithms/patterns/whatever, and I think this may help other people in different but similar situations.

Pernoctador
  • 108
  • 8
  • 2
    You can't change a function the way you want, but you can store a function pointer or `std::function` and call it inside `operator()`. – user4581301 Jan 25 '18 at 23:43
  • 2
    As long as you are using a `vector > matrix;`, you most certainly need not worry about the cost of a branch. And even once you fix that, I would be far from surprised if cache effects still dominated. Don't try to optimize without measuring. – Baum mit Augen Jan 25 '18 at 23:45
  • 2
    @user4581301 Are you sure replacing a simple branch with a type erasure object actually helps performance? – Baum mit Augen Jan 25 '18 at 23:46
  • 3
    Warning: `T& transposedMatrix(int row, int col){return matrix[col][row];}` will be accessing memory against the grain. If you are going to do a lot of it, you may be better off transposing the matrix so you can access the memory in row-major order. The cost of skipping around through RAM and not being able to load cache can be astronomical. – user4581301 Jan 25 '18 at 23:47
  • @BaummitAugen i'm pretty sure it will help performance. How much, i will measure after i have some code working. But i think is a good thing to know how to do this. – Pernoctador Jan 25 '18 at 23:48
  • 2
    @BaummitAugen no, but I'm fairly confident of the opposite. I think my function pointer suggestion would be worse. – user4581301 Jan 25 '18 at 23:48
  • 1
    @user4581301 Not such a good idea in this context then, is it? ;) (Not that it matters much when looking at the rest of the code, but in principle...) – Baum mit Augen Jan 25 '18 at 23:49
  • @user4581301 How's the sintaxis for that? something like T& operator(a,b){return *ptr(a,b);} and t(){!transpose; if(transpose) *ptr = &transposedMatrix}?? Because even &transposedMatrix has an error, and I don't know how to fix it. – Pernoctador Jan 25 '18 at 23:50
  • 1
    The first rule of optimization is to **measure**. But assuming you've done that, you may have misinterpreted the results. I would expect the indirection and willy-nilly memory access implied by a vector of vectors, to be the real performance buster. Replace that with a single vector as storage for the matrix. – Cheers and hth. - Alf Jan 26 '18 at 00:00
  • As OP clarified [below](https://stackoverflow.com/questions/48453703/how-to-modify-a-method-in-runtime-in-c#comment83900732_48453820) this isn't about performance after all, this is a dupe of e.g. https://stackoverflow.com/q/29645454/3002139 – Baum mit Augen Jan 26 '18 at 00:03
  • I can only measure one code. That means I still don't know what is better until I have another code with another algorithm to compare. I also read that some libraries do both things: t() in O(1) AND in some cases they copy everything to another matrix. Don't know when to do what, but that's my goal – Pernoctador Jan 26 '18 at 00:03
  • @BaummitAugen I certainly regret not putting a big, **BOLD** caveat on that comment. Pernoctador This is a "You can do it, but..." If you want speed, first rethink the class. [This linked matrix code can be blazingly fast](https://isocpp.org/wiki/faq/operator-overloading#matrix-subscript-op), but for simplicity I recommend replacing the dynamic array with a single `vector`. – user4581301 Jan 26 '18 at 00:23
  • You say you want to take away the `if` in the `operator()`. Do you have evidence through testing/profiling that indicates the usage of `operator()` has a significant impact on performance? And, if so, do you have evidence that the `if` dominates that impact? If you don't have such evidence, your question is pointless. – Peter Jan 26 '18 at 00:23
  • @user4581301 thanks, i'll check that. – Pernoctador Jan 26 '18 at 00:24
  • @Peter `if`'s aren't free. If I put the weight of the transpose in `t()` which will be called once in a while,against `operator()` that might be called a lot (e.g. Gaussian Elimination or multiplication), i **might** have a performance improvement, but it depends on how i take that `if` away. Of course I can have the weight of that `if`, but if i don't have something to compare, i don't know how good/bad that is. I need to know the alternatives – Pernoctador Jan 26 '18 at 00:28
  • `if`s aren't free ([here's an example of how weird/costly they can get](https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array)) but there is a lot of other stuff at play here and several of them dwarf the cost of the `if`. The `vector` of `vector`s trick makes it really easy to write a quick matrix, but unfortunately the memory used is all over the place, and every time the CPU has to go looking for the next value, as opposed to having already cached the next value in line, you have to wait. Sometimes a very long time. – user4581301 Jan 26 '18 at 00:45
  • 1
    @Pernoctador - nothing is free in software performance. And humans are notoriously BAD at working out where hot spots in code are or might be, unless they systematically do testing and profiling. You're not doing that. Which means all you are doing here is premature optimisation. Voting to close, accordingly. – Peter Jan 26 '18 at 00:45
  • @Peter actually, if you really try to optimize code, you know you must try to get rid of ifs. Assembler is really nice to know that, and there are a lot of documentation about it (check questions about code in competitions for more info). A valid question here would be what is the prize of getting rid of that particular if. As for hot spots, I'm not looking for them but to get a faster `operator()`. Please read and understand the question. – Pernoctador Jan 26 '18 at 00:51
  • 1
    @Pernoctador It's true there are some cases where a simple change is easy to make, obviously more efficient, and doesn't harm the readability of code, so you might as well do it. But I don't believe any solutions to your problem satisfy even one of those three criteria. So there's no point wasting time on a change that would be somewhat complicated, harder to understand and maintain, and might or might not actually help performance. Premature Optimization Is Evil. – aschepler Jan 26 '18 at 01:24
  • @Pernoctador - I understood the question - I'm challenging your justification of the need. Yes, there are costs associated with branching. And, by trying to "get a faster `operator()`" you are deeming that it has a significant impact on performance, which is the definition of a hot spot. Without supporting evidence, one way or the other. – Peter Jan 26 '18 at 01:26

1 Answers1

1

If you store your matrix as a single vector, you can write the indexing function like this:

T& operator()(int row, int col){
    return matrix[col*colstep + row*rowstep];
}

Initially rowstep is 1 and colstep is rows. The transpose operator swaps these two values and the sizes.

You do have one extra multiplication, you'll have to measure if that is better or worse than the extra if. Note that branch prediction will guess right most of the time if accessing many/all matrix elements in a loop.

You will still have the problem of accessing data in a non-optimal order, it's worth while to write algorithms taking into account the storage order.

Cris Luengo
  • 55,762
  • 10
  • 62
  • 120
  • Thanks! I will check it. Storage order will also depend of the algorithm. At some it will have great impact. But in others (i think like gaussian elimination, because pibot) wont so much. Or thats the theory. I'll post what i found later – Pernoctador Jan 26 '18 at 01:13