Optimization needed - Algorithm implementation takes up TOO much memory

Question

OK, guys : memory optimization is definitely not my thing and since I'm currently working on a big, and cpu and memory-intensive project, I think I need your help.

The project is a Chess Engine and the actual problem lies (I guess) in one of the 2 following methods (the code above is not 100% exact but it's more-or-less that) :

Tree Search (MiniMax) - the actual code is an Alpha-Beta with different additions, but this quite basic example is more illustrative :

int Board::miniMax_(int ply)
{
    if (ply == this->searchDepth) return this->eval();

    int best = MINUS_INF;

    vector<Move*> moves = this->possibleMoves();

    FOREACH(Move*, move, moves)
    {
        HASHMAKE((*move),this);
        int val = -this->miniMax_(ply+1);
        UNHASHMAKE((*move),this);

        if (val>best) { 
            if (ply==0) this->bestMove = (*move);
            best = val; 
        }

    }
    return best;
}

Move Generation (if you haven't ever played with Chess programming and bitboards, the following might simply look almost non-sensical; but you'll still get the idea in terms of memory handling - well, hopefully...):

vector<Move*> Board::possibleMoves()
{
    U64 ownPieces = this->piecesForColor(this->playing);
    U64 occupied = ~(this->pieces[empty]);

    vector<Move*> moves;

    //-----------------------------------------
    // "Normal" Piece Moves
    //-----------------------------------------

    const int from = (1+(this->playing))*3;
    const int to = (1+(this->playing))*3+6;

    for (int pieceType=from; pieceType<to; pieceType++)
    {
        U64 piece = this->pieces[pieceType];

        for (; piece != 0; piece &= (piece - 1))
        {
            UINT pos = log2(piece & ~(piece-1));

            U64 move;
            switch (pieceType)
            {
                case bPawns:    move = BPAWN_(pos,ownPieces,occupied); break;
                case wPawns:    move = WPAWN_(pos,ownPieces,occupied); break;
                case bRooks:
                case wRooks:    move = ROOK_(pos,ownPieces,occupied); break;
                case bKnights:
                case wKnights:  move = KNIGHT_(pos,ownPieces,occupied); break;
                case bBishops:
                case wBishops:  move = BISHOP_(pos,ownPieces,occupied); break;
                case bQueen:
                case wQueen:    move = QUEEN_(pos,ownPieces,occupied); break;
                case bKing:
                case wKing:     move = KING_(pos,ownPieces,occupied); break;
                default:break;
            }

            for (; move !=0; move &= (move-1))
            {
                moves += new Move(pos, log2(move&~(move-1)),this);
            }
        }
    }

    return moves;
}

The Move class

//=======================================================
// Prototype
//=======================================================

class Move
{
    public:
        Move (string m, Board* b) : from(ALG2FROM(m)), to(ALG2TO(m)), pieceFrom(b->atPosition[from]), pieceTo(b->atPosition[to]) {}
        Move (int f, int t, Board* b) : from(f), to(t), pieceFrom(b->atPosition[from]), pieceTo(b->atPosition[to]) {}
        Move (int val) : value(val) {}

        inline string notation();
        inline string out();
        //----------------------
        int from, to;
        int pieceFrom, pieceTo;
        int value;
};

//=======================================================
// Inline Functions
//=======================================================

inline string Move::notation()
{
    return SSTR(SPOS(this->from) << SPOS(this->to));
}

inline string Move::out()
{
    return SSTR(this->notation() << " :: " << this->value);
}

Obviously, the first function being recursive AND called some millions of times, there is some expected load. The thing is once the search goes up to the 4th,5th ply or something, the app already takes up like 2GB. And the thing is that once it's complete (the search), the memory is still not freed - so I suppose this is indicating a problem.

So, any ideas?

Please, just let me know in case you need to know anything else about the implementation.

Hints :

FOREACH is just a macro for a vector iterator
the += for vector appending comes from Boost
everything in bold is a macro, but in terms of memory overhead, none of them is doing anything intensive (so I decided to omit them)
no destructors implemented whatsoever

Dumb question, but are you freeing that "new Move" anywhere? It would explain the leak if you're not... because destroying the vector doesn't call delete for all its members afaik. — Mike Trusov, Jan 22 '13 at 04:24
@Falmarri LOL. Macros (usually) replace some long sequences of bitwise arithmetic. Trust them, without them the above code would look more like hieroglyphics... — Dr.Kameleon, Jan 22 '13 at 04:24
@MikeTrusov Not so... "dumb". And no I'm not. Actually after the FOREACH loop is finished (in the first function) the `move` is not needed anymore (well quite - apart from the one stored in `bestMove`). However, when I tried `delete`ing, I still had issues. (Anyway, memory handling is **not** my thing, so perhaps the error could have been an obvious one). — Dr.Kameleon, Jan 22 '13 at 04:26
So how big is the vector returned by `possibleMoves()` and is each of the `Move`s held within a separate and distinct allocation? — WhozCraig, Jan 22 '13 at 04:34
@WhozCraig Well the vector returned is of variable size (obviously), but given that it returns valid moves for a specific chessboard setup, it's never that high. Let's say somewhere around 30 on average. — Dr.Kameleon, Jan 22 '13 at 04:35
@paddy 95% of the macros (they will take up a whole new section in case I decided to post them too) include simple bit arithmetic. It won't help at all, regarding memory optimization (the macros have already been scrutinized). If there is a specific one you're curious about, that you think it might affect performance, please let me know. — Dr.Kameleon, Jan 22 '13 at 04:39
And after you're done with them, they are destroyed.. how? I think paddy's answer may serve you pretty well, and not just with possible moves. — WhozCraig, Jan 22 '13 at 04:39
@WhozCraig They're *not*. I agree this seems like a major point. However, whenever I tried a `delete`, at the end of the `FOREACH` loop in `miniMax_`, I always ended up having more trouble (and still without freeing any memory). — Dr.Kameleon, Jan 22 '13 at 04:41
@Dr.Kameleon: I think you're really going overboard on the macros here. Especially in your `Move` constructors. I have absolutely no idea idea wtf is going on with construction of those. — Falmarri, Jan 22 '13 at 04:41
@Falmarri Why do you say so? My `Move` constructors are really simple. `move = BPAWN_(pos,ownPieces,occupied);`, etc ARE NOT constructors. `move` in that case is just a 64-bit number (bitboard) with `1`s placed at the position of bits where a - e.g. Black Pawn - move is possible. -- (Please have a look at an example of Bitboards for Chess Programming : http://www.sluijten.com/winglet/11movegen02.htm) — Dr.Kameleon, Jan 22 '13 at 04:45
"this seems like a major point." Its beyond "major". It is very likely the reason you're here in the first place. I would strongly suggest just a bit of time with [this pdf](http://dl.dropbox.com/u/6101039/Modern%20C%2B%2B.pdf) and consider what paddy's answer may mean along these lines, and not just for `move` management. — WhozCraig, Jan 22 '13 at 04:48
@Dr.Kameleon: I'm talking about this `Move (string m, Board* b) : from(ALG2FROM(m)), to(ALG2TO(m)), pieceFrom(b->atPosition[from]), pieceTo(b->atPosition[to]) {}`. — Falmarri, Jan 22 '13 at 04:50
@Falmarri Well, that's pretty simple too : it just takes an algebraic move notation in `m` (e.g. `e2e4`) and the current board (`b`) and converts it to a square coordinates representation (e.g. `from` square = 15, `to` square = 23, along with identifying the pieces in place) — Dr.Kameleon, Jan 22 '13 at 04:53
@thang They update the distinct piece bitboards + update the Zobrist key for the position. (`UNHASHMAKE` is the reverse function) — Dr.Kameleon, Jan 22 '13 at 05:01
then why do you call possibleMoves a bunch of times when you know that the results are all the same? — thang, Jan 22 '13 at 05:08
by the way this minimax is also greedy. you can do better by generating a forward probability map of the opponent's choices. — thang, Jan 22 '13 at 05:15
@thang The actual version is not even a `MiniMax`, it's actually `Alpha-Beta` search, with aggressive pruning + Null-Move Heuristics + Aspiration Windows + lots more) — Dr.Kameleon, Jan 22 '13 at 05:18
@thang Also I may have misinterpreted your previous question. In that sense yes `HASHMAKE` alters the output (actually what it does it to "perform" a move on the board). — Dr.Kameleon, Jan 22 '13 at 05:18
probably makes sense to just have Move possibleMoves[1024] declared as a member of the board along with an int numPossibleMoves. change the name of the function possibleMoves to updatePossibleMoves. — thang, Jan 22 '13 at 05:26

paddy · Accepted Answer · 2013-01-22T04:58:00.053

3

Here:

vector<Move*> moves = this->possibleMoves();

A vector of pointers does not free the pointers for you. You could try a vector of std::unique_ptr<Move> instead.

You will have better performance if you do not do individual allocations of moves. Use a memory pool. In that respect, you might not really need a vector at all.

Unless Move is a polymorphic class, I suggest you don't allocate it at all. Instead, make a bunch of Set functions on Move and declare your possibleMoves function like this:

void Board::possibleMoves( vector<Move> & moves )

And obviously call it like this:

vector<Move> moves;
possibleMoves( moves );

So, this means that when you add a move, instead of doing a new, you can do something like this:

    moves.push_back( Move(pos, log2(move&~(move-1)),this) );

That invokes the copy constructor. If you want to avoid an extra copy, then make an empty constructor for Move and make the afore-mentioned 'setter' function:

    moves.resize( moves.size()+1 );
    Move & m = moves.back();
    m.Set( pos, log2(move&~(move-1)),this );

I am not 100% sure whether that will be any quicker. Anyway, another thing... If you expect that a typical board almost always has less than 50 possible moves, you can improve performance by doing this at the beginning of possibleMoves:

moves.reserve(50)

That means the vector will hardly ever have to be resized, and thus makes the push_back operation faster.

edited Jan 22 '13 at 04:58

answered Jan 22 '13 at 04:25

paddy

60,864
6
61
103

Could you please elaborate a bit more on the idea (I'm not 100% sure what you mean). Btw, I just posted the full code (except for the macros) of the `Move` class. – Dr.Kameleon Jan 22 '13 at 04:32
Hey, I'm gonna edit with a better theory. This other stuff still holds. Stand by. -- Actually, no that was dumb.. Okay, which part do you need clarification on? – paddy Jan 22 '13 at 04:32
+1 on the pass-in by reference and no allocations. I was about to lead to that with the line of questioning I started, but you got there first. This can only help, btw. Dr. – WhozCraig Jan 22 '13 at 04:36
@paddy Hmmm... Let me try this out and I'll get back to you. Thanks a lot for your help, mate! ;-) – Dr.Kameleon Jan 22 '13 at 04:49
your suggested code will not build. expect vector&, but got vector – thang Jan 22 '13 at 04:57
@paddy UNBELIEVABLE. Thanks a lot. It took a bit of effort to convert *ALL* (and there were pretty everywhere - the project is already more than 30 files of code) `Move` pointers to non-pointers, and so on. But the result is impressive : from 2GB memory in 2 seconds, now it doesn't even go above 45MB. Wonderful. You made my day! ;-) – Dr.Kameleon Jan 22 '13 at 05:41
@WhozCraig Thanks to you, too. Your... pdf was undoubtedly an interesting read. ;-) – Dr.Kameleon Jan 22 '13 at 05:41
@paddy And the most amazing thing of all : there is a more-than-noticeable **speed gain**. Unbelievable? Thanks, again! – Dr.Kameleon Jan 22 '13 at 05:59
2

Glad it was helpful. The speed gain is not unexpected. You were leaking memory up the wazoo. Your program would quite likely have been having memory paging trouble. It would pay to study a bit more about memory management and object lifetime in C++ so you don't fall into this trap again. Happy coding =) – paddy Jan 22 '13 at 07:21
1

@paddy ya gotta love Ph.Ds. Their ideas for algorithms and the math behind them borderline on brilliant sometimes, but too often their coding skills.. ouch. At least he has some good ideas going forward. I hope he takes them to heart. and nice job. – WhozCraig Jan 22 '13 at 07:48
@WhozCraig Just noticed your last comment and cannot help it not to reply : I'm not a Ph.D. (I'm not even a proper uni student; I pretty much hate it). I'm a professional programmer. This one is nothing but one of the many free-time projects of mine. As far as the subject matter goes, it's just that Memory Management and C++ have never been my strongest points... Cheers! :-) – Dr.Kameleon Mar 04 '13 at 09:08
1

@Dr.Kameleon Oh, well in that case you likely agree with my now-completely-unrelated point. All is well regardless. I hope your project turned out good! If you want a strong practice to try and stick to, consider always striving for [RAII](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) It does *not* mean don't use dynamic allocation. Rather, it simply means resource acquisitions are always controlled by scope-based protection (like a call-stack). – WhozCraig Mar 04 '13 at 09:17
@WhozCraig Well, it's one of those cases where hobbies are fine, but there's the real world too... so for now the Chess Engine is on hold and I'm finding myself playing again with Cocoa and Objective-C... As for C++ (although it's the first thing that comes to (my) mind when dealing with performance-intensive projects), I've definitely got a lot of studying to do. But I suppose studying never stops if you want to call yourself a programmer... Have a good day! ;-) – Dr.Kameleon Mar 04 '13 at 09:28

score 2 · Answer 2 · edited May 23 '17 at 12:27

2

I'm not sure std::vector is the best container in this case.

The number of possible moves is finite, and it should be possible to calculate it. I'm not going to guess on how you'd do that, but assuming you can do it (or just use a const big number)... then it might be worth using an array instead:

Move *moves = new Move[maxMoves];
FillPossibleMoves(moves, maxMoves)
// do stuff
delete [] moves;

Then you can update your possible moves function to:

int Board::FillPossibleMoves(Move *moves, int maxMoves)
{
    int i = 0;
    ...
    moves[i].Something = Whatever;
    i++;
    ...
    return i;
}

That way you'll be allocating memory only once, and cleaning it up when you're done with it.

If you agree with: Using arrays or std::vectors in C++, what's the performance gap? which says to avoid dynamic arrays, then:

std::vector<Move> moves(maxMoves);
// if you use a const, then you can keep declaration above as a member
// and call moves.clear() here instead
int movesAdded = board->FillPossibleMoves(moves);
// do stuff


int Board::FillPossibleMoves(std::vector<Move> &moves)
{
    int i = 0;
    ...
    moves[i].Something = Whatever;
    i++;
    ...
    return i;
}

edited May 23 '17 at 12:27

Community

1
1

answered Jan 22 '13 at 04:44

Mike Trusov

1,958
1
15
23

And this differs from a `std::vector moves(maxMoves);` how? [RAII](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization), my friend. It's whats for dinner. – WhozCraig Jan 22 '13 at 04:46
@WhozCraig: Well because the OP is obviously having problems understanding memory management and pointer ownership, so it simplifies that. – Falmarri Jan 22 '13 at 04:48
In that respect you'd be better off getting dirty, ditching the vector and class paradigm altogether and making a great big buffer that the recursive calls use as a stack. – paddy Jan 22 '13 at 04:49
1

@paddy strangely, its amazing how many game engines with look-ahead theory are pretty much *exactly* what you just described =P – WhozCraig Jan 22 '13 at 04:50
Something tells me using a vector has a little bit more overhead than using an array. You can go ahead and argue all you like about arrays being unsafe and evil. – Mike Trusov Jan 22 '13 at 04:54
@WhozCraig Yeah, it's how I would implement it. But it's a bit too far removed from what is being done here, so I only made a vague reference by saying "memory pools" in my answer. =) – paddy Jan 22 '13 at 04:54
@MikeTrusov The performance difference between vectors (when used correctly) and dynamic arrays is miniscule compared to the fact that in both cases you are asking the OS to allocate memory for you. – paddy Jan 22 '13 at 04:57

score 0 · Answer 3 · answered Jan 22 '13 at 04:24

0

no destructors implemented whatsoever

Why do you think memory should be released. You're calling new a whole crapload of times but you're never deleting the Moves you create.

Also, show us the definition of the Move class

answered Jan 22 '13 at 04:24

Falmarri

47,727
41
151
191

Just posted the whole `Move` class (nothing is missing - except for the Macro definitions of course...) – Dr.Kameleon Jan 22 '13 at 04:31

Optimization needed - Algorithm implementation takes up TOO much memory

3 Answers3