2

In my chess engine codebase I'm using a very big hash table, the hashtable size can be up to 128 GB. The hast table is a big array of bucket of size 4. The code to manage this big table is done with STL std::vector. I'm happy with the performance of the code but I have some problem when initializing the structure.

the structure of the hash table is the following:

class ttEntry
{
private:

    signed int key:32;          /*! 32 bit for the upper part of the key*/
    signed int packedMove:16;   /*! 16 bit for the move*/
    signed int depth:16;        /*! 16 bit for depth*/
    signed int value:23;        /*! 23 bit for the value*/
    signed int generation:8;        /*! 8 bit for the generation id*/
    signed int staticValue:23;  /*! 23 bit for the static evalutation (eval())*/
    signed int type:3;          /*! 2 bit for the type of the entry*/
                                /*  144 bits total = 16 bytes*/
public:
    explicit ttEntry(unsigned int _Key, Score _Value, unsigned char _Type, signed short int _Depth, unsigned short _Move, Score _StaticValue, unsigned char _gen): key(_Key), packedMove(_Move), depth(_Depth), value(_Value), generation(_gen), staticValue(_StaticValue), type(_Type){}
    explicit ttEntry(){}    

    ...
    ...
};

using ttCluster = std::array<ttEntry, 4>;



class transpositionTable
{
private:
    std::vector<ttCluster> _table;
    ....
    ....
}

my code for allocating the space is the following:

uint64_t transpositionTable::setSize(unsigned long int mbSize)
{

    uint64_t size = (uint64_t)((((uint64_t)mbSize) << 20) / sizeof(ttCluster));
    _elements = size;

    _table.clear();
    _table.shrink_to_fit();
    try
    {
        _table.reserve(_elements);
        _table.resize(_elements); // big bottleneck
    }
    catch(...)
    {
        std::cerr << "Failed to allocate " << mbSize<< "MB for transposition table." << std::endl;
        exit(EXIT_FAILURE);
    }
    return _elements * 4;
}

To initialize the table with 128GB of ram 108 seconds are needed. I'm not interested in initializing the memory with known value but only to allocate the space and have a long enough std::vector.

I know I can rewrite the code with good old C code and malloc, but I'd like to work with modern std::vector.

Any idea on how to speedup the code and where I'm doing it wrong?


following @MarcGlisse and @Bob__ hint I modified my code to:



//
//https://stackoverflow.com/questions/21028299/is-this-behavior-of-vectorresizesize-type-n-under-c11-and-boost-container/21028912#21028912
//
// Allocator adaptor that interposes construct() calls to
// convert value initialization into default initialization.
template <typename T, typename A=std::allocator<T>>
class default_init_allocator : public A {
  typedef std::allocator_traits<A> a_t;
public:
  template <typename U> struct rebind {
    using other =
      default_init_allocator<
        U, typename a_t::template rebind_alloc<U>
      >;
  };

  using A::A;

  template <typename U>
  void construct(U* ptr)
    noexcept(std::is_nothrow_default_constructible<U>::value) {
    ::new(static_cast<void*>(ptr)) U;
  }
  template <typename U, typename...Args>
  void construct(U* ptr, Args&&... args) {
    a_t::construct(static_cast<A&>(*this),
                   ptr, std::forward<Args>(args)...);
  }
};

class transpositionTable
{
private:
    std::vector<ttCluster, default_init_allocator<ttCluster>> _table;
...
...

now the resize if faster (less than 1 second for 8GB) and all the elements of the board are 0 filled after a resize.

Thank you guys

1 Answers1

0

Why do you need the resize anyway? As I see it, the reserve call dose all you want. It allocates memory for the specific size. Since you want to initialize it later on,just use push_back to write data to it. Since no allocation has to be done, push_back has basically no performance penalty compared to only writing to a vector element.

sfs
  • 21
  • 4
  • because at runtime not push_back is used at all, only std::vector::operator[] is used at runtime. Max speed is required ar runtime and push_back is not an alternative – marco belli Dec 31 '19 at 10:14
  • 1
    The answer is valid. `reserve` reserves the required amount of memory. That's all you need. And you can use or `push_back` either `operator[]` later, while vector size is smaller than `_elements`, `push_back` does no memory allocations, which can be slow at runtime. – Nikita Smirnov Dec 31 '19 at 10:50
  • Using only reserve and [] operator will not change the size of the vector and this will break all other piece of code running in non speed critical section of code. The solution of using reserve and then accessing this area with [] simply break all the std::vector api – marco belli Dec 31 '19 at 11:10
  • The point being made is that using push_back on an vector that dose not have to allocate memory is not significantly slower than using operator[] on the vector, thus using only reserve and then push_back when actually initializing the vector is as fast as using resize and then accessing with operator[]. Difference being that the memory in the first case only gets written to once, and not twice as with resize (since it will fill with default values because RAII) – sfs Dec 31 '19 at 11:14
  • In my code I do not push back. It's an hash table, I don't push_back data, but write the data at index calculated by hash function. I need to allocate the table at startup and then accessing at random address later. – marco belli Dec 31 '19 at 11:27