0

I'm having troubles pinpointing the exact source of either a race condition or memory corruption. My attempts to solve the problem are shown after the code.

I have the following structure:

class A
{
protected:
   // various variables
   // 1. vector that is assigned value on B, C, D constructor and not 
   //   modified while in thread
   // 2. various ints
   // 3. double array that is accessed by B, C, D
   // here that are used by B, C and D
public:
   virtual void execute() = 0;
};

class B : A
{
public:
   B(...){};
   bool isFinished();
   void execute(); //execute does a very expensive loop (genetic algorithm)
}

class C : A
{
public:
   C(...){};
   bool isFinished();
   void execute();
}

class D : A
{
public:
   D(...){};
   bool isFinished();
   void execute();
}

class Worker
{
private:
   A& m_a;
   Container& m_parent;
public:
   // Worker needs a reference to parent container to control a mutex 
   // in the sync version of this code (not shown here)
   Worker(A& aa, Container& parent) : m_a(aa), m_parent(parent) {}
   executeAsynchronous();
}

class Container
{
private:
   std::vector<Worker> wVec;
public:
   addWorker(Worker w); //this does wVec.push_back(w)
   start();
}

void Worker::executeAsynchronous(){
    while(!a.isFinished())
        m_a.execute();
}

void Container::start(){
    std::thread threads[3];
    for (int i=0; i<wVec.size(); i++){
        threads[i] = std::thread(&Worker::executeAsynchronous,
                                                std::ref(wVec[i]));
    }
    for (int i=0; i<wVec.size(); i++){
        threads[i].join();
    }
}

To run the code, i'd do:

Container container;
B b(...);
C c(...);
D d(...);
Worker worker1(b, container);
Worker worker2(c, container);
Worker worker3(d, container);
container.addWorker(worker1);
container.addWorker(worker2);
container.addWorker(worker3);
container.start();

The code is supposed to spawn threads to run execute() asynchronously however I have the following 2 problems:

  1. One thread is faster than 2 or 3 or 4 threads AND has better results (better optimisation resulting from running the genetic algorithm in 1 thread), I have read that I could be limited by memory bandwidth but where is that happening? how can I verify that this is the case?

  2. Two or more threads: the results become very bad, somehow something is getting corrupted or mangled along the way. However I cannot pinpoint it. I have couted from various locations in the code and each threads executes exactly one inherited class's execute() i.e., each threads runs the execute() of B, C or D and doesn't jump or interfere with others. The moment I put m_parent.mutex.lock() and m_parent.mutex.unlock() around a.execute(); effectively making the multi-threaded code single-threaded the results become correct again.

I have attempted to:

  1. remove pointers in B, C and D that could become dangling after pushing the Workers back into the Container's vector. I now pass a copy to push_back.
  2. use emplace_back instead of push_back but it made no difference
  3. use vector.reserve() to avoid reallocation and loss of reference but no difference
  4. use std::ref() because I discovered std::thread makes a copy and I want the element wVec[i] to be modified, previously i was just passing wVec[i] to the thread.

I believe by doing 1-4 above and they made no difference and by running the code single-threaded and it works perfectly that it isn't a case of something going out of scope. Also there is no data exchange between threads or the container, I know std::vector isn't thread-safe.

I'd appreciate if you'd take the time to help me figure this out.

EDIT1: As per Constantin Pan's notice, here is my RandomNumberGenerator class, it is a static class, i call it using RandomNumberGenerator::getDouble(a,b)

//rng.h
class RandomNumberGenerator
{
private:
    static std::mt19937 rng;

public:
    static void initRNG();
    static int getInt(int min, int max);
    static double getDouble(double min, double max);
};

//rng.cpp
std::mt19937 RandomNumberGenerator::rng;

void RandomNumberGenerator::initRNG()
{
    rng.seed(std::random_device()());
}

int RandomNumberGenerator::getInt(int min, int max)
{
    std::uniform_int_distribution<std::mt19937::result_type> udist(min, max);
    return udist(rng);
}

double RandomNumberGenerator::getDouble(double min, double max)
{
    std::uniform_real_distribution<> udist(min, max);
    return udist(rng);
}

EDIT2: I have solved the corruption problem. It was a call to a non-thread safe function that I have missed (the evaluation function). As for the slowness, the program is still slow when ran in the threads. I have ran valgrind's callgrind and graphed the results using gprof2dot and it appears M4rc's suggestion holds. There are a lot of STL container calls, I will attempt to dynamic allocate arrays instead.

EDIT3: Looks like the RNG class was the culprit as Constantin Pan pointed out. Profiled using gprof

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 17.97     70.09    70.09  1734468     0.00     0.00  std::mersenne_twister_engine //SYNC
 18.33     64.98    64.98  1803194     0.00     0.00  std::mersenne_twister_engine //ASYNC
  6.19     63.41     8.93  1185214     0.00     0.00  std::mersenne_twister_engine //Single thread

EDIT4: Deque container was guilty too - M4rc

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
14.15     28.60    28.60 799662660     0.00     0.00  std::_Deque_iterator
kamala11
  • 3
  • 2
  • 1
    If you want help figuring out why your code doesn't work, you have to actually post the code that doesn't work. [mcve] This is especially important in multithreaded code. Reduce your application until it is as small as you can make it and still retain the error and if you still need help, post that code. – xaxxon Jun 26 '17 at 23:08
  • A profiler might explain the cause of the slowdown, but my guess would be the stl allocator is the culprit there. I've had a similar issue, but with significantly more threads. Just for grins, perhaps dynamically allocate pointers to anything you push back instead of pushing the objects back themselves -- just to see if that has any bearing on the corruption issue. – M4rc Jun 27 '17 at 02:14
  • @M4rc I have updated the question (Edit2) with my findings so far, I will attempt your suggestion. – kamala11 Jun 27 '17 at 08:31

1 Answers1

0

Sice there is genetic algorithm involved, make sure that the random number generator is thread-safe. I have hit this (the slowdown and incorrect results) myself in the past with rand() from cstdlib.

Constantin S. Pan
  • 1,362
  • 11
  • 4
  • great notice, I have edited and added my rng class, it is a static class which I believe should be thread-safe – kamala11 Jun 26 '17 at 23:21