1

I have a multithread application, it uses threadpool so there are 10 threads running the same run() function as below:

run(){
...
SetTileAt(p, tile);
...
ClearPointsNotIn(list);
...
}

void TileMatrix::ClearPointsNotIn(QList<Point>list)
{
    removals.clear();
    mutex.lock();
    foreach(Point p, matrix.keys())
    {
        if(!list.contains(p))
        {
            removals.append(p);
        }
    }
    mutex.unlock();
    foreach(Point p,removals)
    {
        Tile* t=TileAt(p);
        if(t!=0)
        {
            mutex.lock();
            delete t;
            t=0;
            matrix.remove(p);
            mutex.unlock();
        }

    }
    removals.clear();
}

void TileMatrix::SetTileAt(const Point &p, Tile* tile)
{
     mutex.lock();
     Tile* t=matrix.value(p,0);
     if(t!=0)
        delete t;
     matrix.insert(p,tile);
     mutex.unlock();
}

Tile* TileMatrix::TileAt(const Point &p)
{
 Tile* ret;
 mutex.lock();
 ret=matrix.value(p,0);
 mutex.unlock();
 return ret;
}

And when I run the application, it some times crashed at the delete t part, I have checked the t's value at that moment it seems though t!=0, but the pointed content is totally garbage. I naively guess this is a "delete a deleted pointer problem". But I am not quite sure how is this happening and how could I modify the code to prevent it? Note that the muetex in TileAt with the one in ClearPointsNotIn() could create a dead lock...

Nyaruko
  • 4,329
  • 9
  • 54
  • 105

2 Answers2

2

This is what could potentially happen and it seems it happens from time to time:

There is a good probability that just after you get the pointer t in TileMatrix::ClearPointsNotIn, another thread to lock at mutex.lock(); inside TileMatrix::SetTileAt function. At that moment, matrix.value(p,0); could return the exact same pointer as TileAt(p); returned in the thread controlling TileMatrix::ClearPointsNotIn before. Then you delete t in TileMatrix::SetTileAt and unlock. In the meantime in the thread running the TileMatrix::ClearPointsNotIn function you have an already deleted pointer in t(because you deleted it in the other thread) and when you call delete on it the application crashes.

This is called a race condition.

What i recommend doing is move the mutex.lock() statement from TileMatrix::ClearPointsNotIn after foreach and just before Tile* t=TileAt(p);. What I also recommend doing is after you delete a pointer, also assign 0 or NULL to it. So that your if(t!=0) will prevent the execution thread execute the block inside the if if the pointer was set to 0 or NULL before. Read here and here for more details on that.

If it wasn't clear enough let me know and I can provide more details.

Community
  • 1
  • 1
Iuliu
  • 4,001
  • 19
  • 31
  • A feedback would be great! – Iuliu Oct 22 '14 at 18:15
  • I have modified it but it creates a dead lock... I have found the reason because I forget that in TileAt there's also a pair of mutex... Could you give more suggestions in this case – Nyaruko Oct 22 '14 at 19:36
  • Unless there's critical code in other place there is no need for a new pair of mutexes in `TileAt`. I would suggest you remove it or post the whole example if you think there's need for them because of other code we don't know of. Based just on the code you've posted removing the mutex pair from `TileAt` should solve the problem. – Iuliu Oct 22 '14 at 20:17
1

As liuliu has mentioned, you have a race condition due to insufficient protection of TileAt by the mutex.

The ClearPointsNotIn should be protected by the mutex in its entirety. The point list should also be passed by const reference, not by value. And you should be using the RAII mutex locker. It could look as follows:

void TileMatrix::ClearPointsNotIn(const QList<Point> & list)
{
    removals.clear();
    QMutexLocker lock(&mutex);
    foreach(Point p, matrix.keys())
    {
        if (!list.contains(p)) removals << p;
    }
    foreach(Point p, removals)
    {
        Tile* t = TileAt(p);
        delete t;
        if (t) matrix.remove(p);
    }
}

Furthermore, assuming that the deletion of a tile doesn't have side effects that need the mutex to be held, you could refactor to remove tile deletion from under the mutex:

class TileMatrix {
  ...
  QList<Tile*> deadTiles;
  ...
};

// Variant 1
void TileMatrix::ClearPointsNotIn(const QList<Point> & list)
{
    QMutexLocker lock(&mutex);
    deadTiles.clear();
    foreach(Point p, matrix.keys())
    {
        if (list.contains(p)) continue;
        Tile* tile = TileAt(p);
        if (tile) { 
          deadTiles << tile;
          matrix.remove(p);
        }
    }
    lock.unlock();
    foreach(Tile* tile, deadTiles) delete tile;
}

// Variant 2
void TileMatrix::ClearPointsNotIn(const QList<Point> & list)
{
    QMutexLocker lock(&mutex);
    foreach(Point p, matrix.keys())
    {
        if (list.contains(p)) continue;
        Tile* tile = TileAt(p);
        if (!tile) continue;
        matrix.remove(p);
        delete tile;
    }
}

You should profile whether variant 1 is indeed faster than variant 2.

Kuba hasn't forgotten Monica
  • 95,931
  • 16
  • 151
  • 313
  • I have modified it but it creates a dead lock... I have found the reason because I forget that in TileAt there's also a pair of mutex... Could you give more suggestions in this case – Nyaruko Oct 22 '14 at 19:12
  • @Nyaruko What mutex is locked within TileAt? The same one that is locked in `ClearPointsNotIn`? – Kuba hasn't forgotten Monica Oct 22 '14 at 20:20