I had a piece of code that looked like this:
    for (i = 0; i < NumberOfSteps; i++)
    {
        for (k = 0; k < NumOfNodes; k++)
        {
            mark[crawler[k]]++;
            r = rand() % node_info[crawler[k]].num_of_nodes;
            crawler[k] = (int)DataBlock[node_info[crawler[k]].index + r][0];
        }
    }
I changed it so that the load can be split among multiple threads. Now it looks like this:
    for (i = 0; i < NumberOfSteps; i++)
    {
        for (k = 0; k < NumOfNodes; k++)
        {
            pthread_mutex_lock(&mutex1);
            mark[crawler[k]]++;
            pthread_mutex_unlock(&mutex1);
            pthread_mutex_lock(&mutex1);
            r = rand() % node_info[crawler[k]].num_of_nodes;
            pthread_mutex_unlock(&mutex1);
            pthread_mutex_lock(&mutex1);
            crawler[k] = (int)DataBlock[node_info[crawler[k]].index + r][0];
            pthread_mutex_unlock(&mutex1);
        }
    }
I need the mutexes to protect the shared variables. It turns out that my parallel code is slower than the serial version. Why? Is it because of the mutexes?
Could this have something to do with the cache line size?
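To help isolate the cause, here is a minimal sketch of a lock-free variant to time against the mutex version. It is not the real program: the ring graph in `neighbour()` is a hypothetical stand-in for `node_info` and `DataBlock`, and `walk_task`, `run_walk`, `next_random` and the constants are names made up for illustration. The idea is that each thread owns a disjoint slice of `crawler[]`, keeps a private `mark[]` that is merged after `pthread_join`, and uses its own small PRNG in place of `rand()`, whose hidden state is shared between threads. If this version scales and the mutex version does not, the single lock taken around every statement is the likely culprit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define NUM_NODES    64   /* hypothetical graph size */
#define NUM_CRAWLERS 64   /* number of walkers, like NumOfNodes in the loop over k */
#define MAX_THREADS  8

/* Hypothetical stand-in for node_info/DataBlock: a ring, two neighbours per node. */
static int neighbour(int node, uint32_t r) {
    return (r % 2) ? (node + 1) % NUM_NODES : (node + NUM_NODES - 1) % NUM_NODES;
}

/* Per-thread xorshift32 generator, so threads do not share PRNG state. */
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

typedef struct {
    int first, last;        /* this thread owns crawlers [first, last) */
    int steps;
    uint32_t seed;          /* must be nonzero for xorshift */
    int *crawler;           /* shared array, but the slices are disjoint */
    long mark[NUM_NODES];   /* private visit counts, merged after join */
} walk_task;

static void *walk(void *arg) {
    walk_task *t = arg;
    for (int i = 0; i < t->steps; i++) {
        for (int k = t->first; k < t->last; k++) {
            t->mark[t->crawler[k]]++;   /* no lock: the array is private */
            t->crawler[k] = neighbour(t->crawler[k], next_random(&t->seed));
        }
    }
    return NULL;
}

/* Runs the walk on num_threads threads, fills mark[], returns the total visit count. */
long run_walk(int num_threads, int steps, long mark[NUM_NODES]) {
    assert(num_threads > 0 && num_threads <= MAX_THREADS);
    int crawler[NUM_CRAWLERS];
    walk_task task[MAX_THREADS];
    pthread_t tid[MAX_THREADS];

    for (int k = 0; k < NUM_CRAWLERS; k++)
        crawler[k] = k % NUM_NODES;

    for (int t = 0; t < num_threads; t++) {
        memset(&task[t], 0, sizeof task[t]);
        task[t].first = t * NUM_CRAWLERS / num_threads;
        task[t].last = (t + 1) * NUM_CRAWLERS / num_threads;
        task[t].steps = steps;
        task[t].seed = 2463534242u + (uint32_t)t;
        task[t].crawler = crawler;
        int rc = pthread_create(&tid[t], NULL, walk, &task[t]);
        assert(rc == 0);
        (void)rc;
    }

    long total = 0;
    memset(mark, 0, NUM_NODES * sizeof mark[0]);
    for (int t = 0; t < num_threads; t++) {
        pthread_join(tid[t], NULL);
        for (int n = 0; n < NUM_NODES; n++) {   /* merge private counts */
            mark[n] += task[t].mark[n];
            total += task[t].mark[n];
        }
    }
    return total;
}
```

Calling `run_walk(4, 1000, mark)` and timing it against the same walk with one thread gives a baseline with no locking at all. On the cache line question: in this layout the only memory that different threads write to and that can sit on the same cache line is at the boundaries between neighbouring `crawler[]` slices, so any remaining slowdown from false sharing should be small and easy to measure by padding the slices.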