In the code below, I have a parallel section where each thread uses a private std::vector<int> to push and pop integers. The problem is that as I increase the number of threads, the performance of each core drops drastically, and the kernel CPU usage on each core (the red bar in htop) increases a lot. For example, with 1 core I see 100% normal (user) CPU usage, but with 25 cores (see image) almost all of the CPU usage goes to the kernel.
It is probably something very basic, but I just don't know why it happens. I would expect that, since each thread has its own private variable, each core would perform exactly the same no matter how many CPUs are used in the parallel section.
Any advice?
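#include <cstdlib>
#include <vector>

int main()
{
    int cpus = 25;
    #pragma omp parallel for schedule(dynamic,1)
    for (int ss = 0; ss < cpus; ss++)
    {
        std::vector<int> q;  // declared inside the loop, so private to each thread
        while (true)         // runs forever so the CPU usage can be watched in htop
        {
            q.push_back(rand());
            q.pop_back();
        }
    }
}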
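One experiment that might narrow this down is to check whether the vector really is the only thing each thread touches. The sketch below rests on an assumption that is not confirmed anywhere in this post: that rand() keeps hidden state shared by all threads (glibc protects that state with a lock), which could show up as kernel time when many threads call it at once. It swaps rand() for a generator owned by each loop iteration. The function name push_pop_private_rng, the seed value, the distribution range, and the bounded iteration count are all made up for illustration.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: runs `workers` independent loops, each doing `iters`
// push/pop pairs on its own vector with its own random generator.
// Returns the total number of push/pop pairs performed across all workers.
long long push_pop_private_rng(int workers, long long iters)
{
    long long total = 0;
    // If OpenMP is not enabled at compile time, the pragma is ignored and
    // the loop simply runs serially.
    #pragma omp parallel for schedule(dynamic, 1) reduction(+ : total)
    for (int ss = 0; ss < workers; ss++)
    {
        std::vector<int> q;  // private to this iteration
        // Assumption being tested: a per-iteration generator avoids any
        // state shared between threads, unlike rand().
        std::mt19937 gen(1234u + static_cast<unsigned>(ss));
        std::uniform_int_distribution<int> dist(0, 1000000);
        for (long long i = 0; i < iters; i++)
        {
            q.push_back(dist(gen));
            q.pop_back();
            total++;
        }
    }
    return total;
}
```

Calling push_pop_private_rng(25, some_large_count) from main and comparing the red bar in htop against the rand() version should indicate whether the shared generator, rather than the vector, is what sends the time into the kernel. If the kernel usage stays high, the cause is presumably elsewhere.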