I populate a very large array using a stxxl::VECTOR_GENERATOR<MyData>::result::bufwriter_type
(something like 100M entries) which I need to sort in parallel.
I use the stxxl::sort(vector->begin(), vector->end(), cmp(), memoryAmount)
method, which in theory should do what I need: sort the elements very efficiently.
However, during the execution of this method I noticed that only one processor is fully utilised, and all the other cores are quite idle (I suspect there is little activity to fetch the input, but in practice they don't do anything).
This is my question: is it possible to exploit more cores during the sorting phase, or is the parallelism used only to fetch the input asynchronously? If so, are there documents that explain how to enable it? (I looked extensively the documentation on the website, but I couldn't find anything).
Thanks very much!
EDIT
Thanks for the suggestion. I provide below some more information.
First of all I use MacOs for my experiments. What I do is that I launch the following program and I study its behaviour.
typedef struct Triple {
long t1, t2, t3;
Triple(long s, long p, long o) {
this->t1 = s;
this->t2 = p;
this->t3 = o;
}
Triple() {
t1 = t2 = t3 = 0;
}
} Triple;
const Triple minv(std::numeric_limits<long>::min(),
std::numeric_limits<long>::min(), std::numeric_limits<long>::min());
const Triple maxv(std::numeric_limits<long>::max(),
std::numeric_limits<long>::max(), std::numeric_limits<long>::max());
struct cmp: std::less<Triple> {
bool operator ()(const Triple& a, const Triple& b) const {
if (a.t1 < b.t1) {
return true;
} else if (a.t1 == b.t1) {
if (a.t2 < b.t2) {
return true;
} else if (a.t2 == b.t2) {
return a.t3 < b.t3;
}
}
return false;
}
Triple min_value() const {
return minv;
}
Triple max_value() const {
return maxv;
}
};
typedef stxxl::VECTOR_GENERATOR<Triple>::result vector_type;
int main(int argc, const char** argv) {
vector_type vector;
vector_type::bufwriter_type writer(vector);
for (int i = 0; i < 1000000000; ++i) {
if (i % 10000000 == 0)
std::cout << "Inserting element " << i << std::endl;
Triple t;
t.t1 = rand();
t.t2 = rand();
t.t3 = rand();
writer << t;
}
writer.finish();
//Sort the vector
stxxl::sort(vector.begin(), vector.end(), cmp(), 1024*1024*1024);
std::cout << vector.size() << std::endl;
}
Indeed there seems to be only one or maximum two threads working during the execution of this program. Notice that the machine has only a single disk.
Can you please confirm me whether the parallelism work on macos? If not, then I will try to use linux to see what happens. Or is perhaps because there is only one disk?