I have a program where I need to sort a large number of large numeric distributions. To reduce the time it takes to do this, I am trying to multi-thread this.
I wrote a small, simple abstraction of my program to try and isolate the issue. I believe I am encountering a stack overflow, or hitting the operating system's stack limit because my test program mirrors the segmentation fault issue when:
- The distributions are all the same value (meaning qsort will run like crap)
- Threading is enabled.
meow
#include <boost/thread/thread.hpp>
#include <vector>
#include <stdlib.h> // for rand()
void swapvals(double *distribution, const size_t &d1, const size_t &d2)
{
double temp = 0;
temp = distribution[d2];
distribution[d2] = distribution[d1];
distribution[d1] = temp;
//std::swap(distribution[d1], distribution[d2]);
}
size_t partition(double *distribution, size_t left, size_t right)
{
const double pivot = distribution[right];
while (left < right) {
while ((left < right) && distribution[left] <= pivot)
left++;
while ((left < right) && distribution[right] > pivot)
right--;
if (left < right)
{
swapvals(distribution, left, right);
}
}
return right;
}
void quickSort(double *distribution, const size_t left, const size_t right)
{
if (left >= right) {
return;
}
size_t part = partition(distribution, left, right);
quickSort(distribution, left, part - 1);
quickSort(distribution, part + 1, right);
}
void processDistribution(double *distributions, const size_t distribution_size)
{
std::clog << "beginning qsorting." << std::endl;
quickSort(distributions, 0, distribution_size - 1);
std::clog << "done qsorting." << std::endl;
}
int main(int argc, char* argv[])
{
size_t distribution_size = 65000;
size_t num_distributions = 10;
std::vector<double *> distributions;
// Create num_distributions distributions.
for (int i = 0; i < num_distributions; i++)
{
double * new_dist = new double[distribution_size];
for (int k = 0; k < distribution_size; k++)
{
// Works when I have actual numbers in the distributions.
// Seg faults when all the numbers are the same.
new_dist[k] =1;
//new_dist[k] = rand() % 1000 + 1; // uncomment this, and it works.
}
distributions.push_back(new_dist);
}
// Submit each distribution to a quicksort thread.
boost::thread_group threads;
for (std::vector<double *>::const_iterator it=distributions.begin(); it != distributions.end(); ++it)
{
// It works when I run processDistribution directly. Segfaults when I run it via threads.
//processDistribution(*it, distribution_size);
threads.create_thread(boost::bind(&processDistribution, *it, distribution_size));
}
threads.join_all();
// Show the results of the sort for all the distributions.
for (std::vector<double *>::const_iterator it=distributions.begin(); it != distributions.end(); ++it)
{
for (size_t i = 0; i < distribution_size; i++)
{
// print first and last 20 results.
if (i < 20 || i > (distribution_size - 20))
std::cout << (*it)[i] << ",";
}
std::cout << std::endl;
}
}
GDB analysis of the core file yields:
Error in re-setting breakpoint -1: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Error in re-setting breakpoint -1: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Error in re-setting breakpoint -2: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Error in re-setting breakpoint -3: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Core was generated by `testthreads'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000001000056bc in partition (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:18
warning: Source file is more recent than executable.
18
(gdb) bt 7
#0 0x00000001000056bc in partition (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:18
#1 0x0000000100005834 in quickSort (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:42
#2 0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63627) at testthreads.cpp:43
#3 0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63628) at testthreads.cpp:43
#4 0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63629) at testthreads.cpp:43
#5 0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63630) at testthreads.cpp:43
#6 0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63631) at testthreads.cpp:43
(More stack frames follow...)
(gdb) frame 0
#0 0x00000001000056bc in partition (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:18
18
(gdb) info locals
pivot = 1
(gdb) info args
distribution = 0x1101d1430
left = 0
right = 63626
(gdb)
Also, my actual program deals with many more threads and distributions. And the GDB inspections there often show far more bizarre stack traces that look like memory corruption (notice how swapVals is called with d1 = 12119, but inside the partition stack frame it's coming through as 4568618016):
(gdb) bt 3
#0 0x00000001002aa0b8 in ScenRankReplacer<double>::swapvals (this=0xfffffffffffdfc8, distribution=..., d1=@0x1104c8178: 4568618016, d2=@0x1104c8140: 4568416720, ranking_values=0x1104c81d0,
r1=@0x1104c8170: 1152921504606838728, r2=@0x1002a16c8: 6917529029728344952) at ScenRankReplacer.h:96
#1 0x00000001002a7120 in ScenRankReplacer<double>::partition (this=0xfffffffffffdfc8, distribution=..., ranking_values=0x11069ae50, left=1, right=24237) at ScenRankReplacer.h:122
#2 0x00000001002a16c8 in ScenRankReplacer<double>::quickSort (this=0xfffffffffffdfc8, distribution=..., ranking_values=0x11069ae50, left=1, right=24237) at ScenRankReplacer.h:91
(More stack frames follow...)
(gdb) frame 1
#1 0x00000001002a7120 in ScenRankReplacer<double>::partition (this=0xfffffffffffdfc8, distribution=..., ranking_values=0x11069ae50, left=1, right=24237) at ScenRankReplacer.h:122
122 swapvals(distribution, mid, left, ranking_values, mid - 1, left - 1);
(gdb) p mid
$1 = 12119
(gdb) p left
$2 = 1
So...my questions:
- Am I correct? Am I hitting a stack limit?
- How on earth do I ascertain that this is the case (other than the deduction i've done above)? Is there an easy way to detect these? a GDB clue or something?
- Why does the threading matter? Do all the threads share the same stack limit?
- Most important: How do I get this working?! Is a recursive quicksort on massive datasets just not feasable?
The error occurs with compilation level O2. Thread model: aix gcc version 4.8.3 (GCC)