I have a very straightforward function that counts how many inner entries of an N
by N
2D matrix (represented by a pointer arr
) is below a certain threshold, and updates a counter below_threshold
that is passed by reference:
void count(float *arr, const int N, const float threshold, int &below_threshold) {
below_threshold = 0; // make sure it is reset
bool comparison;
float temp;
#pragma omp parallel for shared(arr, N, threshold) private(temp, comparison) reduction(+:below_threshold)
for (int i = 1; i < N-1; i++) // count only the inner N-2 rows
{
for (int j = 1; j < N-1; j++) // count only the inner N-2 columns
{
temp = *(arr + i*N + j);
comparison = (temp < threshold);
below_threshold += comparison;
}
}
}
When I do not use OpenMP, it runs fine (thus, the allocation and initialization were done correctly already).
When I use OpenMP with an N
that is less than around 40000, it runs fine.
However, once I start using a larger N
with OpenMP, it keeps giving me a segmentation fault (I am currently testing with N = 50000
and would like to eventually get it up to ~100000).
Is there something wrong with this at a software level?
P.S. The allocation was done dynamically ( float *arr = new float [N*N]
), and here is the code used to randomly initialize the entire matrix, which didn't have any issues with OpenMP with large N
:
void initialize(float *arr, const int N)
{
#pragma omp parallel for
for (int i = 0; i < N; i++)
{
for (int j = 0; j < N; j++)
{
*(arr + i*N + j) = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
}
}
}
UPDATE:
I have tried changing i
, j
, and N
to long long int
, and it still has not fixed my segmentation fault. If this was the issue, why has it already worked without OpenMP? It is only once I add #pragma omp ...
that it fails.