
This is my first post on here, so go easy on me! I have a very strange problem. I've written a C code that converts particle data to grid data (the data comes from a cosmological simulation). For this conversion, I'm using the GSL Monte Carlo VEGAS integrator. When I run it in serial, it runs just fine and gives me the correct answer (albeit slowly). As an attempt at speeding it up, I tried OpenMP. The problem is that when I run it in parallel, the integration times out (I set a MAX_ITER variable in the integration loop to avoid an infinite loop due to lack of convergence). The random number generator is set up and initialized before the parallel block of code. I checked and double-checked, and all of the data about the particle that it fails at in parallel (the x, y, and z positions and the integration bounds being passed to the integrator) are the same in both serial and parallel. I also tried increasing my MAX_ITER variable from 100 to 1000, but that did nothing; it just took longer to fail.

My question, then, is whether anyone has any idea why the code would run in serial but time out in parallel when using the exact same particles.

Also, in case you want them, the numbers for the offending particle are: x = 0.630278, y = 24.952896, z = 3.256376, and h = 3 (this is the particle's smoothing length, which serves to "smear out" the particle's mass, since the goal of the simulation is to use particles to sample a fluid; this is the SPH method). The x integration bounds (lower, upper) are {0, 3.630278}, the y bounds are {21.952896, 27.952896}, and the z bounds are {0.256376, 6.256375} (i.e. position ± h, except that the lower x bound is cut at 0).

The idea behind the conversion is that the particle's mass is contained within a "smoothing sphere" of radius h centered on the particle itself. This mass is not distributed uniformly, but according to the SPH kernel (this is the function I'm integrating). Thus, depending on how the sphere sits within its "home cell", only part of the sphere may actually be inside that cell. The goal is to work out the appropriate bounds of integration and pass them to the integrator. The integrand (a sketch is just below) has a check whereby, if the point given to it by the Monte Carlo integrator lies outside the sphere, it returns 0 (this is because getting the exact limits of integration for every possible case is a huge pain).
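
To make that last check concrete, here is a simplified sketch of what the integrand does (not my exact code: the parameter struct is made up for illustration, and I've written it using the standard cubic spline kernel with support radius h):

#include <math.h>
#include <stddef.h>

// Hypothetical parameter struct; my real code passes the particle data around differently.
struct kernel_params
{
   double px, py, pz;   // particle position
   double h;            // smoothing length
};

// Integrand with the GSL Monte Carlo signature: returns the kernel value at the
// sample point, or 0 if the point lies outside the smoothing sphere.
double sph_kernel_integrand(double *pt, size_t dim, void *params)
{
   struct kernel_params *p = (struct kernel_params *)params;
   double dx = pt[0] - p->px;
   double dy = pt[1] - p->py;
   double dz = pt[2] - p->pz;
   double u  = sqrt(dx * dx + dy * dy + dz * dz) / p->h;

   (void)dim;   // dimension is always 3 here

   if(u >= 1.0)
      return 0.0;   // outside the smoothing sphere: contributes nothing

   double norm = 8.0 / (M_PI * p->h * p->h * p->h);
   if(u < 0.5)
      return norm * (1.0 - 6.0 * u * u + 6.0 * u * u * u);
   return norm * 2.0 * (1.0 - u) * (1.0 - u) * (1.0 - u);
}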

The code for my loop is here:

// Loop over every particle
#pragma omp parallel shared(M_P, m_header, NumPartTot, num_grid_elements, cxbounds,  \
                            cybounds, czbounds, master_cell)                          \
                     private(index, x, y, z, i, j, k, h, tid, cell, corners)
{
  tid = omp_get_thread_num();

  // Set up this thread's cell struct. Allocate memory!
  cell = set_up_cell();

  #pragma omp for
  for(index = 1; index <= NumPartTot; index++)
  {    
     printf("\n\n\n************************************\n");
     printf("Running particle: %d on thread: %d\n", index, tid);
     printf("x = %f  y = %f   z = %f\n", M_P[index].Pos[0], M_P[index].Pos[1], M_P[index].Pos[2]);
     printf("**************************************\n\n\n");
     fflush(stdout);
     // Set up convenience variables
     x = M_P[index].Pos[0];
     y = M_P[index].Pos[1];
     z = M_P[index].Pos[2];

     // Figure out which cell the particle is in
     i = (int)((x / m_header.BoxSize) * num_grid_elements);
     j = (int)((y / m_header.BoxSize) * num_grid_elements);
     k = (int)((z / m_header.BoxSize) * num_grid_elements);

     corners = get_corners(i, j, k);

     // Check to see what type of particle we're dealing with
     if(M_P[index].Type == 0)
     {    
        h = M_P[index].hsml;
        convert_gas(i, j, k, x, y, z, h, index, cell, corners);
     }    

     else 
     {    
        update_cell_non_gas_properties(index, i, j, k, cell);
     }    
  }    

  // Copy each thread's version of cell to cell_master
  #ifdef _OPENMP
     copy_to_master_cell(cell);
     free_cell(cell);
  #endif
} /*-- End of parallel region --*/
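
For completeness, the copy step at the end of the parallel region is just a reduction of each thread's private buffer into the shared grid. Conceptually, copy_to_master_cell does something like the pattern below (a sketch only: the struct fields and the flat indexing are made up, and my actual function differs in detail):

// Sketch of the per-thread reduction (hypothetical fields and indexing).
// The critical section makes sure two threads never update master_cell at once.
void copy_to_master_cell(struct cell_data *cell)
{
   int n;
   int n_total = num_grid_elements * num_grid_elements * num_grid_elements;

   #pragma omp critical
   {
      for(n = 0; n < n_total; n++)
      {
         master_cell[n].mass    += cell[n].mass;
         master_cell[n].density += cell[n].density;
      }
   }
}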

The problem occurs in the function convert_gas. The problematic section is here (in the home cell block):

// Case 6: Left face
if(((x + h) < cxbounds[i][j][k].hi) && ((x - h) < cxbounds[i][j][k].lo) &&
   ((y + h) < cybounds[i][j][k].hi) && ((y - h) >= cybounds[i][j][k].lo) &&
   ((z + h) < czbounds[i][j][k].hi) && ((z - h) >= czbounds[i][j][k].lo))
{
  printf("Using case 6\n");
  fflush(stdout);

  // Home cell
  ixbounds.lo = cxbounds[i][j][k].lo;
  ixbounds.hi = x + h;
  iybounds.lo = y - h;
  iybounds.hi = y + h;
  izbounds.lo = z - h;
  izbounds.hi = z + h;

  kernel = integrate(ixbounds, iybounds, izbounds, h, index, i, j, k);

  update_cell_gas_properties(kernel, i, j, k, index, cell);

  // Left cell
  ixbounds.lo = x - h;
  ixbounds.hi = cxbounds[i][j][k].lo;
  iybounds.lo = y - h; // Not actual bounds. See note above.
  iybounds.hi = y + h;
  izbounds.lo = z - h;
  izbounds.hi = z + h;

  kernel = integrate(ixbounds, iybounds, izbounds, h, index, i - 1, j, k);

  update_cell_gas_properties(kernel, i - 1, j, k, index, cell);

  return;

}

The data that I'm currently using is test data, so I know exactly where the particles should be and what integration bounds they should have. When using gdb, I find that all of these numbers are correct. The integration loop in the function integrate is here (TOLERANCE is 0.2, WARM_CALLS is 10000, and N_CALLS is 100000):

  gsl_monte_vegas_init(monte_state);
  // Warm up
  gsl_monte_vegas_integrate(&monte_function, lower_bounds, upper_bounds, 3,
                            WARM_CALLS, random_generator, monte_state, &result, &error);

  // Actual integration
  do
  {
     gsl_monte_vegas_integrate(&monte_function, lower_bounds, upper_bounds, 3,
                               N_CALLS, random_generator, monte_state, &result, &error);
     iter++;

  } while(fabs(gsl_monte_vegas_chisq(monte_state) - 1.0) > TOLERANCE && iter < MAX_ITER);

  if(iter >= MAX_ITER)
  {
     fprintf(stdout, "ERROR!!! Max iterations %d exceeded!!!\n"
             "See M_P[%d].id : %d   (%f %f %f)\n"
             "lower bnds : (%f %f %f)   upper bnds : (%f %f %f)\n"
              "trying to integrate in cell %d %d %d\n\n", MAX_ITER, pind, M_P[pind].id,
              M_P[pind].Pos[0], M_P[pind].Pos[1], M_P[pind].Pos[2],
              ixbounds.lo, iybounds.lo, izbounds.lo, ixbounds.hi, iybounds.hi, izbounds.hi, i, j, k);
     fflush(stdout);
     exit(1);
  }
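
In case it's relevant, the GSL objects used above are created roughly like this (a simplified sketch, not my exact code; sph_kernel_integrand and kernel_params are the illustrative versions sketched earlier). As I said, the random number generator is allocated once, before the parallel region, and integrate then uses it on every thread:

#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_vegas.h>

// Allocated once at start-up, i.e. before the omp parallel region, and then
// used by every call to integrate() on every thread.
static gsl_rng *random_generator;
static gsl_monte_vegas_state *monte_state;

void init_integrator(void)
{
   gsl_rng_env_setup();
   random_generator = gsl_rng_alloc(gsl_rng_default);   // default is mt19937
   monte_state      = gsl_monte_vegas_alloc(3);         // 3-D VEGAS state
}

// Inside integrate(), the bounds and the integrand are packed up before the
// warm-up and convergence loop shown above, along the lines of:
//    double lower_bounds[3] = { ixbounds.lo, iybounds.lo, izbounds.lo };
//    double upper_bounds[3] = { ixbounds.hi, iybounds.hi, izbounds.hi };
//    gsl_monte_function monte_function = { &sph_kernel_integrand, 3, &params };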

Again, this exact code (without OpenMP; I pass that as a compile-time option in the makefile) with the exact same numbers runs in serial, but not in parallel. I'm sure it's something stupid that I've done and simply cannot see at the moment (at least, I hope!). Anyway, thanks for the help in advance!

TimeFall
    This is **a lot** of code and you've also provided many unnecessary details like values of variables that mean nothing to the readers here in that context. Please try to narrow down the problem and create a **minimal working example** (MWE) that is still able to reproduce the problem. In doing so it is often the case that you'll find the problem on your own, even before you post the MWE here. – Hristo Iliev Jul 01 '14 at 07:49
  • That said, I would double check that `monte_state` and the PRNG are not shared between the threads and that the GSL functions used are thread-safe. There are tools that one can use to automatically check for data races, e.g. Intel Inspector XE or Helgrind (using the latter with GCC requires that [the compiler is recompiled without futexes](http://stackoverflow.com/questions/10641972/helgrind-valgrind-and-openmp-c-avoiding-false-positives)). – Hristo Iliev Jul 01 '14 at 07:58
  • Awesome, thanks for the help! I'll definitely check out one of those tools. And yeah, sorry about the super long post, I just wasn't sure how much to add. Lesson learned. I'll work on an MWE. – TimeFall Jul 02 '14 at 14:50
