3

I've searched up and down looking for some help with this problem, tried various solutions, etc. but can't seem to trace the problem. I'm trying to parallelize a do loop that contains a call to an optimization routine in the NLopt library (for NLOpt see: http://ab-initio.mit.edu/wiki/index.php/NLopt). Here is a toy example of what I'm trying to do (the real problem is considerably more complex with additional calls to subroutines that update the parameters in calfun ):

program tester

implicit none

! Include optimization package
include 'nlopt.f'

integer :: i,t,obs,J,MAXI=1000
real(kind=8) :: TOL = 0.0000001

! Optimization variables
real(kind=8), allocatable :: lb(:), ub(:), args(:,:)
integer(kind=8), allocatable :: opt(:)
integer, allocatable :: ires(:),val(:)

print*, 'How many times would you like to evaluate the optimization problem?'
read*, obs
print*, 'What is the size of the input vector?'
read*, J

allocate(lb(J))
allocate(ub(J))
allocate(args(J,obs))
allocate(opt(obs))
allocate(ires(obs))
allocate(val(obs))

do i=1,J
    lb(i) = -5
    ub(i) = 5
end do

!$OMP PARALLEL DO SHARED(lb,ub,args,opt,ires,val) PRIVATE(t,i,J,TOL,MAXI) 

do t=1,obs

    ! Call optimization routine
    call nlo_create(opt(t),NLOPT_LN_BOBYQA,J)
    ! Set Bounds
    call nlo_set_lower_bounds(ires(t), opt(t), lb(1:J))
    call nlo_set_upper_bounds(ires(t), opt(t), ub(1:J))
    call nlo_set_max_objective(ires(t), opt(t), calfun, 0)
    ! Set initial values
    do i=1,J
        args(i,t) = 0
    end do
    ! Set tolerance and stopping criteria
    call nlo_set_xtol_abs1(ires(t), opt(t), TOL)
    call nlo_set_maxeval(ires(t), opt(t), MAXI)
    ! Call optimizer
    call nlo_optimize( ires(t), opt(t), args(1:J,t), val(t) )
    call nlo_destroy( opt(t) )
end do
!$OMP END PARALLEL DO

! Write argmax to working directory
open(unit=2, file="out.txt") 
write(2,*) args
close(2)

contains

! Function to be optimized
subroutine calfun(val, dims, args, grad, need_grad, f_data)

integer :: dims, need_grad
real(kind=8) :: val, args(dims), grad(dims)
real(kind=8) :: f_data
real(kind=8) :: sq_in =0
real(kind=8) :: lin_in = 1

! Example function has the form -2*(Sum(x(i)^2)) + Prod x(i)
if (need_grad == 0) then
    do i=1,dims
        sq_in = -2*args(i)**2 + sq_in
        lin_in = args(i)*lin_in
    end do
    val = sq_in + lin_in
end if

end subroutine calfun
end program tester

I've engaged in troubleshooting following this document:

https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors

I also tried to reset the stack size following this post:

Why Segmentation fault is happening in this openmp code?

The above compiles and runs correctly using gfortran as follows:

$ gfortran -I/usr/local/include/ tester.f90 /usr/local/lib/libnlopt.a 
$ ./a.out

It also compiles when I add the OpenMP flag:

$ gfortran -fopenmp -I/usr/local/include/ tester.f90 /usr/local/lib/libnlopt.a 

However, I get a segmentation error even after setting (running on Mac OSX)

$ ulimit -s 65532

Compiling with a backtrace only reveals tab errors and unused dummy arguments from the NLOpt package. I'm at a loss as to how to proceed and really need to parallelize this operation. Do I need to go into the NLOpt routine manually and set something using threadprivate? I can't seem to find good documentation on this. Appreciate any insights ...

(P.S.: This is my first Stackexchange post. I've been an avid reader over the years. Go easy on me!! Thanks!)

Community
  • 1
  • 1
  • 1
    I can't see anything obviously wrong apart from using unit `2` (small unit numbers are dangerous, use 100 and more) and using `kind-8` (works only in some compilers and does not always mean 8 bytes - not portable). But the NLOPT routines may not be thread-safe. – Vladimir F Героям слава Feb 27 '17 at 20:15
  • 2
    Especially the create and destroy routines sound like something that should not be done in parallel, just from their names. Are you sure NLOPT is supposed to work in parallel at all? – Vladimir F Героям слава Feb 27 '17 at 20:16
  • @VladimirF It would seem that since the calls to the NLOPT routines are data-independent, the problem resides in distributing the definitions of the NLOPT routines to all of the separate threads. From what I understand reading http://www.openmp.org/wp-content/uploads/OpenMP4.0.0.Examples.pdf the definitions of external modules are distributed to threads automatically by OpenMP. I'm wondering if this is a compilation issue with how I'm flagging the library and OpenMP? – Nicholas Pretnar Feb 27 '17 at 20:25
  • There are many things that can be not thread safe in theblibrary. The library may call some system routines, store something in global variables (which are shared). Perhaps threadprivate would indeed fix something, but it would have to be inside that library. – Vladimir F Героям слава Feb 27 '17 at 20:29
  • By default you cannot expect some random library call to be thread-safe. – Vladimir F Героям слава Feb 27 '17 at 20:29
  • 4
    Is this line intended to be with the SAVE attribute (not initialization like C)? "real(kind=8) :: sq_in = 0" Also, is it really OK to make J, TOL, MAXI as private...? (may become undefined upon fork??) Further, the variable "i" used in calfun() should probably be defined locally within calfun() (rather than via host association)...? – roygvib Feb 27 '17 at 21:08
  • 1
    Shouldn't at least `J`, `TOL` and `MAXI` be declared `firstprivate` rather than just `private`, because at the moment, their values are lost upon entry on the `parallel` region. Furthermore, depending on the expected action of the various NLOPT routines called, they could even be just left `shared`... – Gilles Feb 28 '17 at 01:40
  • Or, again depending on how the NLOPT routines work, tol and maxi could even be parameters and therefore not need to be scoped at all. Also stylistically its slight strange that you scope all your variables in the parallel region but don't have default(none). And finally real(8)? JUST SAY NO. Use an appropriately set symbolic constant, see stackoverflow passim – Ian Bush Feb 28 '17 at 07:16
  • Thanks for all of your helpful comments, everyone. I'm working on cleaning this up and trying an implementation of a lock on the call to NLOPT, but I need to read up on the details of scoped vs. critical locking in OpenMP. – Nicholas Pretnar Feb 28 '17 at 12:59
  • `J` should definitely be `shared` or at least `firstprivate`. The code as it is now is very likely destroying the heap and/or writing in non-allocated memory in the loop over `args(i,t)`. – Hristo Iliev Feb 28 '17 at 15:42

0 Answers0