Weighted sampling in Fortran

Question

In a Fortran program I would like to choose at random a specific variable (specifically its index) by using weights. The weights would be provided in a separate vector (element 1 would contain weight of variable 1 and so on).

I have the following code who does the job without weight (mind being an integer vector with the index of each variable in the original dataset)

call rrand(xrand)
j = int(nn * xrand) + 1
mvar = mind(j)

roygvib · Accepted Answer · 2020-01-13T19:27:59.677

4

Here are two examples. The first one is

integer, parameter :: nn = 5
real :: weight( nn ), cumsum( nn ), x

weight( 1:nn ) = [ 1.0, 2.0, 5.0, 0.0, 2.0 ]

do j = 1, nn
    cumsum( j ) = sum( weight( 1:j ) ) / sum( weight( 1:nn ) )   !! cumulative sum
enddo

x = rand()
do j = 1, nn
    if ( x < cumsum( j ) ) exit
enddo

and the second one is taken from this page

real :: sum_weight
sum_weight = sum( weight( 1:nn ) )

x = rand() * sum_weight
do j = 1, nn
    if ( x < weight( j ) ) exit
    x = x - weight( j )
enddo

which is essentially the same as the first one. Both sample a random j from 1,2,...,5 with weight(j). 100000 trials give a distribution like

j     :    1           2           3           4       5
count :    10047       19879       50061       0       20013

EDIT: A minimal test code is attached below (tested with gfortran-8/9):

program main
    implicit none
    integer j, num( 5 ), loop
    real    weights( 5 )

    weights(:) = [ 1.0, 2.0, 5.0, 0.0, 2.0 ]
    num(:) = 0

    do loop = 1, 100000
        call random_index( j, weights )
        num( j ) = num( j ) + 1
    enddo

    do j = 1, size( weights )
        print *, j, num( j )
    enddo

contains

subroutine random_index( idx, weights )
    integer :: idx
    real, intent(in) :: weights(:)

    real x, wsum, prob

    wsum = sum( weights )

    call random_number( x )

    prob = 0
    do idx = 1, size( weights )
        prob = prob + weights( idx ) / wsum   !! 0 < prob < 1
        if ( x <= prob ) exit
    enddo
end subroutine

end program

edited Jan 13 '20 at 19:27

answered Sep 02 '15 at 17:00

roygvib

7,218
2
19
36

thank you roydvib! I assume that j after the exit of the loop keeps the index of the chosen variable. Right? – Chris Sep 03 '15 at 07:20
Yes, the value of j is kept after exiting the loop. So we can write a function that receives weight(:) as a dummy argument and returns j obtained from the DO loop, for example. – roygvib Sep 03 '15 at 09:10
@roygvib thank you but I am not sure I understand how it works. It seems I always get the same j when I run this. Would you mind giving more details? It seems everytime I run this rand() gives the same number. – Herman Toothrot Jan 07 '20 at 17:29
@HermanToothrot Hi, I have attached a minimal test / working code at the bottom of my answer. The code is slightly modified so that the subroutine `random_index()` can be used in a stand-alone manner. So could you try this code with your environment, to see whether it works as expected? (I almost forgot the details, but I guess we are computing the cumulative probability from the given weights and testing where 'x' falls in some "bin" (with a given weight).) – roygvib Jan 08 '20 at 20:19
@roygvib thanks, yes it's working, I didn't understand that it had to be in a loop. I have actually implemented it with the second example you provided. – Herman Toothrot Jan 09 '20 at 08:35
@roygvib, I might have found a bug in this code but I want to check with you. In the case in which x == weight(j), your second example fails and return an index which is outside the loop. So ' if ( x < weight( j ) ) exit' in your example should be ' if ( x <= weight( j ) ) exit' which is what I think you are using in your test code. – Herman Toothrot Sep 09 '20 at 19:40
@HermanToothrot It seems that the first and second code snippets (which I I think are based on a previous Q/A page) use "<", while the third snippet use "<=". I guess I was afraid about the corner case of x close to 1. Although x is guaranteed to be less than 1 ideally, x == 1.0 or even x > 1.0 (slightly above) can happen because of round-off error (I guess). So, I think it may be better to make the comparison for idx <= nn -1 (which can use either < or <=), and if the inequality is not met, then return nn as the result (i.e., we regard x to be in the last "weight" interval)... – roygvib Sep 14 '20 at 20:59
@roygvib I thought the problem was solved with just the use of x <= weight( j ), but code crash brought me back to reality :). So I think you are right that the loop should go to nn-1 so idx can be at most nn. I am not sure what you mean with " idx <= nn -1", I can't compile with do "idx <= 1, size( weights )-1", so I simply changed to do "idx = 1, size( weights ) -1". Should we change it in your answer? Also I am using example 2 so I wonder if this can happen also in example 1. – Herman Toothrot Oct 05 '20 at 11:21
@HermanToothrot I'm very sorry for late reply... (I was going to think about the code more carefully in weekend, but failed to do so because I got very forgettable recently...) And yes, if there is a possibly more robust code, I actually suggest you will add one more answer separately (as an improvement). Also, for more discussion, I believe one of [reddit](https://www.reddit.com/r/fortran/) or [Fortran Discourse](https://fortran-lang.discourse.group/) or [comp.lang.fortran](https://groups.google.com/g/comp.lang.fortran) sites will be very helpful to get more inputs/opinions :) – roygvib Nov 05 '20 at 01:02

Weighted sampling in Fortran

1 Answers1

Linked