I am trying to write efficient Python code that, given a set of data points of the form
a1 a2 a3 ... an (point 1)
b1 b2 b3 ... bn (point 2)
.
.
.
will find which of them are too close to each other (within a threshold value).
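To be explicit, by "too close" I mean that every coordinate of the two points differs by less than the threshold r; this is the same test the routines below use. A minimal example of that test:

import numpy as np

p = np.array([0.10, 0.20, 0.30])
q = np.array([0.15, 0.25, 0.35])
r = 0.1
# True here, since every coordinate differs by less than r
too_close = np.all(np.abs(p - q) < r)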
I already have the following Fortran routine that solves this problem:
program check_closeness
    implicit none
    integer, parameter :: npoints = 10000, ndis = 20
    real(8), parameter :: r = 1e0, maxv = 3e0, minv = 0e0
    real(8), dimension(npoints,ndis) :: dis
    real(8) :: start, ende

    call RANDOM_NUMBER(dis)
    dis = (maxv - minv) * dis + minv

    call cpu_time(start)
    call remove_close(npoints, ndis, dis, r)
    call cpu_time(ende)
    write(*,*) 'Time elapsed', ende - start
endprogram

subroutine remove_close(npoints, ndis, points, r)
    implicit none
    integer, intent(in) :: npoints, ndis
    integer :: i, i_check
    real(8), dimension(npoints, ndis), intent(in) :: points
    real(8), intent(in) :: r
    logical :: is_close

    do i = 1, npoints
        is_close = .FALSE.
        i_check = i
        do while (.not. is_close .and. i_check < npoints)
            i_check = i_check + 1
            is_close = all(abs(points(i,:) - points(i_check,:)) < r)
        enddo
    enddo
end subroutine
On my computer, this execution takes:
Time elapsed 1.0651889999999999
(seconds).
Now, I write the same exact code (or so I think) in Python:
import numpy as np
import time

def check_close(a, r):
    npoints = a.shape[0]
    for i, vec1 in enumerate(a):
        is_close = False
        icheck = i
        while (not is_close and icheck < npoints - 1):
            icheck += 1
            vec2 = a[icheck, :]
            is_close = all(np.abs(vec1 - vec2) < r)

maxv, minv = 3, 0
a = np.random.rand(10000, 20)
a = (maxv - minv) * a + minv
r = 1e0

ini = time.time()
check_close(a, r)
fin = time.time()
print('Time elapsed {}'.format(fin - ini))
This execution takes:
Time elapsed 102.60617995262146
(seconds), which is considerably slower than the Fortran one.
I tried yet another Python version of this routine, which is considerably faster but still nowhere near the Fortran one:
import numpy as np
import time

def check_close(a, r):
    for i, vec1 in enumerate(a[:-1]):
        d = np.abs(a[i+1:, :] - vec1)
        is_close = np.any(np.all(d < r, axis=1))

maxv, minv = 3, 0
a = np.random.rand(10000, 20)
a = (maxv - minv) * a + minv
r = 1e0

ini = time.time()
check_close(a, r)
fin = time.time()
print('Time elapsed {}'.format(fin - ini))
In this case the execution takes:
Time elapsed 3.4987785816192627
(seconds). From this, I guess the improvement comes from the vectorization and from removing the while loop. On the other hand, this implementation does not benefit from stopping the search once a close point is found.
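To illustrate what I mean by stopping early: here is a rough sketch (not benchmarked, and the chunk size of 512 is an arbitrary guess) that vectorizes over blocks of rows but can still break out of the scan for point i as soon as a close point is found:

import numpy as np

def check_close_chunked(a, r, chunk=512):
    # Sketch only: scan the remaining rows in blocks so that np.any
    # can trigger an early break instead of comparing against all rows.
    npoints = a.shape[0]
    for i in range(npoints - 1):
        vec1 = a[i]
        for start in range(i + 1, npoints, chunk):
            block = a[start:start + chunk]
            if np.any(np.all(np.abs(block - vec1) < r, axis=1)):
                break  # a close point was found; stop scanning for point i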
My questions are:
- What is it that takes so much time in the Python implementations?
- Is there any way I could rewrite the Python code to make it (almost) as fast as the Fortran one?
[Edit]
I was asked to use time.perf_counter() to measure execution times in the Python code.
Now I use:
ini = time.perf_counter()
check_close(a, r)
fin = time.perf_counter()
print('Time elapsed {}'.format(fin - ini))
to measure them, and the updated times are:
Python implementation 1: Time elapsed 98.63728801719844 (seconds)
Python implementation 2: Time elapsed 3.3923211600631475 (seconds)