
I am trying to calculate all the distances between approximately a hundred thousand points. I have the following code written in Fortran and compiled using f2py:

C         1         2         3         4         5         6         7
C123456789012345678901234567890123456789012345678901234567890123456789012
        subroutine distances(coor,dist,n)
        double precision coor(n,3),dist(n,n)
        integer n,i,j
        double precision x1,y1,z1,x2,y2,z2,diff2

cf2py intent(in) :: coor
cf2py intent(in,out):: dist
cf2py intent(hide)::n
cf2py intent(hide)::x1,y1,z1,x2,y2,z2,diff2

        do 200,i=1,n-1
            x1=coor(i,1)
            y1=coor(i,2)
            z1=coor(i,3)
            do 100,j=i+1,n
                x2=coor(j,1)
                y2=coor(j,2)
                z2=coor(j,3)
                diff2=(x1-x2)*(x1-x2)+(y1-y2)*(y1-y2)+(z1-z2)*(z1-z2)
                dist(i,j)=sqrt(diff2)
  100       continue
  200   continue

        end

I am compiling the Fortran code using the following Python script, setup_collision.py:

# System imports
from distutils.core import *
from distutils      import sysconfig

# Third-party modules
import numpy
from numpy.distutils.core import Extension, setup

# Obtain the numpy include directory.  This logic works across numpy versions.
try:
    numpy_include = numpy.get_include()
except AttributeError:
    numpy_include = numpy.get_numpy_include()

# simple extension module
collision = Extension(name="collision",sources=['./collision.f'],
               include_dirs = [numpy_include],
               )

# collision module setup
setup(  name        = "COLLISION",
        description = "Module calculates collision energies",
        author      = "Stvn66",
        version     = "0.1",
        ext_modules = [collision]
        )

Then running it as follows:

import numpy as np
import collision

coor = np.loadtxt('coordinates.txt')
n_atoms = len(coor)
dist = np.zeros((n_atoms, n_atoms), dtype=np.float16) # float16 reduces memory
n_dist = n_atoms*(n_atoms-1)/2
n_GB = n_dist * 2 / float(2**30) # 2 bytes per float16, 1 GB = 1024**3 B
n_Gb = n_dist * 2 / 1E9          # 2 bytes per float16, 1 GB = 1e9 B
print 'calculating %d distances between %d atoms' % (n_dist, n_atoms)
print 'should use between %f and %f GB of memory' % (n_GB, n_Gb)
dist = collision.distances(coor, dist)

Using this code with 30,000 atoms, which should need around 1 GB of memory to store the distances, it instead uses 10 GB. At that rate, performing the calculation with 100,000 atoms would require 100 GB instead of 10 GB, and I only have 20 GB of memory in my computer.

Am I missing something related to passing the data between Python and Fortran? The huge difference indicates a major flaw in the implementation.

  • Probably an issue with data getting shuffled across the boundary between numpy and fortran? I don't know enough about either to speculate, though. – Patrick Collins May 12 '15 at 18:12
  • Do you really need the full NxN interaction? You could probably use a partitioning scheme (grid, BSP, oct-tree, etc. ) to speed things up... – Alexander Vogt May 12 '15 at 18:35
  • I need to know the closest distances. I need to identify if any atoms get within a certain distance of each other, i.e. collide (I have a similar routine that breaks immediately if that threshold is crossed). – Steven C. Howell May 12 '15 at 19:10
  • Take a look into Ericson's book [Real-time collision detection](http://realtimecollisiondetection.net/). You'll find several algorithms that tremendously reduce the complexity (you have O(n^2)!). – Alexander Vogt May 12 '15 at 19:15
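As a sketch of the partitioning idea from these comments (assuming SciPy is available; the cutoff value and the coordinates file name are placeholders), a k-d tree can report only the pairs within the collision threshold, so the full N x N distance matrix is never built:

import numpy as np
from scipy.spatial import cKDTree

coor = np.loadtxt('coordinates.txt')   # (n_atoms, 3) coordinates
cutoff = 2.0                           # placeholder collision threshold

# Build a k-d tree and query only the pairs closer than the cutoff;
# this avoids allocating an n x n distance matrix altogether.
tree = cKDTree(coor)
colliding_pairs = tree.query_pairs(r=cutoff)   # set of (i, j) tuples with i < j

print('%d atom pairs are within %.2f of each other' % (len(colliding_pairs), cutoff))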

1 Answer


You are feeding double precision arrays to the Fortran subroutine. Each double precision element requires 8 bytes of memory. For N=30,000 that makes

coor(n,3) => 30,000*3*8 ~ 0.7 MB
dist(n,n) => 30,000^2*8 ~ 6.7 GB
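Spelled out in Python (just the same arithmetic as above, with 8 bytes per double):

n = 30000
coor_bytes = n * 3 * 8        # ~0.7 MB
dist_bytes = n * n * 8        # ~6.7 GiB
print('coor: %.1f MiB, dist: %.1f GiB' % (coor_bytes / 2.0**20, dist_bytes / 2.0**30))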

Since the half precision (float16) array is additionally required on the Python side, that accounts for another 1-2 GB. So the overall requirement is 9-10 GB.

The same holds true for N=100,000, which will require ~75GB for the Fortran part alone.

Instead of double precision floats, you should use single precision reals, if that is sufficient for your calculations. This halves the memory requirements. [I have no experience with that, but I assume that if both parts use the same precision, Python can operate on the data directly...]
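A minimal sketch of the Python side under that assumption (it presumes the Fortran routine is rebuilt with coor and dist declared as real instead of double precision; whether f2py still makes a temporary copy depends on array ordering and the intent declaration, see the comments below):

import numpy as np
import collision  # rebuilt with coor and dist declared as real (single precision)

coor = np.loadtxt('coordinates.txt').astype(np.float32)
n_atoms = len(coor)

# A float32 distance matrix takes n**2 * 4 bytes, i.e. ~3.4 GB for 30,000 atoms
# instead of ~6.7 GB in double precision, and it matches the Fortran precision.
dist = np.zeros((n_atoms, n_atoms), dtype=np.float32)

dist = collision.distances(coor, dist)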

As @VladimirF noted in his comment, "usual compilers do not support 2 byte reals". I checked with gfortran and ifort, and neither does, so you need to use at least single precision.

Alexander Vogt
  • I think usual compilers do not support 2 byte reals. selected_real_kind will return the default one. – Vladimir F Героям слава May 12 '15 at 18:55
  • The code I posted used half precision in python but not in fortran (missed that). I will try changing that to half, or at least single. – Steven C. Howell May 12 '15 at 20:45
  • Is there a way to have Python and Fortran use the same memory blocks to avoid doubling the requirement? – Steven C. Howell May 12 '15 at 21:13
  • @stvn66: Yes, you can use `ctypes` for that task. See my answer [over here](https://stackoverflow.com/questions/19263879/speeding-up-element-wise-array-multiplication-in-python/19458585#19458585). – Alexander Vogt May 13 '15 at 08:08
  • @stvn66 If you create your numpy arrays as `order='F'` (so they're stored as Fortran would expect) f2py shouldn't copy them. I'm not sure how that interacts with `intent(in,out)` so you might need to experiment with that to see. – DavidW May 15 '15 at 09:01
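A minimal sketch combining that suggestion with the single precision variant from the answer (the module name stands for a hypothetical single precision rebuild; whether the copy is really avoided with intent(in,out) would still need to be checked, as the comment says):

import numpy as np
import collision  # hypothetical single precision build of the module

coor = np.asfortranarray(np.loadtxt('coordinates.txt'), dtype=np.float32)
n_atoms = len(coor)

# Allocate dist in Fortran (column-major) order and in the precision the
# subroutine expects, so f2py can pass the same memory block to Fortran
# instead of allocating a converted copy.
dist = np.zeros((n_atoms, n_atoms), dtype=np.float32, order='F')

dist = collision.distances(coor, dist)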