7

I have a binary file that was created using a Python code. This code mainly scripts a bunch of tasks to pre-process a set of data files. I would now like to read this binary file in Fortran. The content of the binary file is coordinates of points in a simple format e.g.: number of points, x0, y0, z0, x1, y1, z1, ....

These binary files were created using the 'tofile' function in numpy. I have the following code in Fortran so far:

integer:: intValue
double precision:: dblValue
integer:: counter
integer:: check
open(unit=10, file='file.bin', form='unformatted', status='old', access='stream')

counter = 1

do 

  if ( counter == 1 ) then
    read(unit=10, iostat=check) intValue
    if ( check < 0 ) then
      print*,"End Of File"
      stop
    else if ( check > 0 ) then
      print*, "Error Detected"
      stop
    else if ( check == 0 ) then
      counter = counter + 1
      print*, intValue
    end if
  else if ( counter > 1 ) then
    read(unit=10, iostat=check) dblValue
    if ( check < 0 ) then
      print*,"End Of File"
      stop
    else if ( check > 0 ) then
      print*, "Error Detected"
      stop
    else if ( check == 0 ) then
      counter = counter + 1
      print*,dblValue
    end if
  end if

end do

close(unit=10)

This unfortunately does not work, and I get garbage numbers (e.g 6.4731191026611484E+212, 2.2844499004808491E-279 etc.). Could someone give some pointers on how to do this correctly? Also what would be a good way of writing and reading binary files interchangeably between Python and Fortran - as it seems like that is going to be one of the requirements of my application.

Thanks

computanjohn
  • 133
  • 8
  • 1
    I added a nice and detailed answer, telling you to use `access=stream`:) I just realized that you're already doing that, so I deleted my answer for now. So, question: are you sure that the byte size of your python `int` and your fortran `integer` are the same? You should check both. If there is a single byte of discrepancy, the misalignment of the data will lead to garbage after `read`ing. What fortran compiler are you using? How are you declaring your `integer`s? What is the scpecific type of your python `int`s? – Andras Deak -- Слава Україні Oct 27 '15 at 01:45
  • If you're really desperate, you can try generating the (supposedly) same dummy binary file both with fortran and python, then looking at the hex dump of the two files to see what's up. Also, my earlier question about the `integer` sizes obviously also applies to the `doubles` involved. And even if the types check out, there can still be an endianness-problem if you're using the two codes on two very different machines. – Andras Deak -- Слава Україні Oct 27 '15 at 02:19
  • 1
    Regarding the question of interchangeability: I'd rather put metadata like number of points into a separate ASCII header file, which can be easily read, and only put data of the same kind into a single binary file, this also allows for a fairly easy conversion of endianness. – haraldkl Oct 27 '15 at 02:51
  • show your declarations – agentp Oct 27 '15 at 10:52
  • 1
    for portability and interchangeability, i'd use HDF5 to read/write data. supported on python and Fortran and many others. – casey Oct 27 '15 at 17:34
  • What kind of arrays are x0, y0, etc. Numpy arrays contain values of a specific type (e.g. `float32`) and that information is going to be required to know how to read the file. That or the actual file and a description of the values contained within. – casey Oct 27 '15 at 17:51
  • Is the binary file being written and read on the same computer? If not, you might have an endian issue. Otherwise, as the other answers suggest, make sure that your numeric types match and possibly check with a hexeditor. – M. S. B. Oct 27 '15 at 20:01
  • As far as I know, tofile in python saves one array to a file, in your case I suppose it is a double array, and you are reading first an integer value and then double values. – credondo Oct 28 '15 at 14:46
  • I just edited in the declarations as well. I will try writing in just one array, and reading it back in. Do I need to then declare the doubles in Fortran as real(kind=32) then to match numpy precision? – computanjohn Oct 28 '15 at 17:41
  • 1
    Kind numbers are generally non-portable, but `kind=32` is invalid for all compilers I know. Use `kind=real32` and `kind=real64` (the constants come from the module `iso_fortran_env` https://gcc.gnu.org/onlinedocs/gfortran/ISO_005fFORTRAN_005fENV.html). They should be equivalent to NumPy float32 and float64. – Vladimir F Героям слава Oct 28 '15 at 17:45
  • I believe you should also show how you save the file in Python. – Vladimir F Героям слава Oct 28 '15 at 17:48
  • Not sure if it helps in this instance, but if you have a particular format in mind, on the python side, struct provides a mechanism for you to convert data around with controls for things like endian-ness – Foon Oct 28 '15 at 17:51
  • How about using C_types on both sides. When using a fortran library in Pythos, I make sure to use a C_double for example on both sides. – electrique Oct 31 '15 at 00:08

2 Answers2

1

Here's a trivial example of how to take data generated with numpy to Fortran the binary way.

I calculated 360 values of sin on [0,2π),

#!/usr/bin/env python3
import numpy as np

with open('sin.dat', 'wb') as outfile:
    np.sin(np.arange(0., 2*np.pi, np.pi/180.,
                     dtype=np.float32)).tofile(outfile)

exported that with tofile to binary file 'sin.dat', which has a size of 1440 bytes (360 * sizeof(float32)), read that file with this Fortran95 (gfortran -O3 -Wall -pedantic) program which outputs 1. - (val**2 + cos(x)**2) for x in [0,2π),

program numpy_import
    integer,         parameter                    :: REAL_KIND = 4
    integer,         parameter                    :: UNIT = 10
    integer,         parameter                    :: SAMPLE_LENGTH = 360
    real(REAL_KIND), parameter                    :: PI = acos(-1.)
    real(REAL_KIND), parameter                    :: DPHI = PI/180.

    real(REAL_KIND), dimension(0:SAMPLE_LENGTH-1) :: arr
    real(REAL_KIND)                               :: r
    integer                                       :: i


    open(UNIT, file="sin.dat", form='unformatted',&
                access='direct', recl=4)

    do i = 0,ubound(arr, 1)
        read(UNIT, rec=i+1, err=100) arr(i)  
    end do

    do i = 0,ubound(arr, 1)
        r = 1. - (arr(i)**2. + cos(real(i*DPHI, REAL_KIND))**2) 
        write(*, '(F6.4, " ")', advance='no')&
            real(int(r*1E6+1)/1E6, REAL_KIND)
    end do

100 close(UNIT)    
    write(*,*)
end program numpy_import

thus if val == sin(x), the numeric result must in good approximation vanish for float32 types.

And indeed:

output:

360 x 0.0000
decltype_auto
  • 1,706
  • 10
  • 19
1

So thanks to this great community, from all the advise I got, and a little bit of tinkering around, I think I figured out a stable solution to this problem, and I wanted to share with you all this answer. I will provide a minimal example here, where I want to write a variable size array from Python into a binary file, and read it using Fortran. I am assuming that the number of rows numRows and number of columns numCols are also written along with the full array datatArray. The following Python script writeBin.py writes the file:

import numpy as np
# Read in the numRows and numCols value 
# Read in the array values
numRowArr = np.array([numRows], dtype=np.float32)
numColArr = np.array([numCols], dtype=np.float32)
fileObj   = open('pybin.bin', 'wb')
numRowArr.tofile(fileObj)
numColArr.tofile(fileObj)
for i in range(numRows):
    lineArr = dataArray[i,:]
    lineArr.tofile(fileObj)
fileObj.close()

Following this, the fortran code to read the array from the file can be programmed as follows:

program readBin

    use iso_fortran_env

    implicit none

    integer:: nR, nC, i

    real(kind=real32):: numRowVal, numColVal
    real(kind=real32), dimension(:), allocatable:: rowData
    real(kind=real32), dimension(:,:), allocatable:: fullData

    open(unit=10,file='pybin.bin',form='unformatted',status='old',access='stream')

    read(unit=10) numRowVal
    nR = int(numRowVal)

    read(unit=10) numColVal
    nC = int(numColVal)

    allocate(rowData(nC))
    allocate(fullData(nR,nC))

    do i = 1, nR

        read(unit=10) rowData
        fullData(i,:) = rowData(:)

    end do

    close(unit=10)

end program readBin

The main point that I gathered from the discussion on this thread is to match the read and the write as much as possible, with precise specifications of the data types to be read, the way they are written etc. As you may note, this is a made up example, so there may be some things here and there that are not perfect. However, I have used this now to program a finite element program, and the mesh data was where I used this binary read/write - and it worked very well.

P.S: In case you find some typo, please let me know, and I will edit it out rightaway.

Thanks a lot.

computanjohn
  • 133
  • 8