2

I have a binary file I want to read into Python. The file consists of three parts: a list of wavenumbers, a list of temperatures, and a list of opacities as a function of temperature and pressure. I would like to import the first two of these as vectors a and b, and the third as a 2D array c such that c[x,y] corresponds to a[x] and b[y].

An existing FORTRAN90 code is able to achieve this as follows:

integer   nS, nT
parameter (nS = 3000)
parameter (nT = 9)

real(8) wn_arr(nS)      ! wavenumber [cm^-1]
real(8) temp_arr(nT)    ! temperature [K]
real(8) abs_arr(nS,nT)  ! absorption coefficient [cm^-1 / amagat^2]

open(33,file=trim(datadir)//'CO2_dimer_data',form='unformatted')
read(33) wn_arr
read(33) temp_arr
read(33) abs_arr
close(33)

I tried the following python code:

f=scipy.io.FortranFile('file', 'r')
a_ref=f.read_reals(np.float64) #wavenumber (cm**-1)
b=f.read_reals(np.float64) #temperature (K)
c=f.read_reals(np.float64).reshape((3000,9))

However, this generates results that are not correct. I suspect this is because Fortran writes arrays to file in a different order than Python does. However, simply adding order='F' to the reshape command does not work. I suspect this is because when read in, abscoeff_ref is already flattened.

Any thoughts?

S. R.
  • 21
  • 3
  • 2
    Fortran unformatted files are a portability nightmare. Generally, you can only expect them to work if you read/write with the same compiler on the same hardware (since the record delimiters aren't standardized). It looks like `scipy.io.FortanFile` is assuming that the records were written using `gfortran` on an `x86_64` architecture. Is that the compiler/architecture that you're using? – mgilson May 06 '16 at 04:01
  • The answer depends on how often you need to read this file and how portable the whole system should be. Your biggest problem is, as mgilson already said, that the file is in an unformated SEQUENTIAL format, which stores not only the data, but also a begin-record and end-record mark around each record (= roughly each written variable), the size of which is compiler dependent. If you just want to read the file and don't care about portability, try to write the parsing code yourself. I have no experience with the scipy.io.FortranFile method, but its options seem limited and if it doesn't work... – StefanS May 06 '16 at 07:02
  • Without having the file it's hard to help you with code, but I have read unformated (direct access, not sequential) Fortran files with numpy.fromfile. You'd have to use the file.seek() method to skip the begin-record and end-record markers and play with their size until you hit the right one (use 4 or 1 byte to start, those should be the most common ones). – StefanS May 06 '16 at 07:10
  • If you can change the Fortran code that wrote the file, have to build a reusable tool chain (where you often write the file and then read it in Python) and want the whole thing to be portable, I'd suggest either switching to unformated, direct access or stream IO (both of which don't write begin-record and end-record markers, just the data). The best practice method is probably switching to a standard data format like netCDF, but that may be overkill. I also apologize for abusing the comments, but I don't really have an answer for you. – StefanS May 06 '16 at 07:15
  • 1
    You can show us how do incorrect results look like. How big are the arrays read? What are the values in them? – Vladimir F Героям слава May 06 '16 at 07:42
  • 1
    Related/ duplicates: http://stackoverflow.com/questions/8131204/reading-fortran-unformatted-file-with-python http://stackoverflow.com/questions/15608421/inconsistent-record-marker-while-reading-fortran-unformatted-file http://stackoverflow.com/questions/23377274/how-to-read-fortran-77-unformatted-binary-file-into-python more can be found by search also for reading from C and C++. – Vladimir F Героям слава May 06 '16 at 08:24

1 Answers1

4

To give you an idea of what I meant in my second comment, I made a mock up:

testwrite.f90, compiled with gfortran 4.8.4: It basically writes an unformated, sequential file with the arrays you specified (just a lot smaller, to be able to compare them by eye) filled with arbitrary data. It also prints the arrays.

implicit none
integer   nS, nT, i ,j
parameter (nS = 10)
parameter (nT = 3)

real(8) wn_arr(nS)      ! wavenumber [cm^-1]
real(8) temp_arr(nT)    ! temperature [K]
real(8) abs_arr(nS,nT)  ! absorption coefficient [cm^-1 / amagat^2]

wn_arr = (/ (i, i=1,nS) /)
temp_arr = (/ (270+i, i=1,nT) /)
abs_arr = reshape( (/ ((10*j+i, i=1,nS), j=1,nT) /), (/nS, nT/))

print*, wn_arr
print*, '-----------------'
print*, temp_arr
print*, '-----------------'
print*, abs_arr
print*, '-----------------'
print*, 'abs_arr(5,3) = ', abs_arr(5,3)

open(33,file='test.out',form='unformatted')
write(33) wn_arr
write(33) temp_arr
write(33) abs_arr
close(33)

end

testread.py, tested with Python 2.7.6, then reads the file written above and prints the arrays as well. For me the output of both programs is the same. YMMV.

import numpy as np

rec_delim = 4  # This value depends on the Fortran compiler
nS = 10
nT = 3

with open('test.out', 'rb') as infile:
    infile.seek(rec_delim, 1)  # begin record
    wn_arr = np.fromfile(file=infile, dtype=np.float64, count=nS)
    infile.seek(rec_delim, 1)  # end record
    infile.seek(rec_delim, 1)  # begin record
    temp_arr = np.fromfile(file=infile, dtype=np.float64, count=nT)
    infile.seek(rec_delim, 1)  # end record
    infile.seek(rec_delim, 1)  # begin record
    abs_arr = np.fromfile(file=infile,
                        dtype=np.float64).reshape((nS, nT), order='F')
    infile.seek(rec_delim, 1)  # end record

print(wn_arr)
print(temp_arr)
print(abs_arr)
# The array has the same shape, but Fortran starts index (per default at least)
# at 1 and Python at 0:
print('abs_arr(5,3) = ' + str(abs_arr[4,2]))

Short explanation: I open the file in a with-block (good practice in Python) and then I step through the file, using the knowledge of how the file is written. This makes it unportable. infile.seek(4, 1) moves the read-pointer of Python forward 4 bytes, from the current position (the option 1), because I know that the file starts with a begin-record marker that is 4 bytes long (gfortran).

Then I use numpy.fromfile to read count=10 values of float64, which is the wavenumber array.

Next I have to skip the end-record and begin-record markers. This could of course be done by infile.seel(8, 1) as well.

Then I read the temperature array, skip end-record and begin-record markers again and read the 2D-array. The data in the file doesn't know that it's 2D, so I need to reshape it, using the Fortran-order. The last .seek() is spurious, I just wanted to emphasize the structure.

I strongly suggest again that you do not build a bigger system on code like this. It's fine for a one-off, but terrible for something you have to use again or share.

StefanS
  • 1,740
  • 14
  • 20