Reading a direct access fortran unformatted file in Python

Question

I'm completely new to Python and am writing my visualization codes in Python from scratch (to avoid using expensive proprietary programs like IDL). Until now I've used IDL and gnuplot. What I want to be able to do is this:

I write two dimensional arrays to unformatted direct access files using fortran which I want to be able to read in python. The exact test code is given below. The actual code is a huge parallel code but the data output is almost the exact same format.

program binary_out
implicit none
integer :: i,j,t,rec_array
double precision, dimension(100,100) :: fn
double precision, parameter :: p=2*3.1415929
INQUIRE(IOLENGTH=rec_array) fn
open(unit=10,file='test',status='new',form='unformatted',access='direct',recl=rec_array)                                                                           
   fn=0
   write(10,rec=1) fn
do t=1,3
do i=1,100
   do j=1,100
      fn(i,j)=sin(i*p*t/100)*cos(j*p*t/100)
   enddo
enddo
   write(10,rec=t+1) fn
enddo
close(10)
end program binary_out

The program should give me zeros for t=1 and increasing number of "islands" for increasing value of t. But when I read it using python code given below, I just get zeros. If I remove the first write statement of zeros, I just get the first time slice irrespective of what value of "timeslice" I use in the python code. The code I have so far is:

#!/usr/bin/env python
import scipy
import glob
import numpy as np
import matplotlib.pyplot as plt
import os, sys
from pylab import *

def readslice(inputfilename,field,nx,ny,timeslice):
   f=open(inputfilename,'r')
   f.seek(timeslice*nx*ny)
   field=np.fromfile(inputfilename,dtype='d',count=nx*ny)
   field=np.reshape(field,(nx,ny))
   return field

a=np.dtype('d')
a=readslice('test',a,100,100,2)

im=plt.imshow(a)
plt.show()

I want the def readslice to be able to read a record at the i-th place if timeslice equals i. For that I've tried to use f.seek but it does not seem to work. numpy.fromfile seems to start reading from the first record itself. How do I make numpy.fromfile to read from a specific point in a file?

I'm still trying to get used to the Python style and digging through the documentation. Any help and pointers would be greatly appreciated.

I do this often (Fortran direct access + python file seek + numpy.fromfile). Your code looks correct. Try reading your data from a Fortran program as a sanity check. Also, post the Fortran code snippet that you use to write data. Make sure data you write/read is same precision/datatype. You say what you are doing does not seem to work. Can you include an example of what the data should look like and what you are actually getting? — milancurcic, May 07 '12 at 04:54
IRO-bot, could you please share your snippet of the code where you read such files? This code seems to start from the first record all the time. The documentation for np.fromfile does not seem to have a keyword for starting point of the data reading. — toylas, May 07 '12 at 05:55
I have added the exact fortran code in my OP. So now you can even test the above two codes yourself to see the problem I'm talking about. Thanks for the help!! — toylas, May 07 '12 at 05:59

score 6 · Accepted Answer · answered May 07 '12 at 06:52

Here is a python code that will work for you:

def readslice(inputfilename,nx,ny,timeslice):
   f = open(inputfilename,'rb')
   f.seek(8*timeslice*nx*ny)
   field = np.fromfile(f,dtype='float64',count=nx*ny)
   field = np.reshape(field,(nx,ny))
   f.close()
   return field

In your original code, you were passing file name as first argument to np.fromfile instead of file object f. Thus, np.fromfile created a new file object instead of using the one that you intended. This is the reason why it kept reading from the beginning of the file. In addition, f.seek takes the number of bytes as argument, not the number of elements. I hardcoded it to 8 to fit your data, but you can make this general if you wish. Also, field argument in readslice was redundant. Hope this helps.

Thanks a lot IRO-bot!! Ultimately it was my stupid mistake!! Thanks for pointing it out! Now everything works perfectly fine!!! — toylas, May 07 '12 at 16:56
Done! I was trying to +1 it but was not allowed as I do not have a "reputation" yet :-p Clicked the check mark. I'm still very new to this website. — toylas, May 09 '12 at 22:06

score 1 · Answer 2 · answered May 07 '12 at 03:58

I don't think all FORTRAN runtimes are the same; some frame records, some don't, and I'm not that confident that the ones that do record framing will all do it the same way. They can generally read back records written by themselves of course, but interop from one FORTRAN runtime to another might not be there.

You probably should write a small test program in your chosen FORTRAN, that writes a couple of records similar to your production code, and then pick apart the results using your favorite binary file editor - I like bpe, but there are many of them available.

Then after you understand what's really being written, pull things back in using the Python struct module or similar.

bpe: http://sourceforge.net/projects/bpe/

Thanks for the answer but as IRO-bot has pointed out above, the direct access files do not have any record info. I read such files in IDL all the time. And this is independent of compiler. I did try the struct module but I do not understand python well enough yet. So I was not able to implement it. The above numpy.fromfile is the closest implementation apart from the seek issue. — toylas, May 07 '12 at 05:58

Reading a direct access fortran unformatted file in Python

2 Answers2

Linked