
Good evening,

I have a simulation written in Fortran that produces large files of unformatted (direct-access) data. From some of these files I want to produce human-readable ASCII files.

For some reason this (in Python):

import struct

f = open(filename, 'rb')
for i in xrange(0, N):
    pos = i * 64                                # each record occupies 64 bytes
    f.seek(pos)
    name = struct.unpack('ffff', f.read(16))    # first four 4-byte floats of the record
    print name[0], name[1], name[2], name[3]

takes only ~4 seconds (piping the output into a file on the shell), while this (in Fortran)

 open (1, file=inputfile, access='direct', recl=64, action='read', status='old')
 open (2, file=outputfile, access='sequential', action='write', status='replace')
 do i = 1, N
     read(1, rec=i) a, b, c, d
     write(2,*) a, b, c, d
 enddo

takes ~20 seconds. What am I doing wrong? Is there a faster way of doing this in Fortran?

Best regards! rer

  • Try writing to standard output with the Fortran program and piping to the output file. – Tarik Aug 11 '13 at 20:40
  • thx - ok, I tried that, but it didn't change the time it needs – rer Aug 11 '13 at 21:43
  • I think the reason for the slow writing speed is the way Fortran handles its output units. [Here](https://www.ibm.com/developerworks/mydeveloperworks/blogs/b10932b4-0edd-4e61-89f2-6e478ccba9aa/entry/improving_i_o_performance_in_xl_fortran3?lang=en) is a description of what IBM's XLF does (tl;dr: prep, lock, write, cleanup, unlock). You could try to merge several read/write statements by unrolling the loop manually or simply storing larger chunks of data. In your case, this overhead seems to be the performance killer. – Stefan Aug 12 '13 at 13:59
  • Read into a large array and write the whole thing with a single statement. – agentp Aug 12 '13 at 14:05
  • I think @Stefan and george have a point. You could pre-allocate a large string, then print into it using a fixed-length format, and finally output the whole string in one shot; of course, you could do a chunk at a time (on the order of several thousand numbers) – see the sketch after these comments. – Tarik Aug 12 '13 at 16:02
  • There is also an option to do buffered IO operations. Maybe this would help, too. – Stefan Aug 16 '13 at 13:54
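
To make the chunking suggestion from the comments concrete, here is a minimal sketch (the file names, the chunk size and the total record count are assumptions, not taken from the question): each record is formatted into an array of fixed-length strings with internal writes, and the filled array is flushed to the output file with a single write statement per chunk.

program chunked_convert
   implicit none
   integer, parameter :: chunk = 4096            ! records per output statement (assumed)
   character(len=64)  :: lines(chunk)            ! one pre-formatted text line per record
   real               :: a, b, c, d
   integer            :: i, j, cnt, n

   n = 10000000                                  ! total record count (assumed)
   open(1, file='input.bin', access='direct', recl=64, action='read', status='old')
   open(2, file='output.txt', access='sequential', action='write', status='replace')

   do i = 1, n, chunk
      cnt = min(chunk, n - i + 1)
      do j = 1, cnt
         read(1, rec=i+j-1) a, b, c, d
         ! internal write: format into memory instead of touching the output unit per record
         write(lines(j), '(4(es14.6e2,1x))') a, b, c, d
      end do
      ! one write statement per chunk; format reversion puts each array element on its own line
      write(2, '(a)') lines(1:cnt)
   end do

   close(1)
   close(2)
end program chunked_convert

Writing the whole array with a single-item '(a)' format relies on format reversion, so each element still ends up on its own output line while the output unit is touched only once per chunk.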

2 Answers


DISCLAIMER: I don't know if this solves the problem, but I do know that I can get time differences of up to a factor of 20. I also only tested writing the data, not reading it.


I was investigating the interaction of Fortran with Python and as such wanted to know how Fortran's binary files are built. While doing this, I noticed that both ifort and gfortran have an option to switch buffered IO on or off.

ifort: You can specify the keyword BUFFERED=['YES'|'NO'] while opening a file.

gfortran: You can set the environment variable GFORTRAN_UNBUFFERED_ALL to y|Y|1 or n|N|0 for unbuffered and buffered IO, respectively.

Please note that gfortran buffers IO by default, while ifort does not.

My sample code at the bottom results in the following times:

        |buffered|unbuffered
--------+--------+----------
ifort   |   1.9s |  18.2s
gfortran|   2.4s |  37.5s

This sample code writes a direct-access binary file with 10 million records of 12 bytes each.

PROGRAM btest
IMPLICIT NONE

INTEGER :: i

! IFORT
OPEN(11,FILE="test_d.bin",ACCESS="DIRECT",FORM="UNFORMATTED",RECL=3, &
& STATUS="REPLACE",BUFFERED="NO") ! ifort defines RECL as words
! GFORTRAN
!OPEN(11,FILE="test_d.bin",ACCESS="DIRECT",FORM="UNFORMATTED",RECL=12, &
!& STATUS="REPLACE") ! gfortran defines RECL as bytes

DO i = 1, 10000000
    WRITE(11,REC=i) i,i*1._8
END DO

CLOSE(11)

END PROGRAM
Stefan

Try using stream IO; see http://www.star.le.ac.uk/~cgp/streamIO.html. That should allow random access without a fixed record size and will probably end up using the same underlying OS system calls, thereby hopefully giving the same performance.
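
For illustration, here is a minimal sketch of reading the same data through stream access (the file names, the record count, and the 64-byte record stride taken over from the question's Python code are assumptions):

program stream_read
   implicit none
   real    :: a, b, c, d
   integer :: i, n

   n = 10000000                                  ! total record count (assumed)
   ! stream access addresses the file by byte position instead of fixed-size records
   open(1, file='input.bin', access='stream', form='unformatted', action='read', status='old')
   open(2, file='output.txt', action='write', status='replace')

   do i = 1, n
      read(1, pos=(i-1)*64 + 1) a, b, c, d      ! POS is 1-based; 64-byte stride as in the question
      write(2, '(4(es14.6e2,1x))') a, b, c, d
   end do

   close(1)
   close(2)
end program stream_read

The output side is kept as in the question (one formatted write per record); it could be combined with the chunked formatting sketched in the comments above.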

Tarik
  • Thx! I tried `ACCESS="STREAM", FORM="FORMATTED"` and wrote with `write(2,"(4(ES14.6E2,X))") a,b,c,d`, but there was no improvement in performance. Even if I reduce the number of digits printed below the number of digits the Python output generates, it's much slower. I also verified that reading the data is very fast, so it's the printing that takes most of the time. – rer Aug 12 '13 at 07:16
  • I suspect the reading is what's slow, and this is where I would like you to try stream IO. To confirm my suspicion, comment out the write statement and just read without writing. – Tarik Aug 12 '13 at 07:21
  • That's exactly what I did to verify which process takes more time. It reads the file in about ~3 seconds. To make sure that my compiler doesn't skip the reading loop as the data read is not used, I printed the last entries to the command line. – rer Aug 12 '13 at 13:39