Reading a file in Python 2.7, ignoring lines, and allocating memory (re-writing fortran code)

Question

I am very new to programming and am converting a fortran90 code into python 2.7. I have done fairly well with it so far but have hit a difficult spot. I need to write this subroutine in Python but I don't understand the fortran notation and can't find any information on what the python equivalent of the Read(1,*) lines would be.

Any help at all would be greatly appreciated.

SUBROUTINE ReadBCoutput(filenameBC,count,timeArray,MbolArray,uArray,gArray,rArray,iArray,zArray)

! read Bruzual & Charlot (2003) stellar population synthesis models into arrays

CHARACTER*500,INTENT(IN):: filenameBC
INTEGER,INTENT(OUT):: count
REAL,DIMENSION(:),ALLOCATABLE,INTENT(OUT):: timeArray,MbolArray,uArray,gArray,rArray,iArray,zArray

REAL:: logTime,Mbol,g,uMg,gMr,gMi,gMz
REAL,DIMENSION(:),ALLOCATABLE:: timeArrayLocal,MbolArrayLocal,uArrayLocal,gArrayLocal,rArrayLocal,iArrayLocal,zArrayLocal

! open file and read off unnecessary 29 lines of comments
OPEN(1,FILE=TRIM(filenameBC),RECL=2000)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)
READ(1,*)

! now read arrays
count=0
ALLOCATE(timeArray(count))
ALLOCATE(MbolArray(count))
ALLOCATE(uArray(count))
ALLOCATE(gArray(count))
ALLOCATE(rArray(count))
ALLOCATE(iArray(count))
ALLOCATE(zArray(count))
IOEnd=0
DO WHILE(IOEnd>-1)
 READ(1,*,IOSTAT=IOEnd) logTime,Mbol,g,uMg,gMr,gMi,gMz
!print*,'filename is',filenameBC
   IF (IOEnd>-1) THEN ! not at end of file yet
     ! add new element to list
    count=count+1
    ALLOCATE(timeArrayLocal(count-1))
    ALLOCATE(MbolArrayLocal(count-1))
    ALLOCATE(uArrayLocal(count-1))
    ALLOCATE(gArrayLocal(count-1))
    ALLOCATE(rArrayLocal(count-1))
    ALLOCATE(iArrayLocal(count-1))
    ALLOCATE(zArrayLocal(count-1))
    DO countInside=1,count-1
       timeArrayLocal(countInside)=timeArray(countInside)
       MbolArrayLocal(countInside)=MbolArray(countInside)
       uArrayLocal(countInside)=uArray(countInside)
       gArrayLocal(countInside)=gArray(countInside)
       rArrayLocal(countInside)=rArray(countInside)
       iArrayLocal(countInside)=iArray(countInside)
       zArrayLocal(countInside)=zArray(countInside)
    END DO
    DEALLOCATE(timeArray)
    DEALLOCATE(MbolArray)
    DEALLOCATE(uArray)
    DEALLOCATE(gArray)
    DEALLOCATE(rArray)
    DEALLOCATE(iArray)
    DEALLOCATE(zArray)
    ALLOCATE(timeArray(count))
    ALLOCATE(MbolArray(count))
    ALLOCATE(uArray(count))
    ALLOCATE(gArray(count))
    ALLOCATE(rArray(count))
    ALLOCATE(iArray(count))
    ALLOCATE(zArray(count))
    DO countInside=1,count-1
       timeArray(countInside)=timeArrayLocal(countInside)
       MbolArray(countInside)=MbolArrayLocal(countInside)
       uArray(countInside)=uArrayLocal(countInside)
       gArray(countInside)=gArrayLocal(countInside)
       rArray(countInside)=rArrayLocal(countInside)
       iArray(countInside)=iArrayLocal(countInside)
       zArray(countInside)=zArrayLocal(countInside)
    END DO
    timeArray(count)=10**logTime
    MbolArray(count)=Mbol
    gArray(count)=g
    uArray(count)=uMg+g
    rArray(count)=g-gMr
    iArray(count)=g-gMi
    zArray(count)=g-gMz
    DEALLOCATE(uArrayLocal)
    DEALLOCATE(gArrayLocal)
    DEALLOCATE(rArrayLocal)
    DEALLOCATE(iArrayLocal)
    DEALLOCATE(zArrayLocal)
    DEALLOCATE(MbolArrayLocal)
    DEALLOCATE(timeArrayLocal)
 END IF
END DO
CLOSE(1)

END SUBROUTINE ReadBCoutput

I don't expect anyone to convert the whole thing for me - I would just like to be clear on what this is actually doing and what is/isn't necessary to do in Python. I'm capable of searching on my own but I'm kind of blown away by what to look for here.

Thanks so much!

To be a little clearer, I would like the arguments in the OPEN and READ functions explained so that I can understand it and convert it to Python — , Mar 19 '13 at 11:21
The [manual](http://www.stanford.edu/class/me200c/tutorial_77/15_files.html) is always a good place to start, especially if you are a beginner. — danodonovan, Mar 19 '13 at 11:25
warning whoever wrote this fortran doesn't know it much better than you.. anyway put this aside and go learn the basics of python if you don't even know how to open a file. — agentp, Mar 19 '13 at 12:48
Each `READ(1,*)` is reading 2000 (`RECL`) bytes of the file and assigning them to nothing since there's not list of variables. Since this is done 29 times, I assume the code is just skipping comments (as the comment in the code indicate ;-). Not clear why this wasn't done with a `DO` loop. In Python this might be the same as `_ = file.read(2000)` in a `for` loop. — martineau, Mar 19 '13 at 12:52
@martineau -- I could be wrong about this, but I don't think that it is actually reading 2000 bytes. First, the record length units are processor dependent `RECL=1` could indicate a 4-byte word for example. Second, I believe that when connected for formatted sequential access (as this file is), the `RECL` indicates (at most) the maximum line length, but I believe a line could be shorter. At worst, specifying RECL here is undefined behavior (It was in the f77 standard which is the only one I can find right now) — mgilson, Mar 19 '13 at 13:29
interesting gfortran (4.x) seems to ignore the recl specification for list directed (*) reads (as we have here), however if you do a formatted read it terminates the data at the specified record length. Daniel first thing I'd do is get that recl= garbage out of there and make sure the fortran still works correctly. — agentp, Mar 19 '13 at 16:28

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

In fortran, open(1,FILE=TRIM(filenameBC),RECL=2000) opens the file with name filenameBC. The TRIM part is unnecessary as the fortran runtime library will do that for you (it's python equivalent is filenameBC.rstrip()). The RECL=2000 part here is also a little fishy. I don't think that it does anything here -- Actually, I think that using it is undefined behavior since you're file should be connected for "sequential" access. According to the fortran77 standard section 12.10.1,

RECL = rl

rl is an integer expression whose value must be positive. It specifies the length of each record in a file being connected for direct access. If the file is being connected for formatted input/output, the length is the number of characters. If the file is being connected for unformatted input/output, the length is measured in processor-dependent units. For an existing file, the value of rl must be included in the set of allowed record lengths for the file ( 12.2.2). For a new file, the processor creates the file with a set of allowed record lengths that includes the specified value. This specifier must be given when a file is being connected for direct access; otherwise, it must be omitted.

This may have changed in a newer revision of the standard -- If so, I believe that it specifies the maximum line length.

Fortran filehandles are simply integers. So, whereas in python you would say:

filehandle = open('file')
line = filehandle.readline()  #or better, `next(filehandle)` :)

In fortran this is roughly the same as:

integer filehandle
filehandle = 1       ! pick an integer any positive one 
                     ! will do that we haven't used already,
                     ! but it's best to avoid 0,5,6 as those 
                     ! usually mean `stderr`,`stdin` and `stdout`.
open(filehandle,file='file')
read(filehandle,*) line

the * basically gets you to read a single line from the file.

Note that this fortran code is a little buggy and HUGELY inefficient. For example the check IF (IOEnd>-1) THEN succeeds under any condition that isn't an end of file (e.g. strange errors will be masked similar to a bare except in python). In python, you can just pack this information into a list and grow the list dynamically -- python will handle all of the re-allocation that you need to do. At the end, you may choose to convert the list to a numpy ndarray.

In pseudo-python code, this translates roughly to:

data_list = []
with open(filenameBC.rstrip()) as fin:
    for _ in range(29): #throw away first 29 lines (I think I counted right ...)
        next(fin)
    
    for line in fin:
        data_list.append([float(x) for x in line.strip()])
    
timeArray,MbolArray,uArray,gArray,rArray,iArray,zArray = zip(*data_list)

But he's going to have to cast the inputs into floats, ints etc. too. — bob.sacamento, Mar 19 '13 at 18:30
@bob.sacamento -- They're all fortran "`real`". The closest (pure python) equivalent is `float` (which is more like `real*8`/`double precision`), but I've taken care of the "casting" in the list-comprehension in `data_list.append`. — mgilson, Mar 19 '13 at 18:32

score 0 · Answer 2 · answered Mar 19 '13 at 18:37

READ(1,*) is reading .... something out of your file and not storing it, i.e. just throwing it away. All those READ(1,*) statements are just a way of scrolling through the file until you get to the data you actually need. (Not the most compact way to code this, by the way. Whoever wrote this FORTRAN code may have been very smart in many respects but was not a terribly good programmer. Or maybe they were in a big hurry.) A python equivalent would just be

>>> infile.readline()

Note that FORTRAN can read data in as integers, floats, what have you, but python is just going to read everything as text, and then you are going to have to cast it to whatever numerical form you need.

However, if you want to look at NumPy, it has a couple a routines that can read data as numbers: loadtxt and genfromtxt. Maybe a few others too, but those are the ones I have found most helpful.

Reading a file in Python 2.7, ignoring lines, and allocating memory (re-writing fortran code)

2 Answers2