Reading a Fortran unformatted file with Python

Question

I have a Fortran program that generates unformatted files, and I am trying to read them into Python.

I have the source code, so I know the first "chunk" is a character array declared as character*1 name(80), and so on. So I start out with

import struct

f = open(filename, 'rb')
nbytes = 80  # renamed from `bytes` to avoid shadowing the built-in
name = struct.unpack('c' * nbytes, f.read(nbytes))

and name is a tuple of 80 strings of length 1, some of which show up as hexadecimal escapes (e.g., \x00). How can I convert this variable to a single ASCII string?

I guess I should also use `open(filename,'r')` instead of `'rb'`. — hatmatrix, Nov 15 '11 at 03:56

M. S. B. · Accepted Answer · 2011-11-15T04:29:09.613

Most Fortran unformatted files contain extra bytes that specify the length of each record. A record is the group of items written with a single Fortran write statement. Typically there are 4 bytes at the beginning and 4 bytes at the end of each record. So in another language you will want to read these "hidden" values and skip them. In this case, if you try to interpret them as part of your string, you will add incorrect values to the string, and they will likely show up as peculiar, non-printable characters.
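A minimal sketch of reading one such record in Python, assuming 4-byte little-endian markers before and after the payload. The marker width and byte order depend on the compiler and platform, and the helper name `read_record` is made up for illustration.

```python
import struct

def read_record(f, marker_fmt="<i"):
    """Read one sequential unformatted record and return its payload bytes.

    Assumes a leading and trailing length marker in `marker_fmt`
    (4-byte little-endian int here; this is compiler-dependent).
    """
    size = struct.calcsize(marker_fmt)
    head = f.read(size)
    if len(head) < size:
        raise EOFError("no more records")
    (nbytes,) = struct.unpack(marker_fmt, head)

    payload = f.read(nbytes)
    if len(payload) != nbytes:
        raise ValueError("truncated record payload")

    tail = f.read(size)
    if len(tail) < size:
        raise ValueError("missing trailing record marker")
    if struct.unpack(marker_fmt, tail)[0] != nbytes:
        raise ValueError("leading and trailing markers disagree")
    return payload
```

With a file opened in `'rb'` mode, `read_record(f)` would return the 80 bytes of `name` if that array was written by its own write statement. If the markers disagree, a different marker width or byte order (for example `">i"`) may be worth trying.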

A Fortran string will be fixed length and padded on the end with blanks, which is 0x20 in ASCII. I would not expect the value 0x00 unless the string was not initialized or the Fortran programmer was using a string to hold binary data.
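As a rough illustration of the padding described above, a fixed-length field can be trimmed and decoded once its raw bytes are in hand. The function name `decode_fortran_string` is hypothetical, and stripping NULs as well as blanks is an assumption to cover the uninitialized case.

```python
def decode_fortran_string(raw, encoding="ascii"):
    """Decode a fixed-length Fortran character field.

    Trailing blanks (0x20) are the normal padding; trailing NULs are
    stripped too, in case the field was never initialized.
    Bytes that are not valid in `encoding` become U+FFFD.
    """
    return raw.rstrip(b" \x00").decode(encoding, errors="replace")


field = b"title" + b" " * 75   # an 80-byte, blank-padded field
print(decode_fortran_string(field))
```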

In this era, if a Fortran programmer is writing an unformatted/binary file that is intended for use with another language, they can cause these extra bytes to be omitted by using the "stream" IO method of Fortran 2003.

If you have access to the Fortran source code that wrote this file, you can easily check and see if this is the case - if it's using sequential I/O (the default) it will have the record header/footer where if it's specified as direct-access, it will not. You can also look at the file size and compute its "expected" size based on what you know it contains - if it's larger and you're sure you got everything, then it's most likely a result of these record header/footers. — Tim Whitcomb, Nov 15 '11 at 16:14
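The file-size check can be sketched as follows, assuming you can list the payload size of every record from the Fortran source and that each marker is 4 bytes wide. The function name `guess_layout` and its return labels are invented for this example.

```python
import os

def guess_layout(path, record_sizes, marker_bytes=4):
    """Compare the actual file size with the size implied by record_sizes.

    record_sizes: expected payload size in bytes of each record,
                  worked out from the Fortran write statements.
    marker_bytes: assumed marker width (4 is typical, not guaranteed).
    """
    actual = os.path.getsize(path)
    payload = sum(record_sizes)
    if actual == payload:
        return "no markers"    # nothing but the data itself
    if actual == payload + 2 * marker_bytes * len(record_sizes):
        return "sequential"    # one marker before and one after each record
    return "unknown"
```

An "unknown" result would suggest either a record was missed in the list or the marker width assumption does not hold for that compiler.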

score 2 · Answer 2 · answered Nov 15 '11 at 03:47

Use the correct format specifier in the first place, then strip off the NULs.

>>> import struct
>>> struct.unpack('%ds' % 20, b'Hello, World!' + b'\x00' * 7)
(b'Hello, World!\x00\x00\x00\x00\x00\x00\x00',)
>>> struct.unpack('%ds' % 20, b'Hello, World!' + b'\x00' * 7)[0].rstrip(b'\x00')
b'Hello, World!'
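If the data has already been unpacked with `'c' * n`, as in the question, the resulting tuple can still be joined back into one string. A small sketch for Python 3, where each tuple element is a 1-byte `bytes` object; the sample bytes stand in for what `f.read(20)` might return.

```python
import struct

raw = b"Hello, World!" + b"\x00" * 7         # stand-in for f.read(20)
chars = struct.unpack("c" * len(raw), raw)   # tuple of 1-byte bytes objects
joined = b"".join(chars)                     # back to a single bytes object
text = joined.rstrip(b"\x00").decode("ascii")
print(text)
```

Since `joined` is identical to `raw`, the unpack step adds nothing in this case; decoding the result of `f.read()` directly gives the same string.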

Ignacio Vazquez-Abrams

Ah, was not aware I could use this specifier. I see that `\x00` is NULL but I also have other strings like `\xa0`, `@\x08`, and so on... is there a hex->ascii converter? I've been looking around and find it odd that I have not come across one. – hatmatrix Nov 15 '11 at 03:55
Anything below \x80 already is ASCII. Perhaps you need to decode further, or decide you're looking at a different character set. – Ignacio Vazquez-Abrams Nov 15 '11 at 04:09
Perhaps that is the case. Thanks. – hatmatrix Nov 15 '11 at 04:20
Are you sure the data really is meant to be interpreted as text? – Karl Knechtel Nov 15 '11 at 05:31
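
To illustrate the point raised in these comments, the sketch below checks which bytes fall outside ASCII and shows how the same bytes can be read as a number instead of text. The 4-byte sample is made up for illustration and is not taken from the asker's file; the big-endian float interpretation is only one of many possibilities.

```python
import struct

raw = b"@\x08\x00\x00"   # made-up sample, NOT from the asker's file

# Bytes at or above 0x80 cannot be ASCII text.
non_ascii = [b for b in b"\xa0" + raw if b >= 0x80]

# The same four bytes interpreted as a big-endian 32-bit float.
(as_float,) = struct.unpack(">f", raw)

print(non_ascii)   # [160]
print(as_float)    # 2.125
```

Here `@` followed by `\x08` looks odd as text but is a perfectly ordinary float, which is one reason to check the Fortran write statements for the actual variable types before decoding anything as a string.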
