1

I have a Fortran program that generates unformatted files, and I am trying to read them into Python.

I have the source code, so I know the first "chunk" is a character array declared as character*1 name(80), and so on. So I start out with

import struct

f = open(filename, 'rb')
nbytes = 80  # renamed from `bytes` to avoid shadowing the built-in type
name = struct.unpack('c' * nbytes, f.read(nbytes))

and name is an 80-element tuple of length-1 strings, some of which are non-printable bytes displayed as hex escapes (e.g., \x00). How can I convert this variable to a single ASCII string?

hatmatrix

2 Answers

6

Most Fortran unformatted files contain extra bytes that specify the length of each record. A record is the group of items written by a single Fortran write statement. Typically a 4-byte length marker sits at the beginning and at the end of each record, so in another language you need to read these "hidden" values and skip them. In this case, if you interpret them as part of your string, you will add incorrect bytes to it, and those bytes are unlikely to be printable ASCII.
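A minimal sketch of skipping those markers in Python, assuming a 4-byte little-endian length marker before and after each record (a common layout, but it depends on the compiler and platform). `read_record` is a hypothetical helper name, and the in-memory file only stands in for the real one:

```python
import io
import struct

def read_record(f, marker_fmt="<i"):
    """Return the payload bytes of one sequential unformatted record.

    Assumption: the record is framed by identical length markers,
    here 4-byte little-endian integers. Adjust marker_fmt otherwise.
    """
    size = struct.calcsize(marker_fmt)
    head = f.read(size)
    if len(head) < size:
        raise EOFError("no more records")
    (nbytes,) = struct.unpack(marker_fmt, head)
    payload = f.read(nbytes)
    (tail,) = struct.unpack(marker_fmt, f.read(size))
    if tail != nbytes:
        raise ValueError("record markers disagree; wrong marker size or byte order?")
    return payload

# Simulated file: one 80-byte record wrapped in its two length markers.
body = b"my title".ljust(80)
fake_file = io.BytesIO(struct.pack("<i", 80) + body + struct.pack("<i", 80))
name_bytes = read_record(fake_file)
```

Comparing the leading and trailing markers is a cheap sanity check: if they differ, the marker size or byte order guess is probably wrong.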

A Fortran string is fixed length and padded at the end with blanks (0x20 in ASCII). I would not expect the value 0x00 unless the string was never initialized or the Fortran programmer was using a string to hold binary data.
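As a rough illustration, the padding can be removed after decoding. The sample bytes below are made up; stripping NULs as well as blanks is an extra precaution for uninitialized strings:

```python
# Stand-ins for 80 bytes read from the file (hypothetical contents).
blank_padded = b"my title".ljust(80, b" ")
nul_padded = b"my title" + b"\x00" * 72

# Decode as ASCII, then drop trailing blanks and NULs.
name_a = blank_padded.decode("ascii").rstrip(" \x00")
name_b = nul_padded.decode("ascii").rstrip(" \x00")
```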

These days, a Fortran programmer writing an unformatted/binary file intended for use with another language can omit these extra bytes by using the "stream" I/O access method of Fortran 2003 (access="stream" in the open statement).

M. S. B.
  • If you have access to the Fortran source code that wrote this file, you can easily check whether this is the case: if it uses sequential I/O (the default), the file will have the record header/footer, whereas if it is specified as direct-access, it will not. You can also look at the file size and compare it with the "expected" size based on what you know the file contains. If it is larger and you are sure you accounted for everything, the difference is most likely these record headers/footers. – Tim Whitcomb Nov 15 '11 at 16:14
2

Use the correct format specifier in the first place, then strip off the NULs.

>>> import struct
>>> struct.unpack('%ds' % 20, b'Hello, World!' + b'\x00' * 7)
(b'Hello, World!\x00\x00\x00\x00\x00\x00\x00',)
>>> struct.unpack('%ds' % 20, b'Hello, World!' + b'\x00' * 7)[0].rstrip(b'\x00')
b'Hello, World!'
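
If you already have the tuple produced by the 'c' * 80 format from the question, a possible way to collapse it into one string is to join the pieces before stripping. This is only a sketch for Python 3, and the sample data is invented:

```python
import struct

# Invented sample: 80 bytes, text followed by NUL padding.
data = b"Hello, World!" + b"\x00" * 67

# The 'c' format yields a tuple of 80 one-byte bytes objects.
chars = struct.unpack("c" * 80, data)

# Join into one bytes object, strip NULs and blanks, decode to str.
text = b"".join(chars).rstrip(b"\x00 ").decode("ascii")
```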
Ignacio Vazquez-Abrams