
I have an f77 unformatted binary file. I know that the file contains 2 floats and a long integer as well as data. The size of the file is 536870940 bytes which should include 512^3 float data values together with the 2 floats and the long integer. The 512^3 float data values make up 536870912 bytes leaving a further 28 bytes.

My problem is that I need to work out where the 28 bytes begins and how to skip this amount of storage so that I can directly access the data.

I prefer to use C to access the file.

stars83clouds
  • 1
    If `f77` is for Fortran, then I really don't understand why it is tagged C. The last sentence does not justify it. – Eugene Sh. Dec 10 '18 at 17:48
  • 1
    The binary file is a Fortran unformatted binary file. I am accessing the data in C, not Fortran. – stars83clouds Dec 10 '18 at 17:50
  • 2
    Fortran does not specify the format of so-called "unformatted" files. There are some common variations, but you need to know what Fortran implementation wrote it, possibly with which options, or else perform some forensics. It may also matter what specific sequence of output statements were used to write the contents. – John Bollinger Dec 10 '18 at 17:50
  • Thanks @John Bollinger. What possible diagnostics could I perform? I was able to work out the size of the file using ftell. Unlike previous data files that I have encountered, this one has an additional 28 bytes. My issue is that I don't know if this data (2 floats and 1 long integer) is at the beginning or the end of the file. – stars83clouds Dec 10 '18 at 17:54
  • 2
    Once, several decades ago, I had a similar problem - reading a Fortran unformatted binary file into a C program. After banging my head against the wall for a few days, I finally wound up writing the input routine in Fortran and calling that from C. Do you have no documentation to tell you where the sequence in question appears? Do you know it's either at the beginning or the end (not somewhere in between)? – John Bode Dec 10 '18 at 17:59
  • Possible duplicate of [Fortran unformatted file format](https://stackoverflow.com/questions/8751185/fortran-unformatted-file-format) – Vladimir F Героям слава Dec 10 '18 at 18:02
  • You would be best off finding the source code for the program that wrote the data, especially if you don't have a reliable way to validate the data after you read it. Bonus points for learning which version of which compiler was used to build the program that wrote it. Lacking that, "performing forensics" means looking at the file content and trying to figure it out. It may help to know that the file formats that Fortran calls "unformatted" are still record-based. They normally contain record-length metadata. – John Bollinger Dec 10 '18 at 18:02
  • 1
    There are many questions and answers about this already. – Vladimir F Героям слава Dec 10 '18 at 18:03
  • @JohnBode I am pretty certain the extra data is sitting at the beginning of the file but would like to confirm. Also, 28 bytes seems not to match the exact space required for two floats and one long, or does it? – stars83clouds Dec 10 '18 at 18:04
  • @stars83clouds: That was what had me banging my head against the wall when I went through it - there was either a magic cookie or metadata that kept throwing off the alignment, so my output was always garbage. For whatever reason, the Fortran I/O statements knew how to deal with it. – John Bode Dec 10 '18 at 19:36
  • 1
    Use a hex editor to try to reverse engineer the structure of the data. – John Alexiou Dec 10 '18 at 19:41

1 Answer

Unfortunately, there is no standard for what "unformatted" means. But some layouts are more common than others.

In many Fortran implementations I have used, every WRITE statement writes a header (often an unsigned 32-bit integer) giving the length of the record's data in bytes, then the data, then repeats the header value as a trailer so that the file can also be read backwards from the end.

From the values you have provided, it might be that you have something like this:

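  • uint32: record 1 header, probably 12
  • float32, float32, int32: the three 'other values' you talked about (12 bytes)
  • uint32: record 1 trailer, same value as the header
  • uint32: record 2 header, probably 512^3*4 = 536870912
  • float32*512^3: the data
  • uint32: record 2 trailer, same value as its header

That adds up to 4 + 12 + 4 + 4 + 536870912 + 4 = 536870940 bytes, which matches your file size: the "extra" 28 bytes are the 12 bytes of other values plus four 4-byte record markers.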

You might have to check endianness.

So I suggest you open the file in a hexdump program, and check whether bytes 0-3 are identical to bytes 16-19, and whether bytes 20-23 are repeated at the end of the data again.

If that is the case, check whether the header values are little or big endian (a little-endian 12 appears as 0C 00 00 00, a big-endian 12 as 00 00 00 0C), and with a little luck you'll have your data.
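The same check can be done from C rather than in a hex editor. What follows is only a sketch under stated assumptions: it assumes 4-byte little-endian record markers before and after every record (a common layout, but not something the Fortran standard guarantees), and the helper names `le32` and `count_records` are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode a 4-byte little-endian value, independent of the host byte order. */
static uint32_t le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Walk the file record by record (assumed layout: marker, payload, marker).
 * Prints the payload size of each record.
 * Returns the number of well-formed records, or -1 if a trailing marker
 * is missing or does not match its leading marker. */
static int count_records(FILE *fp)
{
    unsigned char buf[4];
    int n = 0;

    while (fread(buf, 1, 4, fp) == 4) {
        uint32_t head = le32(buf);

        /* Skip the payload without reading it. */
        if (fseek(fp, (long)head, SEEK_CUR) != 0)
            return -1;
        /* The trailing marker must repeat the leading one. */
        if (fread(buf, 1, 4, fp) != 4 || le32(buf) != head)
            return -1;

        printf("record %d: %lu bytes\n", n, (unsigned long)head);
        n++;
    }
    return n;
}
```

If the guessed layout is right, calling `count_records` on the file opened with `fopen(..., "rb")` should report two records. A return value of -1 suggests a different marker size or byte order, in which case the hex dump remains the more reliable tool.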

Note: I assume that these three other values are metadata about the data, and therefore would be at the beginning of the file. If that's not the case, you might have them at the end.

Update:

In your comment, you write that your data begins with something like this:

0C 00 00 00 XX XX XX XX XX XX XX XX XX XX XX XX 0C 00 00 00
^- header-^                                     ^-header -^
E8 09 FF 1F (many, many values) E8 09 FF 1F
^- header-^ ^--- your data ---^ ^-header -^

I don't know how to read binary data in C, so I leave that part up to you. What you need to do is skip the first 24 bytes (4 + 12 + 4 for the first record, plus the 4-byte header of the second), then read the data as (probably little endian) 4-byte floating-point values. The last 4 bytes of the file are the trailing record marker, which you don't need.
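A minimal C sketch of the skip-and-read step might look like this. It is not a definitive implementation: it assumes the 24-byte offset worked out above, that the host's `float` is a 32-bit IEEE-754 value, and that the host byte order matches the file (little endian). The names `HEADER_BYTES` and `read_payload` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* 4 + 12 + 4 bytes for record 1, plus the 4-byte leading marker of record 2. */
#define HEADER_BYTES 24L

/* Read `count` floats starting HEADER_BYTES into the file.
 * Assumes the file's floats have the same size and byte order as the host's.
 * Returns a malloc'd buffer the caller must free, or NULL on failure. */
static float *read_payload(FILE *fp, size_t count)
{
    float *data = malloc(count * sizeof *data);
    if (data == NULL)
        return NULL;

    if (fseek(fp, HEADER_BYTES, SEEK_SET) != 0 ||
        fread(data, sizeof *data, count, fp) != count) {
        free(data);   /* seek failed or file too short */
        return NULL;
    }
    return data;
}
```

For the file in the question this would be called as `read_payload(fp, (size_t)512 * 512 * 512)` with `fp` opened in `"rb"` mode. The buffer is 512 MiB, so it has to come from the heap rather than the stack. On a big-endian host each value would additionally need its bytes swapped.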

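Important note: Fortran stores arrays in column-major order (first index varies fastest), while C stores them in row-major order (last index varies fastest). So keep in mind that the order of the indices will be reversed: Fortran's a(i,j,k) corresponds to a[k][j][i] in C, apart from the 1-based versus 0-based numbering.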
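To make the index reversal concrete, here is a small sketch of the offset calculation. It assumes the data was written as a single three-dimensional array (the question only says there are 512^3 values, so the dimensionality is a guess), and `fortran_offset` is a hypothetical helper name.

```c
#include <stddef.h>

/* Offset, counted in elements, of the Fortran element a(i,j,k) inside a
 * flat buffer holding an array dimensioned a(nx,ny,nz).
 * i, j, k are 1-based as in Fortran; the first index varies fastest. */
static size_t fortran_offset(size_t i, size_t j, size_t k,
                             size_t nx, size_t ny)
{
    return (i - 1) + nx * ((j - 1) + ny * (k - 1));
}
```

With a flat `float *data` buffer, the value Fortran calls a(i,j,k) would then be `data[fortran_offset(i, j, k, 512, 512)]`.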

I know how to read this in Python:

from scipy.io import FortranFile
ff = FortranFile('data.dat', 'r', '<u4')
# read the three values you are not interested in
threevals = ff.read_record('<u4')
# read the data
data = ff.read_record('<f4')
ff.close()
chw21
  • chw21: This is very interesting. For positions 0-7, I can see 0C 00 00 00 00 00 00 00, although the positions 5-7 don't have the same values as 1-4 when selected. Positions 16-19 repeat 0-3. As you pointed out, positions 20-23 are repeated at the end of the file. But tell me, what does all that mean? Why isn't the repeating pattern observed throughout the file if it has been repeated at least once? – stars83clouds Dec 12 '18 at 08:25