Reading a direct access Fortran unformatted file in C++

Question

I am currently trying to read a Fortran-written binary file in C++, and I am not having much success. The Fortran code that writes the file is not my own, although the C++ parsing routine is.

The first record of the binary file has been written using the following statement(s):

INTEGER var1, var2, var3
WRITE(12,REC=1) var1,var2,var3

A Fortran snippet that performs a successful read looks like this:

open(unit=10,file="ETC.bin",access='direct',recl=24,iostat=iost,status='old')
read (unit=10,rec=1) var1,var2,var3
close(unit=10)
print*,var1,var2,var3

On the C++ side of things, I have so far come up with the following:

FILE* binfile = fopen("ETC.bin","rb") ;
fseek (binfile,0,SEEK_END) ;
long lSize = ftell (binfile) ;
char* buffer = (char*) malloc (sizeof(char)*lSize) ;
rewind (binfile) ;
size_t result=fread(buffer,1,96,binfile) ;
for (unsigned i = 0; i<=result; i++){
   printf("%f\n",buffer[i]) ;
}

My C++ printf statement, unfortunately, returns nonsense. Note that I am assuming that Fortran is relying on 4-byte words (e.g. gfortran compiler), and that if ifort is used, the

-assume byterecl

option is needed at compile time.

I know what the result should be, but I am not sure as to how to duplicate the behavior of the Fortran read statements in C++.

Thanks for any and all help!

P.S. There is a similar question posted here: reading fortran binary file in c++, which points to the following dead link. Not much information out there, or my Google-Fu is lousy.

You should first confirm that the computer used to write the file is endian-compatible with the computer reading the file. — Carlton, May 06 '15 at 00:36
Is that done on a per-word basis, or is it per-record? I have come across the topic of Endianness, but I was assuming that this would not be an issue. — EmilioW, May 06 '15 at 00:44
It would be on a word basis. It may not be an issue if the reading and writing computers are the same endianness, but if they ARE different, it will make debugging your program very difficult. [This thread has a simple implementation for checking endianness](http://stackoverflow.com/questions/1001307/detecting-endianness-programmatically-in-a-c-program) — Carlton, May 06 '15 at 00:56
You should go and read about [working with files in C++](http://www.cplusplus.com/doc/tutorial/files/). — MahanGM, May 06 '15 at 07:11
@MahanGM , thanks for the link, and it contains some proper syntax suggestions. But the question I have is more about recovering data from an unformatted file that has been written in Fortran. — EmilioW, May 08 '15 at 22:23
@EmilioW The file is either text or binary. You're trying to read it as binary in your code so you know it's binary. I assume the problem is, you don't know how to read different data sizes. If you follow C++ tutorials you'd get how it works. — MahanGM, May 09 '15 at 07:52
I am not sure that this is feasible. Sure, you can make it work in some cases, but I believe this cannot be done in a portable way. In http://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes it is said that std::ios::binary is implementation defined, so there are no portable streams. Also, http://en.cppreference.com/w/cpp/io/c/fopen says that the binary modes "rb" or "wb" have no effect on POSIX but do have effects on Windows. So I think that it is not feasible to mix binary from C++ and Fortran and that, if the need arises, we should mix C++ and Fortran code instead. — Mathieu Dutour Sikiric, Jun 09 '16 at 09:19
Did you consider doing the IO in Fortran, and calling these IO routines from C++ ? — Basile Starynkevitch, Mar 21 '18 at 05:36

score 3 · Answer 1 · edited May 23 '17 at 11:55

I'm not very good at C, but I have tried a few things.

First for the Fortran part:

program direct_access
    implicit none
    integer, parameter :: UNT = 63347
    open(unit=UNT, file='delme.unf', access='DIRECT', &
        form='UNFORMATTED', status='REPLACE', recl=24)
    write(UNT, rec=1) 1, 2, 3
    write(UNT, rec=2) 4, 5, 6
    close(UNT)
end program direct_access

I am writing 3 integers, of 4 bytes each, into an unformatted file with a record length of 24 bytes. (Note: I am assuming here that the record length is in bytes; apparently that isn't guaranteed and depends on the compiler and system.)

Also, from my preferred Fortran book

Unformatted direct access files are both smaller and faster than formatted direct access files, but they are not portable between different types of processors.

(Unless FORM='FORMATTED' is specifically given when opening the file, it will be unformatted.)

Test whether the data was written correctly:

$ hexdump delme.unf
0000000 0001 0000 0002 0000 0003 0000 0000 0000
0000010 0000 0000 0000 0000 0004 0000 0005 0000
0000020 0006 0000 0000 0000 0000 0000 0000 0000
0000030

Looks good. Note that the record length (24 bytes) is larger than the data (3*4 bytes), so there are unused data blocks inside.
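
The layout in the dump can be written down as a small piece of arithmetic. This is only a sketch: the name record_offset is made up for illustration, and it assumes that RECL is measured in bytes, as it was in my test.

```cpp
#include <cstddef>

// Byte offset of 1-based record `rec` in a direct access file whose
// records are `recl_bytes` long (assumes RECL is measured in bytes).
constexpr std::size_t record_offset(std::size_t rec, std::size_t recl_bytes) {
    return (rec - 1) * recl_bytes;
}

// With recl=24 and three 4-byte integers per record:
static_assert(record_offset(1, 24) == 0, "record 1 starts at the beginning of the file");
static_assert(record_offset(2, 24) == 24, "record 2 starts at byte 24 (0x18)");
```

In the hexdump, the value 4 does indeed sit at offset 0x18, right after the 12 data bytes and 12 unused bytes of record 1.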

Now to the C program, not my expertise:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

off_t fsize(const char *filename) {
    struct stat st; 

    if (stat(filename, &st) == 0)
        return st.st_size;

    return -1; 
}

int main(){
    int record_size=24;
    off_t file_size=fsize("delme.unf");
    FILE* binfile = fopen("delme.unf","rb") ;
    if (file_size < 0 || binfile == NULL) {
        perror("delme.unf");
        return 1;
    }
    int num_records=file_size / record_size;
    int* record = (int*) malloc (record_size) ;
    size_t result ;
    for (int j=0; j < num_records; j++) {
        /* jump to the start of record j (0-based) */
        fseek(binfile, (long) j * record_size, SEEK_SET) ;
        printf("%i : ", j) ;
        /* read the whole record as 4-byte ints, padding included */
        result=fread(record,sizeof(int),record_size/sizeof(int),binfile) ;
        for (size_t i = 0; i<result; i++){
            printf("%i ",record[i]) ;
        }
        printf("\n");
    }
    free(record);
    fclose(binfile);
    return 0;
}

Output:

0 : 1 2 3 0 0 0 
1 : 4 5 6 0 0 0

Also good.

A few things I noticed:

Your buffer is of type char, meaning a single byte per element, but each integer takes 4 bytes. That means every integer in the file is split across four elements of buffer.
Also, your Fortran code sets a record length of 24 (bytes, I assume), but 3 integers of 4 bytes each only use 12 bytes, so half of the record is not used. That's why the read gives three more zeros.
If you have result elements, then the indices of buffer need to go from 0 to result-1.
The way you determine the size of the file (fseek to SEEK_END, then ftell) is apparently not a good idea, see here
You're using %f as an output format, indicating floats? But I thought these were ints?
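
Putting those points together on the C++ side, a minimal sketch might look like the following. The helper name read_int_record and its parameters are invented for illustration. It assumes that the record length is in bytes, that a Fortran INTEGER is 4 bytes, and that the file was written on a machine with the same endianness as the one reading it.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read `count` 4-byte integers from the start of the
// 1-based record `rec`, where every record is `recl` bytes long.
// Returns an empty vector if the file cannot be opened or the read is short.
std::vector<std::int32_t> read_int_record(const std::string& path,
                                          std::size_t rec,
                                          std::size_t recl,
                                          std::size_t count) {
    const std::size_t nbytes = count * sizeof(std::int32_t);
    std::ifstream in(path, std::ios::binary);
    if (!in || rec == 0 || nbytes > recl) {
        return {};  // cannot open, or the request does not fit in a record
    }
    std::vector<std::int32_t> values(count);
    // Records are fixed length, so record `rec` starts at (rec - 1) * recl.
    in.seekg(static_cast<std::streamoff>((rec - 1) * recl));
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(nbytes));
    if (in.gcount() != static_cast<std::streamsize>(nbytes)) {
        return {};  // hit end of file before the record was complete
    }
    return values;
}
```

For the file in the question, read_int_record("ETC.bin", 1, 24, 3) would be the counterpart of the Fortran read with rec=1, and the values should be printed with %d (or std::cout), not %f.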

Of course, if you don't care about reading the data out-of-order, you can just loop over the file:

#include <stdlib.h>
#include <stdio.h>

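int main() {
    FILE* data = fopen("delme.unf", "rb") ;
    int var ;
    if (data == NULL) return 1;
    /* stop as soon as fread fails; testing feof() first would print the last value twice */
    while (fread(&var, sizeof(int), 1, data) == 1) {
        printf("%i ", var);
    }
    printf("\n");
    fclose(data);
    return 0;
}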

There are certainly people that will help you write better C code than I do.

I hope that someone with knowledge of C can have a look at this, because while they seem to work, I'd be surprised if either of my C-programs are any good. — chw21, May 06 '15 at 05:12
What modifications would be required for a real*8 fortran write + corresponding read statement in C++? — EmilioW, May 08 '15 at 21:55
`int var;` becomes `double var` and `sizeof(int)` becomes `sizeof(double)`, I think. Try it out. Oh, and you need a different `printf` statement. — chw21, May 09 '15 at 10:40

Hayashi Yoshiaki · Answer 2 · 2018-03-21T11:22:02.107

EDIT Please take this as a new answer

Your problem is in this part:

for (unsigned i = 0; i<=result; i++){
   printf("%f\n",buffer[i]) ;
}

First of all, i<=result should be i<result; otherwise the loop reads result+1 bytes from buffer. This is a common mistake among C/C++ beginners: an array with N elements is traversed with indices 0 to N-1.

Second, i++ should be i+=4, because the Fortran INTEGER type and the C int type are usually 4 bytes each.

Finally, printf("%f\n",buffer[i]) should print an integer, not a float: %f in printf takes a floating-point number, and to print integers you use %d. Note that printf("%d\n",(int)buffer[i]) is not enough, because the cast only converts the single byte buffer[i] to an int; it does not reinterpret the four bytes that start at buffer[i].

EDIT2 To reinterpret those four bytes as one integer, use *((int*)(&buffer[i])) instead of (int)buffer[i].
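
A variant of the same idea that avoids the pointer cast is to copy the four bytes into an integer with memcpy. This is only a sketch: the helper name int_at is made up here, and it assumes the buffer holds at least pos+4 bytes and that the file's byte order matches the host's.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode the 4-byte integer that starts at byte
// offset `pos` of `buffer`. The caller must guarantee that
// buffer[pos] .. buffer[pos + 3] are valid.
std::int32_t int_at(const char* buffer, std::size_t pos) {
    std::int32_t value;
    std::memcpy(&value, buffer + pos, sizeof value);  // no alignment or aliasing issues
    return value;
}
```

The loop from the question would then step with i+=4 and print int_at(buffer, i) with %d.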

An alternative is to use an array of 4-byte integers; the code then looks like the one below. uint32_t is a 4-byte unsigned integer type. The int type is usually 4 bytes, but the C standard allows it to be as small as 2 bytes, so uint32_t is the safer choice.

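#include <stdio.h>
#include <stdint.h>

int main(void){
  FILE* binfile = fopen("ETC.bin","rb") ;
  if(binfile == NULL){
    printf("cannot open file.\n");
    return 1;
  }
  fseek (binfile,0,SEEK_END) ;
  long lSize = ftell (binfile) ;
  rewind (binfile) ;
  if(lSize >= (long)(24*sizeof(uint32_t))){
    uint32_t array[24];
    fread(array,sizeof(uint32_t),24,binfile) ;
    for (int i = 0; i<24; i++){
      printf("%u\n",(unsigned)array[i]) ;   /* %u matches an unsigned value */
    }
  }else{
    printf("file size is too small.\n");
  }
  fclose(binfile);
  return 0;
}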

EDIT This answer below is not related to your problem.

Fortran's write routine in unformatted sequential mode automatically adds headers/footers around the main data. The headers/footers are binary strings that contain the data size. Their length is determined by the compiler.

This is the binary output of some Fortran program.

$ xxd fort.20
00000000: 4000 0000 6664 6532 6265 616d 2020 2020  @...fde2beam    
00000010: 2020 2020 2020 2020 0432 0c00 7e03 0000          .2..~...
00000020: 7e03 0000 f059 34e7 a9d5 853f 8534 5ad5  ~....Y4....?.4Z.
00000030: 0910 1340 f059 34e7 a9d5 853f 8534 5ad5  ...@.Y4....?.4Z.
00000040: 0910 1340 4000 0000

In the above example, 40000000 is the header/footer. Read as a little-endian 4-byte integer, 40 00 00 00 is 0x40 = 64, so there must be 64 bytes between the header and the footer. In the displayed hex text you can see 32 groups of 4 hex digits between the header and the footer. Since 2 hex digits = 1 byte, there are indeed 64 bytes between them.

So when you write a C program to read a Fortran unformatted binary file, your program needs to skip these headers/footers or use them wisely.
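
For a sequential file like the dump above, reading one record could be sketched as follows. The function name read_sequential_record is invented for illustration. It assumes a 4-byte length marker before and after each record, in host byte order, which matches the dump but is compiler dependent.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <vector>

// Hypothetical helper: read one sequential unformatted record, assuming
// (not guaranteed) that the payload is framed by a 4-byte length marker
// on each side. Returns false at end of file or on a malformed record.
bool read_sequential_record(std::istream& in, std::vector<char>& payload) {
    std::uint32_t head = 0, tail = 0;
    if (!in.read(reinterpret_cast<char*>(&head), sizeof head)) return false;
    payload.resize(head);
    if (head > 0 && !in.read(payload.data(), head)) return false;
    if (!in.read(reinterpret_cast<char*>(&tail), sizeof tail)) return false;
    return head == tail;  // the two markers must agree
}
```

For fort.20 above, the first call would be expected to fill payload with the 64 bytes between the two 4000 0000 markers.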

Notice that the file in question is **direct access**. There is no header or footer in such a file with common compilers. — Vladimir F Героям слава, Mar 16 '18 at 17:47
