0

I've been trying to read in a binary file that was apparently created using Fortran code. It's in .j03 format which is claimed to be some "old Fortran format". However, I am unable to read this thing into either Python or MATLAB. The file in question is here. From the source of the file:

*The files are written in the standard BINARY FORTRAN UNFORMATTED fashion, and the Byte ordering of each 4-Byte word is LITTLE ENDIAN.

If you try to read these files in MATLAB or C, take into account that IFORT adds 4 bytes at the end and beginning of each record.*

I've already tried multiple approaches in Python - reading in with np.fromfile or with open by stepping every 4 bytes and trying to use decode (which tells me that it's apparently ASCII). No idea how to proceed from here. If anyone can decode that file or help me in doing so, I'd be grateful... accessing the file contents is way more important to me than the technique for opening it. So any bit of information that I can immediately use will help.

EDIT: The first couple of rows should look like this.

variables= "x","z","uu","vv","ww","uv","oxox","oyoy", "ozoz" 
 zone  i=         128 , j=          85 ,f=point
     7012.0693359375     2337.3564453125        0.0006860329        0.0000001922        0.0001566301       -0.0000022967        0.0212685466        0.0001820315        0.1483741105
     3506.0346679688     2337.3564453125        0.0013318420        0.0000009303        0.0003984132       -0.0000067950        0.0487244353        0.0004225464        0.2540639639
     2337.3564453125     2337.3564453125        0.0016939737        0.0000021201        0.0004831004       -0.0000040161        0.0822693184        0.0006190896        0.3612477779
     1753.0173339844     2337.3564453125        0.0016179247        0.0000034364        0.0006777456        0.0000034771        0.1159339100        0.0008106859        0.4871017039
     1402.4138183594     2337.3564453125        0.0017362968        0.0000055227        0.0006670614        0.0000114476        0.1455238760        0.0010049801        0.5012307167
     1168.6782226562     2337.3564453125        0.0016870550        0.0000076339        0.0006272307        0.0000261369        0.1872732490        0.0011485637        0.6453682184
  • 1
    Have a look here https://stackoverflow.com/a/60296072/2836621 – Mark Setchell Jun 14 '23 at 17:37
  • Do you know what the data in shared file are actually supposed to look like, or correspond to? – Mark Setchell Jun 14 '23 at 17:37
  • Yes, actually. Here's what the output should presumably look like: https://wsi.li/dl/EZKwjTegHGr5a3dKm/. – gettingmathy Jun 14 '23 at 17:56
  • If you have it in a text form, add some excerpt into the question using an [edit]. Do not post links to some external hostings. – Vladimir F Героям слава Jun 14 '23 at 21:45
  • Done @VladimirFГероямслава – gettingmathy Jun 15 '23 at 01:04
  • Have you looked at this file with a hexdump? That should have been job one. There is obviously no ASCII, and I don't see any floating point values. It has a suspicious number of C2 and C3 values. Did you, perhaps, run this through a UTF-8 encoding and save the (now garbage) encoded result? – Tim Roberts Jun 15 '23 at 01:18
  • 2
    Yes, that's what happened. You UTF-8 encoded this binary file, thereby trashing the data. If I undo that encoding, I can now see reasonable IEEE floating point numbers in the data. You need to go back and fine the original file, before it was UTF-8-encoded. But, be aware that the file doesn't contains ANY indication of what its contents are. You need to KNOW that there's one float followed by 12 integers followed by 240 floats, or whatever. – Tim Roberts Jun 15 '23 at 01:25
  • Hmm, I don't recall doing that but I can get the original file again, that's not an issue. Would you mind writing a comment on what you did to get those numbers? I'm not terribly amazing at this but I hexdump'd the file as you said and got what I'm assuming to be hex... not sure how to proceed from there? – gettingmathy Jun 15 '23 at 01:34
  • Just wanted to comment on your edit; unfortunately, I am not aware of the way the contents are arranged either. I am obtaining these from here https://torroja.dmt.upm.es/channels/data/spectra/re180/2D/, and they are rather vague in their description. I also know what the file should look like; the edit in my original post gives the first few rows and after that, it should all be rows of floats... @TimRoberts – gettingmathy Jun 15 '23 at 01:49
  • OK, so those first 4 floats in my dump are time, re, alp, and bet. You can see in my output that `nvar` is 7, which agrees with the doc. So, given that doc, you ought to be able to extract the data. – Tim Roberts Jun 15 '23 at 02:07
  • Aah, okay yes I see what's happening now, will test it out. Thanks a lot for the help on this! – gettingmathy Jun 15 '23 at 03:37

2 Answers2

1

Decrypting random binary files happens to be one of my programming strengths ;). So, I did this to try to fix the file:

xin = open('Re180.spe.j03','rb').read()
x2 = xin.decode('utf-8')
x3 = bytes([ord(c) for c in x2])
open('fix.j03','wb').write(x3)

Now, I see reasonable data:

timr@Tims-NUC:~/Downloads$ hexdump -C fix.j03 |more
00000000  2c 00 00 00 1d 6b 6a 3d  00 20 4b 45 ab aa 2a 3e  |,....kj=. KE..*>|
00000010  00 00 00 3f 00 02 00 00  61 00 00 00 53 01 00 00  |...?....a...S...|
00000020  09 00 00 00 f0 19 00 00  03 00 00 00 07 00 00 00  |................|
00000030  2c 00 00 00 48 00 00 00  08 00 00 00 0b 00 00 00  |,...H...........|
00000040  0e 00 00 00 11 00 00 00  16 00 00 00 1b 00 00 00  |................|
00000050  21 00 00 00 28 00 00 00  31 00 00 00 e0 ff d5 3c  |!...(...1......<|
00000060  c1 5f 59 3d 00 8d b6 3d  a1 30 09 3e f4 6f 68 3e  |._Y=...=.0.>.oh>|
00000070  3a 6a ae 3e 00 00 00 3f  e8 af 35 3f 00 00 80 3f  |:j.>...?..5?...?|
00000080  48 00 00 00 00 a8 02 00  d8 87 2f 45 ff 6a ad 3d  |H........./E.j.=|
00000090  88 5c a6 3d 1b c0 7e 3d  89 4c 45 3d 9f 27 3a 3d  |.\.=..~=.LE=.':=|
000000a0  d5 4d 0f 3d 1b aa 07 3d  37 54 da 3c 9f 13 cc 3c  |.M.=...=7T.<...<|
000000b0  95 a7 c1 3c 85 2f a3 3c  97 87 ac 3c fa 41 97 3c  |...<./.<...<.A.<|
000000c0  6b 73 8c 3c 7c 70 74 3c  b4 ef 61 3c 01 28 5e 3c  |ks.<|pt<..a<.(^<|

Any time you see regular sections of 4-byte entries ending with 3c, 3d, 3f and 40, those are almost certainly single-precision floats.

So, now I can read with Python:

>>> x = open('fix.j03','wb').read()
>>> z1 = x.read(256)
>>> z2 = struct.unpack('i4f18i8f2i31f', z1)
>>> z2
(44, 0.057231057435274124, 3250.0, 0.1666666716337204, 0.5, 512, 97, 339, 9, 6640, 3, 7, 44, 72, 8, 11, 14, 17, 22, 27, 33, 40, 49, 0.026122987270355225, 0.0530698336660862, 0.08913612365722656, 0.1339745670557022, 0.22698956727981567, 0.340654194355011, 0.5, 0.7097153663635254, 1065353216, 72, 2.4393803666966416e-40, 2808.490234375, 0.08467673510313034, 0.08123117685317993, 0.06219492480158806, 0.048168692737817764, 0.0454479418694973, 0.034986335784196854, 0.03312120959162712, 0.02665148489177227, 0.024911699816584587, 0.023639479652047157, 0.019920120015740395, 0.021060748025774956, 0.01846407726407051, 0.017144879326224327, 0.014919396489858627, 0.013790059834718704, 0.013559342361986637, 0.012559246271848679, 0.012359904125332832, 0.012537372298538685, 0.01048948522657156, 0.010360710322856903, 0.009192272089421749, 0.0085222739726305, 0.007902493700385094, 0.007303152699023485, 0.007056804373860359, 0.006536118220537901, 0.00580992316827178)
>>> 

So there is some hope for you, but it's not going to be an easy haul.

Tim Roberts
  • 48,973
  • 4
  • 21
  • 30
  • Thanks, I'll try this and update. Just wondering, what is the string argument you passed to unpack? Does i4f18, for example, mean 4 integers followed by 18 floats? – gettingmathy Jun 15 '23 at 01:47
  • No, other way around. 1 integer, 4 floats, 18 ints, 8 floats, 2 ints, 31 floats. That's what I see at the beginning of the file. `struct` is the Rosetta stone for those trying to crack random binary files. – Tim Roberts Jun 15 '23 at 02:01
0

You are making life too difficult.

Go back to the original file, which is here: https://torroja.dmt.upm.es/channels/data/spectra/re180/2D/

Then you can see the format.

I think you would be best just converting it to an Ascii (text) file first, then you can do what you like with it in another programming language.

The following code writes to screen, but you can change the output unit to direct to a file, or simply redirect your output to a file. Note that I have added a number of write( out, * ) "-------" which is purely so that I can see where the records are. You will probably want to remove these.

program test
   implicit none

   real utau, re, alp, bet
   integer mx, my, mz, nplan, nacum, jind, nvar
   integer nx1, nz1

   integer, allocatable :: jsp(:)
   real, allocatable :: yh(:)
   real, allocatable :: sp(:,:)

   integer :: out = 6
   integer nv, j

   ! Record 1
   open( 10, file="Re180.spe.j03", form="unformatted" )
   read( 10 ) utau, re, alp, bet, mx, my, mz, nplan, nacum, jind, nvar
   write( out, * ) utau, re, alp, bet, mx, my, mz, nplan, nacum, jind, nvar
   write( out, * ) "---------"

   ! Record 2
   allocate( jsp(nplan), yh(nplan) )
   read( 10 ) ( jsp(j), j = 1, nplan ), ( yh(j), j = 1, nplan )
   write( out, * ) jsp, yh
   write( out, * ) "---------"

   ! Records 2+1 to 2+nvar
   nx1 = mx / 2;   nz1 = ( mz + 1 ) / 2
   allocate( sp(nx1,nz1) )
   do nv = 1, nvar
      read( 10 ) sp
      write( out, * ) sp
      write( out, * ) "---------"
   end do

   close( 10 )

end program test

Output:

   5.72310574E-02   3250.00000      0.166666672      0.500000000             512          97         339           9        6640           3           7
 ---------
           8          11          14          17          22          27          33          40          49   2.61229873E-02   5.30698337E-02   8.91361237E-02  0.133974567      0.226989567      0.340654194      0.500000000      0.709715366       1.00000000    
 ---------
   2808.49023         5.72824106E-03   5.54950256E-03   5.64759970E-03   5.63186780E-03   .............. < Lots of other data >
 ---------
... <Another 7-1 very large records>
----------
lastchance
  • 1,436
  • 1
  • 3
  • 12