1

I am trying to read a binary file in Python using a Fortran code as an example.

The binary file is called data.grads and I have a control file called data.ctl that should allow me to understand how to read the binary file. As I said, I have a Fortran code which my advisor wrote to explain me the process of reading the binary file and constructing arrays corresponding to the different variables inside the control file (speed, temperature, pressure, etc).

I read multiple posts on the matter and I am having some trouble understanding how to use the data inside the binary file.

Here are some posts that I checked:

The binary file stores simulations results of different properties of the atmosphere that scientists use to plot, for instance, temperature and pressure sections such as these:

enter image description here

I was told that the control file is key to understanding how to get anything out of the binary file.

I am able to read the file, but I don’t know how to access results for a specific variable. Here is a piece of code that I was able to derive from some of the posts that I quoted before:

filename = "/path/data.grads"

nlat = 32
nlon = 67
f = open(filename, 'rb')
recl = np.fromfile(f, dtype='int32', count=4*nlat*nlon)
f.seek(4)
field = np.fromfile(f, dtype='float32',count=-1)

print('Record length=',recl)
print(field, len(field))

It returns the following:

Record length= [-134855229 -134855229 -134855229 ... -134855229 -134855229 -134855229]
[-9.99e+33 -9.99e+33 -9.99e+33 ... -9.99e+33 -9.99e+33 -9.99e+33] 10319462399

Can someone who has experience in the subject help me figure out how to access the different variables using the control file ?

If you need more explanations, please tell me, I will edit my posts and add as much information as you want.


Unfortunately, I cannot share the binary file because it is on a server and weighs about 40 GB…

I share:

  • the content of the control file,
  • and the Fortran code (it lacks some code because it was written as an explanation).

Control file (data.ctl)

DSET ^data.grads
UNDEF -9.99e33
XDEF  64 LINEAR 0.0   5.6250
YDEF 32 LEVELS
 -85.761 -80.269 -74.745 -69.213 -63.679 -58.143 -52.607 -47.070 -41.532
 -35.995 -30.458 -24.920 -19.382 -13.844  -8.307  -2.769   2.769   8.307
  13.844  19.382  24.920  30.458  35.995  41.532  47.070  52.607  58.143
  63.679  69.213  74.745  80.269  85.761
ZDEF  67 LEVELS
.000696 .08558 .1705 .2554 .3402 .4544 .6274 .8599 1.152 1.505 1.918 2.392
2.928 3.495 4.063 4.781 5.816 7.181 8.889 10.95 13.39 16.07 18.81 21.61
24.47 27.39 30.37 33.40 36.50 39.64 42.77 45.90 49.04 52.17 55.31 58.44
61.57 64.71 67.84 70.98 74.11 77.24 80.38 83.51 86.65 89.78 92.92 96.05
99.18 102.3 105.5 108.6 111.7 114.9 118.0 121.1 124.3 127.4 130.5 133.7
136.8 139.9 143.1 146.2 149.3 152.5 160.0
TDEF 3120 LINEAR 01JAN2000 1HR
VARS 31
u    67 99  u (m/s)
v    67 99  v (m/s)
w    67 99  w (m/s)
T    67 99  T (K)
dia  67 99  diagnostics (see table)
ps    0 99  ps
Ts    0 99  Ts (K)
h2og   67 99   Water vapor [kg/m^2]
h2oim1  67 99   Water ice mass for dust 0.30E-07 [kg/m^2]
h2oim2  67 99   Water ice mass for dust 0.10E-06 [kg/m^2]
h2oim3  67 99   Water ice mass for dust 0.30E-06 [kg/m^2]
h2oim4  67 99   Water ice mass for dust 0.10E-05 [kg/m^2]
h2oin1  67 99   Water ice number for dust 0.30E-07 [number/m^2]
h2oin2  67 99   Water ice number for dust 0.10E-06 [number/m^2]
h2oin3  67 99   Water ice number for dust 0.30E-06 [number/m^2]
h2oin4  67 99   Water ice number for dust 0.10E-05 [number/m^2]
h2ois  0 99  surface h2o ice [kg/m^2]
p  67 99   Pressure [Pa]
h  67 99   Height above the surface [m]
dipre  67 99   Delta pressure [Pa]
surf   67 99   space of cell factor [sin*cos]
dm   67 99  CO2 ice cloud mass concentration [kg/m^3]
cap   0 99  Surface CO2 ice mass [kg]
hcap  0 99  Surface CO2 ice [kg/m^2]
gdq_op 0 99  dust from rad.mod [opacity]
gdq_mix  67 99  dust from rad.mod [mix.r]
dust_op 0 99  dust from tracers [opacity]
dust_n1  67 99   Dust num dens from tracers for R=0.30E-07 [m^-3]
dust_n2  67 99   Dust num dens from tracers for R=0.10E-06 [m^-3]
dust_n3  67 99   Dust num dens from tracers for R=0.30E-06 [m^-3]
dust_n4  67 99   Dust num dens from tracers for R=0.10E-05 [m^-3]
ENDVARS

Fortran file

  open(12,file='data.grads',status='unknown',
 &           form='unformatted',access='direct',
 &           recl = 4*nlat*nlon )


  krec=0

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(u3d(1:nlon,1:nlat,l))
  enddo

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(v3d(:,:,l))
  enddo

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(w3d(:,:,l))
  enddo

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(T3d(:,:,l))
  enddo

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(D3(:,:,l))
  enddo

  krec = krec+1
  write(12,rec=krec,ERR=900)real(ps3d(:,:))

  krec = krec+1
  write(12,rec=krec,ERR=900)real(Ts3d(:,:))

  do n = 1,4
   do l = 1,nlat
    krec = krec+1
    write(12,rec=krec,ERR=900)real(trace4D(:,:,l,n))
   enddo
  enddo

  do n = 1,4
    krec = krec+1
    write(12,rec=krec,ERR=900)real(trace2D(:,:,n))
  enddo

  do l = nlat,1,-1
     krec = krec+1
     write(12,rec=krec,ERR=900)real(pre3d(:,:,l))
  enddo

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(alth3d(:,:,l))
  enddo

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(dipre3d(:,:,l))
  enddo


  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(surf3d(:,:,l))
  enddo

  do l = 1,nlat
     krec = krec+1
! Convert DM to DM/grid volume
     write(12,rec=krec,ERR=900) real(DM4(:,latmask(:),l)
 &         *GRAV*pre3d(:,:,nlat-l+1)
 &         /(T3d(:,:,nlat-l+1)*dipre3d(:,:,nlat-l+1)*RGAS)
 &         /surf3d(:,:,nlat-l+1) )
  enddo

  krec = krec+1
  write(12,rec=krec,ERR=900)real(CAP4(:,latmask(:)))

  krec = krec+1
  write(12,rec=krec,ERR=900)real(HCAP4(:,latmask(:)))

  krec = krec+1
  write(12,rec=krec,ERR=900)real(DDUSTA3d(:,:))

  do l = 1,nlat
     krec = krec+1
     write(12,rec=krec,ERR=900)real(GDQ3d(:,:,l))
  enddo

  krec = krec+1
  write(12,rec=krec,ERR=900)real(tau_dust3d(:,:))

  do n = 1,4
   do l = 1,nlat
    krec = krec+1
    write(12,rec=krec,ERR=900)real(dust_n3d(:,:,l,n))
   enddo
  enddo

EDIT: First lines of the binary file (xxd -l 100 data.grads)

0000000: c345 f6f7 c345 f6f7 c345 f6f7 c345 f6f7  .E...E...E...E..
0000010: c345 f6f7 c345 f6f7 c345 f6f7 c345 f6f7  .E...E...E...E..
0000020: c345 f6f7 c345 f6f7 c345 f6f7 c345 f6f7  .E...E...E...E..
0000030: c345 f6f7 c345 f6f7 c345 f6f7 c345 f6f7  .E...E...E...E..
0000040: c345 f6f7 c345 f6f7 c345 f6f7 c345 f6f7  .E...E...E...E..
0000050: c345 f6f7 c345 f6f7 c345 f6f7 c345 f6f7  .E...E...E...E..
0000060: c345 f6f7                                .E..
Loïc Poncin
  • 511
  • 1
  • 11
  • 30
  • Which compiler are you using? Specifying ` recl = 4*nlat*nlon` is not portable, some some compilers count record sizes in bytes, other in 4-byte words. https://stackoverflow.com/questions/37770912/why-direct-access-i-o-works-incorrectly-with-intel-visual-fortran – Vladimir F Героям слава Jul 18 '19 at 13:21
  • In Python, I use IPython with Spyder from Anaconda but in Fortran I don't know what kind of compiler they use... I was just given the Fortran code but I am proficient yet in Fortran so I was asked to code in Python. Here is the version of python that I use: Python 3.7.3. – Loïc Poncin Jul 18 '19 at 13:26
  • EDIT: "I am **not** proficient yet in Fortran" – Loïc Poncin Jul 18 '19 at 13:35
  • Another consideration, can't the file be big-endian? – Vladimir F Героям слава Jul 18 '19 at 13:45
  • @VladimirF I opened the file using this command in the terminal `xxd -b data.grads` and I got to see that every line inside is written like: `44s54c5: 10010101 11001010 01010101 00101010 10101010 11011010 =.6^E.` (this is not a line from the file, just an example, the file is so big that I can't copy any line because it is still reading). https://unix.stackexchange.com/questions/282215/how-to-view-a-binary-file – Loïc Poncin Jul 18 '19 at 13:56
  • Show a hexdump (not one bit per character!) of the *beginning* of the file, in text (not a screenshot). – Davis Herring Jul 18 '19 at 20:29
  • @LoïcPoncin: There are lots of similar commands and I don’t know that one. I always use `od -t x1 data.grads | head -n 20` or so. – Davis Herring Jul 18 '19 at 21:05
  • @DavisHerring I edited my post to add the first lines of the binary code – Loïc Poncin Jul 19 '19 at 11:31
  • @LoïcPoncin: Well, that’s a boring dump, but it does conveniently establish that there’s no header. – Davis Herring Jul 19 '19 at 11:36

2 Answers2

3

I've recently written a python package xgrads to parse and read the ctl/binary files commonly used by GrADS. It can load the whole data set using numpy and dask, and return it as a xarray.Dataset, which is very popular in earth science.

The following code shows how to load the ctl dataset and access the different variables you are interested in:

from xgrads.core import open_CtlDataset

dset = open_CtlDataset('test.ctl')

# print all the info in ctl file
print(dset)

# for your ctl content, you can plot any variables (e.g.,
# first time and level of T) immediately as
dset['T'][0,0,...].plot()

Although this is the first version, no widely tested, I've parsed your ctl content successfully. I hope the package can also return you the correct dataset. If you find any bugs, I'm also glad to fix that.

MiniUFO
  • 171
  • 1
  • 12
1

The numpy equivalent (or rather counterpart, since you showed the writer) of that Fortran begins

import numpy
nlat=32  # from data.ctl
nlon=64
nz=64    # or 67?
dt=numpy.dtype((numpy.float32,(nlat,nlon)))
def read(n=nz): return numpy.fromfile(f,dt,n)
def read1(): return read(1)[0]
with open("data.grads",'rb') as f:
  u3d=read()
  v3d=read()
  # …
  ps3d=read1()
  # …
  trace4D=numpy.fromfile(f,numpy.dtype((dt,nlat)),4)
  trace2D=read(4)
  pre3d=read()[::-1]
  # …

There is some confusion between the nlat and nz variables; since you said the code was altered, the data.ctl might be the better reference.

Davis Herring
  • 36,443
  • 4
  • 48
  • 76
  • This is great, it seems that I am getting good results with that code. Can you tell me how you would read the following variables `trace2D`, `DM4`, `CAP4`, `HCAP4` according to the Fortran code ? Also, in the control file you can see this line related to time `TDEF 3120 LINEAR 01JAN2000 1HR`. It says that the simulation start the 1st of January 2000 and continues for 3120 time steps of 1 hour (3120 hours). So to get the results for every hour, shoud I do `with open("data.grads",'rb') as f: for t in range(3120): u3d=read() # …` ? – Loïc Poncin Jul 19 '19 at 14:09
  • @LoïcPoncin: I added a couple—some can’t be reconstructed in general because `latmask` discards information. That loop seems plausible if it’s consistent with the file size, although it depends on whether you want to make a `dict` (or `namedtuple`) or just process the data one time step at a time without retaining it all. – Davis Herring Jul 19 '19 at 15:59
  • So I guess I could do something like this if I were to use a dictionnary for the temperature `temp = {} with open('data.grads','rb') as f: for t in range(3120): T3d=read() temp['t_{}'.format(t)] = T3d # …` – Loïc Poncin Jul 19 '19 at 19:07
  • 1
    @LoïcPoncin: You could, but why not just use `t` as the key—or just use a list (whether AoS or SoA)? – Davis Herring Jul 19 '19 at 19:42
  • SoA seems better for what I want to do, thanks a lot for your help. Cheers ! – Loïc Poncin Jul 19 '19 at 19:46