4

I have 34 netCDF (nc) files containing latitude, longitude, and the data in every file. Every filename containing a number which corresponds to the Pressure level in hPa (starts from 1 to 34 for respective pressure level from (1000 hPa to 0.4 hPa). I want to join all files into a single nc file with this vertical level dimension information.

I tried to read whole files using xarray open_mfdataset but I cannot con_cat with the level dimension since it isn't in the files.

import xarray as xr
ds = xr.open_mfdataset('/media/MediaCentre/Dataset/d9/data*.nc',concat_dim='level')

The files do not have any information in the global attributes regarding the pressure. They are names sequentially: data1.nc, data2.nc, ... dataN.nc and correspond to the pressure levels (hPa) of: 1000 975 950 925 900 850 800 750 700 650 600 550 500 450 400 350 300 *250 200 150 100 70 50 40 30 20 15 10 7 5 3 2 1 0.4

How can I merge these together using python xarray, or cdo/nco?

Sample data are here https://www.dropbox.com/sh/linfn0721ze3j1f/AACwxTsQVNyiE7mF_gRbpRfra?dl=0

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
Krishnaap
  • 297
  • 3
  • 18
  • Can I ask, in the files themselves there is no data entry (dimension, variable or global attribute) with the pressure level recorded? i.e. one would like get the pressure from the file name and add a new dimension in the merged file? – ClimateUnboxed Sep 22 '20 at 15:12
  • @AdrianTompkins Yes, the file doesn't contain pressure level information, it just contains latitude, longitude, and the data. But it has pressure level information given in the file name (like 1 represents the surface level 1000 hPa, 2 represents 975 hPa like that). – Krishnaap Sep 23 '20 at 07:44
  • ps: from your reply to Robert Wiilson, it seems you are not tied to a python solution, but are happy to have an nco/cdo alternative. If so I may edit the question to reflect this, as some people downvote when a question is asked with a certain language specified and then people upload answers with an alternative language. – ClimateUnboxed Sep 23 '20 at 19:22
  • @AdrianTompkins I would really appreciate your suggestion and edits. Thanks. – Krishnaap Sep 26 '20 at 18:30

3 Answers3

3

It is probably easier to do this using CDO. The following will merge the two sample files you have supplied:

cdo -L -merge -setlevel,0.4 data1.nc -setlevel,1 data2.nc merged.nc

Just modify the above to be able to handle all of the files.

Robert Wilson
  • 3,192
  • 11
  • 19
  • Thanks @robert-wilson, actually I tried with CDO, it was stacking over the level(dimension) surface. That's why I tried to do with xarray. – Krishnaap Sep 21 '20 at 11:32
  • @rober-wilson actually CDO worked in this way well, `cdo -L -merge -setlevel,1000 data1.nc -setlevel,975 data2.nc ... ...-setlevel,1 data33.nc -setlevel,0.4 data34.nc merged.nc` but when I cross-checked the data in GrADS, when I set z 1 it showing 0.4 hPa data. Similarly 'set z 34` represents 1000 hPa data. That is making my analysis difficult because normally z 1 represents 1000 hPa. I am not getting the reason though, since it specifically assigns -setlevel 1000 for 1000 hPa data. May be in contradiction with CDO -setlevel saving? I am not sure – Krishnaap Sep 26 '20 at 18:15
3

This is a BASH script which defines a list of pressure levels (look at the comments) and then the workhorse is ncap2, which is used to add a dimension called "level" to each file, and then define a variable "level" with the pressure value defined. I then use ncatted to add attributes for completeness such as the pressure units.

cdo is then used at the end to merge the files (nco could alternatively be used). The example here only merges two files, but you can merge all your .

#!/bin/bash

# define pressure levels here, start, inc, end or just a list
# this is important to get right, the number of files processed
# depends on this list, here I only process 2 files,
# data1.nc = 1000hPa, data2.nc=975 hPa

# this was my test merging two files, can use seq if p levs are regular
# p_levs=($(seq 1000 -25 975))

# this is the full code:
p_levs=(1000 975 950 925 900 850 800 750 700 650 600 550 500 450 400 350 300 250 200 150 100 70 50 40 30 20 15 10 7 5 3 2 1 0.4)
    
# loop over the pressure levels, data1.nc-> first pressure level, data2-> second etc
for i in ${!p_levs[@]} ; do
    infile=data$(expr $i + 1).nc # nc filename

    # add a level dimension and define variable with pressure level:
    ncap2 -O -s 'defdim("level",1);q[level,lat,lon]=q;level=array('${p_levs[$i]}',10,$level)' $infile tmp${i}_$infile

    # put in attributes for the new level variable:
    ncatted -h -a units,"level",o,c,"hPa" tmp${i}_$infile
    ncatted -h -a long_name,"level",o,c,"Pressure level" tmp${i}_$infile
done

# merge the files:
cdo merge tmp*.nc merged.nc

# clean up tmp files at end 
rm -f tmp*.nc 

I tested it using p_levs="1000 975" and it merged the first two of your files data1.nc and data2.nc without problems and the resulting file opens okay in ncview and looks fine.

NOTE: For some reason ncap2 is removing the attributes from "q" as per the comments on this post , I'm not sure why, so it means you will need to readd those with ncatted. This answer here from Charlie Zender was also helpful to construct this code.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
  • This worked but there was an issue, I cross-checked with Panoply and GrADS. When I tried this method, in the sequence vertical levels I could see in the ouyput file as follows; 1000 hPa file ( as in data1.nc) then 600 hPa (as in data11.nc), followed by 550 hPa (as in data12.nc). The for loop merging the data in the above method is not in the sequential I think. Thanks a lot for your great scrip by the way – Krishnaap Sep 26 '20 at 18:26
  • 1
    From this and the answers you give to Robert and Charlie, then I think your files are not in pressure sequential order. The loop increases i by 1 each time, so much consider files in the the order file1, file2, ... file N, (you can put an echo $file and echo ${p_levs[$i]} inside the loop to print the filename and pressure used to see what the corresponding name and pressure is that is used) and the pressure is in order too, it is not like using a wildcard order such as ls file* - Check the individual files carefully to make sure they are in pressure order. – ClimateUnboxed Sep 26 '20 at 19:57
3

A different approach with NCO would first combine the levels with ncecat, e.g.,

ncecat -u level in*.nc out1.nc

and then add the level coordinate with ncap2, e.g.,

ncap2 -O -s 'level[$level]={1000, 975, ... 0.4}' out1.nc out2.nc

and then add the attributes with ncatted as Adrian shows.

ncatted <Adrian's example> out2.nc out3.nc

Good luck, Charlie

Charlie Zender
  • 5,929
  • 14
  • 19
  • 2
    Charlie always manages to write a nco solution that is 4 times shorter and 10 times neater than mine ;-) – ClimateUnboxed Sep 24 '20 at 07:13
  • for some reasons, `ncecat -u level data*.nc out1.nc` took data10.nc as the first file. Hence it put 650 hPa data as the first level instead of 1000 hPa data ( as it in data1.nc). – Krishnaap Sep 26 '20 at 18:52
  • The wildcard operator (the asterisk) expands files in alphabetic order. data10.nc must precede data1.nc, or something like that. Name your files so they sort alphabetically in the correct order, or eliminate the wildcard and manually place the files in the correct order on the command line. – Charlie Zender Sep 27 '20 at 16:39