1

I need a help regarding CDO operation on a netcdf file. I downloaded dataset for 40 years from ERA5 over a grid region and I masked variable values for a range (30-50) to 1 and other values to 0 using cdo.

cdo -expr,'var2=var*(var>=30 && var<50)' data1.nc data2.nc

Now I want to calculate number of times each grid cell recorded var2= 1 consecutively for 5 days but less than 10 days in the last 40 years. Is that possible using cdo or nco?

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86

1 Answers1

2

First of all, I'm assuming your input has been converted to daily, you don't say.

Then you also need to clarify the question. Your title originally said "how to count the number of days?", but that was a bit ambiguous?

Let's say you have a series like this that represents an 8 day event:

0 0 1 1 1 1 1 1 1 1 0 0 0 0 

Does that count as a single occurrence? Your text seemed to imply this was the case, but the title not, I think you wanted to know the "number of events" not "days", so I edited your title to agree with the main text of the question, I hope this interpretation is correct.

I think you can do it but the solution is a bit longwinded. You can use runsum to give you a "1" for any day which is 1 and is on the end of a series of N days like this:

cdo gec,N -runsum,N in.nc out5.nc

But that doesn't totally answer your question. For example if N=5 this would convert the above series into this:

0 0 0 0 0 0 1 1 1 1 0 0 0 0 

i.e there are 4 days on the end of a 5 day series.

How can we get an upper limit to the length of an event? Well if we do the same calculation for >10 day events, and add together, we get

  1. not an event
  2. An event of at least 5 days but less than 10,
  3. An event that is >10 (and of course >5 days)

So we just add the two series and pick out the 1s to get the range of event lengths you require:

cdo gec,10 -runsum,10 in.nc out10.nc
# only keep events of 5,6,7,8 and 9 days in length:
cdo eqc,1 -add out5.nc out10.nc out5-10.nc 

Okay now we have a file where var=1 when it is at the end of a series of at least 5 but less than ten days.

Now this is cool part, we can apply the same technique using runmean/runsum to pick up the START and END of each of these series, and then we can add up these events. If we apply a runsum with a window size of 2, this produces 1 for a sequence of "0 1" or "1 0" i.e. it picks up the start and end points of each event.

cdo eqc,1 -runsum,2 out5-10.nc out_start_end.nc

This command turns our example series into the following, since we've seen only a sequence of "0 1" or "1 0" results in a 1:

0 0 0 0 0 0 1 0 0 1 0 0 0 0 

Now we just need to sum this in time and divide by 2 (I told you it was long winded!)

cdo divc,2 -timsum out_start_end.nc number_of_events.nc 

ta da!

Note 1 that if the whole input series ends mid-event e.g. 0 0 1 1 1 , this method will count this an a "half" event, since you only pick up the start. Round down to the nearest integer if this upsets you.

Putting this all together (and you can probably pipe to combine some of this), here is the whole solution involving 10 cdo commands summarized:

cdo gec,5 -runsum,5 in.nc out5.nc
cdo gec,10 -runsum,10 in.nc out10.nc
cdo eqc,1 -add out5.nc out10.nc out5-10.nc 
cdo eqc,1 -runsum,2 out5-10.nc out_start_end.nc
cdo divc,2 -timsum out_start_end.nc number_of_events.nc 

Note 2, the runsum commands will use the window mid-point for the date/timestamp, but that is not important for this use-case. If anyone wants to also use the outN.nc files to see when the event days are, then it is usual to lag the time stamp using --timestat_date last, see this video for more details.

Note 3 If you sum the series of days within the events, you can now divide this by the number of events to get the mean event length.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
  • 1
    Good answer, given the question is unclear. Does this fully distinguish between events? If you had a 9 day event, would it not count that as 4 events? Question title implies it should be `ge,5`, though question text implies it should be `gec,5` – Robert Wilson Jan 27 '23 at 09:39
  • nope, that's the beauty of the " eqc,1 -runsum,2" statement, it only gives you a 1 if the input is 0 1 or 1 0, i.e. the start and end of the sequence. So a 9 day event gives you 0 1 1 1 1 1 0 and this turns it into 0 1 0 0 0 1 0 , which i then timesum and divide by two to count events. if the series ends with 1s , you get half an event, but let's not split hairs, (maybe I will add something to round it down). Took me a while to think of this, a bit long winded, but pretty cool as a CDO case study. Thanks for the < versus <= you are right, also that mean 9 day=5 – ClimateUnboxed Jan 27 '23 at 12:02
  • Actually to be completely rigorous I should have written that eqc,1 -runsum,2 turns that example sequence into 1 0 0 0 1 0. For this particular use case, I just realized in fact that the timestat_date is actually superfluous. – ClimateUnboxed Jan 27 '23 at 13:17
  • Ah, I get it now. I guess --time_stat would only be useful if you needed to calculate it per year. The problem reminds me of using CDO to analyse marine heat waves, which is even tricker. – Robert Wilson Jan 27 '23 at 13:24
  • Sounds interesting, if you are open to it, I'll contact you offline from here to see if you are open to giving a talk on the topic in one of our online ICTP lecture series later in the year. :-) – ClimateUnboxed Jan 27 '23 at 13:28
  • Just for a small project. Contact here: https://nctoolkit.readthedocs.io/en/latest/info.html – Robert Wilson Jan 27 '23 at 14:33
  • no need to apologise, hope the answer is helpful, please consider accepting it with the tick if you think it resolves your question :-) – ClimateUnboxed Jan 29 '23 at 16:16