I am working with ERA5 data (30 years of daily temperature) to identify heatwaves (HW). A HW event is defined as at least 3 consecutive days exceeding a threshold. I have calculated the Excess Heat Factor (EHF), and the HW threshold is EHF > 0. Now I want to tag each HW event with a unique id number: the days of the first HW event get 1, the days of the second HW event get 2, and so on, with the result stored in another variable (hw_num).
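
To make the target output concrete, here is a minimal NumPy sketch of the labeling for a single time series. The function name `label_events` and its `min_len` parameter are placeholders of my own, not part of any library, and days outside an event are labeled 0 rather than NaN as a simplifying assumption.

```python
import numpy as np


def label_events(hot, min_len=3):
    """Label runs of at least `min_len` consecutive True values as 1, 2, 3, ...

    Days that are not part of a qualifying run are labeled 0.
    (Hypothetical helper, written only to illustrate the desired output.)
    """
    hot = np.asarray(hot, dtype=bool)
    # Pad with False on both sides so every run has a start and an end edge.
    padded = np.concatenate(([False], hot, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first day of each run
    ends = np.flatnonzero(edges == -1)    # one past the last day of each run

    labels = np.zeros(hot.size, dtype=np.int64)
    event_id = 0
    for start, end in zip(starts, ends):
        if end - start >= min_len:        # keep only runs that are long enough
            event_id += 1
            labels[start:end] = event_id
    return labels


# Same 12-day toy series as in the example that follows.
ehf_positive = np.array([1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0]) > 0
hw_num = label_events(ehf_positive)
print(hw_num)  # [0 0 0 1 1 1 1 0 2 2 2 0]
```

The isolated day at the start is shorter than 3 days and stays 0, while the 4-day and 3-day runs receive ids 1 and 2.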

I am trying to implement something similar to the following.

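import numpy as np
import pandas as pd
import xarray as xr

# Create a DataArray: 1 marks a day above the threshold, 0 a day below it
da = xr.DataArray(
    [1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0],
    coords=[
        pd.date_range(
            "1999-12-15",
            periods=12,
            freq=pd.DateOffset(days=1),
        )
    ],
    dims="time",
)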

m = da.rolling(time=2).sum()  # Equals 2 on the second of two consecutive flagged days

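j = m.copy()  # Dummy array to hold the HW event id
c = 0  # Counter used as the HW event id
# Loop to assign ids to HW events
for i in range(len(m)):
    j[i] = np.nan
    if m[i] > 1:
        # Day i and the preceding day are both flagged: same event
        j[i - 1] = c
        j[i] = c
    if m[i] == 1:
        # A run starts or ends here: move on to the next id
        c += 1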


The above code works for a single location, but my data is a grid with multiple lat-lon values over 30 years (the time dimension).
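
One possible way to extend the idea to a grid, sketched here under assumptions and not tested on the real ERA5 data, is to apply a per-series labeling function along the time axis of every cell with `np.apply_along_axis`. The helpers `label_events` and `label_grid` are hypothetical names, the array is assumed to be ordered (time, latitude, longitude), non-event days are labeled 0, and ids restart from 1 in each cell.

```python
import numpy as np


def label_events(hot, min_len=3):
    """Label runs of >= min_len consecutive True values as 1, 2, ...; 0 elsewhere."""
    hot = np.asarray(hot, dtype=bool)
    padded = np.concatenate(([False], hot, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first day of each run
    ends = np.flatnonzero(edges == -1)    # one past the last day of each run
    labels = np.zeros(hot.size, dtype=np.int64)
    event_id = 0
    for start, end in zip(starts, ends):
        if end - start >= min_len:
            event_id += 1
            labels[start:end] = event_id
    return labels


def label_grid(ehf, threshold=0.0, min_len=3):
    """Label events independently in each cell of a (time, lat, lon) array."""
    hot = np.asarray(ehf) > threshold     # NaN compares as False
    # Axis 0 is assumed to be time; each 1-D time series is labeled separately.
    return np.apply_along_axis(label_events, 0, hot, min_len)


# Synthetic EHF values: 6 days, 1 latitude, 2 longitudes.
ehf = np.array([
    [[1.0, -1.0]],
    [[1.0, 1.0]],
    [[1.0, 1.0]],
    [[-1.0, 1.0]],
    [[2.0, 1.0]],
    [[2.0, -1.0]],
])
hw_num = label_grid(ehf)
print(hw_num[:, 0, 0])  # [1 1 1 0 0 0]
print(hw_num[:, 0, 1])  # [0 1 1 1 1 0]
```

If the dimension order really is (time, latitude, longitude), the result could presumably be stored back with something like `ds4['hw_num'] = (('time', 'latitude', 'longitude'), hw_num)`.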

**Things I have done so far:**

I filtered the xarray dataset, ds4, to the days with EHF greater than zero, then created a boolean mask and calculated a two-day rolling sum over it. Next, I loop over each location (lat-lon pair) and check the rolling sum along the time dimension. If the rolling sum is greater than 1 on a given day, I assign the id number to that day and the preceding day. When the rolling sum equals 1, I increment the id number (count). The code is as follows:

ds4 = ds4.where(ds4['ehf']>0, drop=True)
ds4['mask'] = ds4['ehf']>0
ds4['m_roll'] = ds4['mask'].rolling(time=2).sum()
ds4['hw_num'] = ds4['m_roll']

# Loops to add unique number to heatwave events
for lats in range(ds4.dims['latitude']):
    for lons in range(ds4.dims['longitude']):
        count = 1
        for t in range(1, ds4.dims['time']):
            ds4['hw_num'][t, lats, lons] = np.nan  # assigns nan values to 
            if ds4['m_roll'][t, lats, lons] > 1:
                ds4['hw_num'][t-1:t+1, lats, lons] = count
            if ds4['m_roll'][t, lats, lons] == 1:
                count += 1

However, the resulting hw_num variable is empty, and the code does not do what I intended. Where am I going wrong, and how can I improve it? TIA

Salit