
I have hourly data from ERA5 for each day of a specific year, and I want to convert it from hourly to daily. I know the long and hard way to do it, but I need something that does it easily.

Copernicus provides code for this here: https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation. It works fine when converting a single day, but I am having problems when converting the whole year.

The ERA5 dataset can be downloaded from https://cds.climate.copernicus.eu/cdsapp#!/home

Follow the steps here to set up and use the Copernicus server:

https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5
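Once registered, the setup essentially amounts to installing the cdsapi package and placing a ~/.cdsapirc file in your home directory. A minimal sketch of its contents (the UID and API key are placeholders taken from your CDS account page; this assumes the CDS API v2 endpoint described on that page):

url: https://cds.climate.copernicus.eu/api/v2
key: <your-UID>:<your-API-key>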

This script downloads the hourly data for only two days (1st and 2nd of January 2017):
#!/usr/bin/env python
"""
Save as get-tp.py, then run "python get-tp.py".
  
Input file : None
Output file: tp_20170101-20170102.nc
"""
import cdsapi
 
c = cdsapi.Client()
r = c.retrieve(
    'reanalysis-era5-single-levels', {
            'variable'    : 'total_precipitation',
            'product_type': 'reanalysis',
            'year'        : '2017',
            'month'       : '01',
            'day'         : ['01', '02'],
            'time'        : [
                '00:00','01:00','02:00',
                '03:00','04:00','05:00',
                '06:00','07:00','08:00',
                '09:00','10:00','11:00',
                '12:00','13:00','14:00',
                '15:00','16:00','17:00',
                '18:00','19:00','20:00',
                '21:00','22:00','23:00'
            ],
            'format'      : 'netcdf'
    })
r.download('tp_20170101-20170102.nc')
## Add multiple days and multiple months to download more data
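For example, here is a sketch of the same request extended to all of 2017 (the script and output file names are placeholders; the request is large and may wait in the CDS queue for a long time, and the CDS API is documented to ignore days that do not exist in a given month, such as 30 February):

#!/usr/bin/env python
"""
Sketch: download hourly total precipitation for all of 2017.

Save as get-tp-year.py, then run "python get-tp-year.py".
Output file: tp_2017.nc
"""
import cdsapi

c = cdsapi.Client()
r = c.retrieve(
    'reanalysis-era5-single-levels', {
            'variable'    : 'total_precipitation',
            'product_type': 'reanalysis',
            'year'        : '2017',
            'month'       : ['%02d' % m for m in range(1, 13)],
            'day'         : ['%02d' % d for d in range(1, 32)],
            'time'        : ['%02d:00' % h for h in range(24)],
            'format'      : 'netcdf'
    })
r.download('tp_2017.nc')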
The script below will create a netCDF file for only one day:
#!/usr/bin/env python
"""
Save as file calculate-daily-tp.py and run "python calculate-daily-tp.py".
  
Input file : tp_20170101-20170102.nc
Output file: daily-tp_20170101.nc
"""
import time, sys
from datetime import datetime, timedelta
 
from netCDF4 import Dataset, date2num, num2date
import numpy as np
 
day = 20170101
d = datetime.strptime(str(day), '%Y%m%d')
f_in = 'tp_%d-%s.nc' % (day, (d + timedelta(days = 1)).strftime('%Y%m%d'))
f_out = 'daily-tp_%d.nc' % day
 
# ERA5 timestamps mark the end of each hourly accumulation window, so the
# 24 values for this day run from 01:00 up to 00:00 of the next day
time_needed = []
for i in range(1, 25):
    time_needed.append(d + timedelta(hours = i))
 
with Dataset(f_in) as ds_src:
    var_time = ds_src.variables['time']
    time_avail = num2date(var_time[:], var_time.units,
            calendar = var_time.calendar)
 
    indices = []
    for tm in time_needed:
        a = np.where(time_avail == tm)[0]
        if len(a) == 0:
            sys.stderr.write('Error: precipitation data is missing/incomplete - %s!\n'
                    % tm.strftime('%Y%m%d %H:%M:%S'))
            sys.exit(200)
        else:
            print('Found %s' % tm.strftime('%Y%m%d %H:%M:%S'))
            indices.append(a[0])
 
    # Sum the 24 hourly accumulations to obtain the daily total
    var_tp = ds_src.variables['tp']
    tp_values_set = False
    for idx in indices:
        if not tp_values_set:
            data = var_tp[idx, :, :]
            tp_values_set = True
        else:
            data += var_tp[idx, :, :]
         
    with Dataset(f_out, mode = 'w', format = 'NETCDF3_64BIT_OFFSET') as ds_dest:
        # Dimensions
        for name in ['latitude', 'longitude']:
            dim_src = ds_src.dimensions[name]
            ds_dest.createDimension(name, dim_src.size)
            var_src = ds_src.variables[name]
            var_dest = ds_dest.createVariable(name, var_src.datatype, (name,))
            var_dest[:] = var_src[:]
            var_dest.setncattr('units', var_src.units)
            var_dest.setncattr('long_name', var_src.long_name)
 
        ds_dest.createDimension('time', None)
        var = ds_dest.createVariable('time', np.int32, ('time',))
        time_units = 'hours since 1900-01-01 00:00:00'
        time_cal = 'gregorian'
        var[:] = date2num([d], units = time_units, calendar = time_cal)
        var.setncattr('units', time_units)
        var.setncattr('long_name', 'time')
        var.setncattr('calendar', time_cal)
 
        # Variables
        var = ds_dest.createVariable(var_tp.name, np.double, var_tp.dimensions)
        var[0, :, :] = data
        var.setncattr('units', var_tp.units)
        var.setncattr('long_name', var_tp.long_name)
 
        # Attributes
        ds_dest.setncattr('Conventions', 'CF-1.6')
        ds_dest.setncattr('history', '%s %s'
                % (datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                ' '.join(time.tzname)))
 
        print('Done! Daily total precipitation saved in %s' % f_out)

What I want is code that follows the same steps as above, but takes an input file with one year of hourly data and converts it to one year of daily data.

The result should be daily values of the calculated variable (such as precipitation) for the whole year.

Example: let's say I have precipitation data of 1 mm/hr for every hour of the year; that gives 8,760 hourly values for a non-leap year.

What I want is 24 mm/day for the whole year, i.e. only 365 values for a non-leap year.

Example input dataset: a subset of the data (for 1st and 2nd January 2017) can be downloaded from https://www.dropbox.com/sh/0vdfn20p355st3i/AABKYO4do_raGHC34VnsXGPqa?dl=0. Just use the second script above to check the code. (The data for the whole year is >10 GB and thus can't be uploaded.)

Thanks in advance

Ep1c1aN
  • Thanks for the code dump, but it is not very useful. I can't run your code, which means you just pasted a textbook's worth of stuff that we need to go through line by line to figure out what you are doing. Please edit your question and give example input data and the output you expect from that data. Your question should be a **complete**, **minimal**, and **verifiable** example. I can help you, but not with the content you currently have. – Error - Syntactical Remorse Apr 16 '19 at 19:57
  • @Error-SyntacticalRemorse Thanks for looking at the script; I have made changes to it now. In simple terms, I want to convert the hourly data for the whole year (roughly 8,760 values) to daily data (365 values; the sum of each 24 hours) for the whole year. The Copernicus server takes a lot of queue time to download the large dataset, but you can change the values in the first code very easily. I hope that helps. – Ep1c1aN Apr 17 '19 at 13:55
  • Please post actual data that we can use. The question has the same issue as before. You posted a link to where to get it but how do I get it from there? Just grab a subset of the data you are getting and add it to the question. – Error - Syntactical Remorse Apr 17 '19 at 14:06
  • @Error-SyntacticalRemorse Apologies, I misinterpreted your comment. The subset of the data is now attached in the link. This might be of some help. Thanks!! – Ep1c1aN Apr 17 '19 at 18:38
  • I will take a look at this later today and see what I can do. – Error - Syntactical Remorse Apr 18 '19 at 18:26
  • Ep1c1aN, I agree that the code on the Confluence wiki is clunky to adapt; to be honest, I find this task far easier with CDO, where it is a one-liner from the command line (see below). The xarray solution is also neat; not sure why ECMWF didn't adopt that package on their wiki. – ClimateUnboxed Feb 17 '20 at 08:01

2 Answers


xarray's resample is just the tool for you. It converts netCDF data from one temporal resolution (e.g. hourly) to another (e.g. daily) in one line. Using your sample data file, we can create daily means with the following code:

import xarray as xr

ds = xr.open_dataset('./tp_20170101-20170102.nc')
tp = ds['tp'] # dimensions [time: 48, latitude: 721, longitude: 1440]
tp_daily = tp.resample(time='D').mean(dim='time') # dimensions (time: 2, latitude: 721, longitude: 1440)

You'll see that the resample command takes a temporal code, in this case 'D', which means daily; we then specify that we want to compute the mean for each day from that day's hourly data with .mean(dim='time').

If instead, for example, you wanted to compute the daily max rather than the daily mean, you'd replace .mean(dim='time') with .max(dim='time'). You can also go from hourly to monthly (MS or month-start), annual (AS or annual-start), and many more. The temporal frequency codes can be found in the Pandas docs.
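If what you are ultimately after is the daily total (as in the question) rather than the daily mean, a minimal sketch along the same lines is below. It assumes, as noted in the comment under this answer and on the ECMWF page linked in the question, that each ERA5 timestamp marks the end of its hourly accumulation window; the output file name is a placeholder.

import xarray as xr
import pandas as pd

ds = xr.open_dataset('./tp_20170101-20170102.nc')
tp = ds['tp']

# Shift the timestamps back one hour so that the value stamped 00:00 (the
# 23:00-24:00 accumulation) is counted with the previous day, then sum the
# hourly accumulations within each calendar day.
tp = tp.assign_coords(time=tp['time'] - pd.Timedelta(hours=1))
tp_daily_total = tp.resample(time='D').sum(dim='time')

# ERA5 total precipitation is in metres of water equivalent; convert to mm/day.
tp_daily_total_mm = tp_daily_total * 1000
tp_daily_total_mm.to_netcdf('daily-tp_total.nc')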

N1B4
  • Nice answer... Just one thing: I think one would need to use the function xarray.Dataset.shift to shift the time one hour earlier before calculating the mean or sum, because the timestamp is at the end of the hourly window (so the 23:00-24:00 last hour of the day has the stamp 00:00 on the day after). See here: https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation I would try to edit the answer to include this, but I'm not fluent with xarray and would probably make an error. – ClimateUnboxed Feb 17 '20 at 07:57

An alternative quick method from the command line using CDO would be:

cdo daysum -shifttime,-1hour era5_hourly.nc era5_daily.nc

You can call this directly from within Python using the cdo Python package.
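A minimal sketch using the python-cdo bindings (the pip package is called cdo; this assumes CDO itself is installed and on your PATH, and reuses the placeholder file names from the one-liner above):

from cdo import Cdo

cdo = Cdo()
# Shift the timestamps back one hour, then sum over each calendar day,
# mirroring the command-line one-liner above.
cdo.daysum(input='-shifttime,-1hour era5_hourly.nc', output='era5_daily.nc')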

Note, as per the answer/discussion here: Calculating ERA5 Daily Total Precipitation using CDO, the ERA5 hourly data has the timestamp at the end of the hourly window, so you need to shift the timestamps before making the sum; I'm not sure the plain xarray resample handles that by default. Also, to get mm/day, I think one needs to sum, not take the mean.

ClimateUnboxed