
I'm trying to get PM2.5 data from the public NASA database. Specifically, I'm following all the steps from this guide. I also created the three files just as explained on this requirements website. Still, I get an error message when I run the following line of code:

ds = xr.open_dataset(URL1).sel(lat=slice(lat1,lat2),lon=slice(lon1,lon2),time=slice(time1,time2))

This is the error message I get:

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: HTTP^ Basic: Access denied.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/xarray/backends/file_manager.py:210, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    209 try:
--> 210     file = self._cache[self._key]
    211 except KeyError:

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://goldsmr4.gesdisc.eosdis.nasa.gov/thredds/dodsC/MERRA2_MONTHLY_aggregation/M2TMNXAER.5.12.4_Aggregation.ncml',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '0847945a-4bb6-4085-8032-785c28de3f80']

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
Input In [3], in <cell line: 17>()
     14 time2 = datetime(2019,12,1,0,30,0) # end date
     16 # Read the data for the specified latitude/longitude and date range
---> 17 ds = xr.open_dataset(URL1).sel(lat=slice(lat1,lat2),lon=slice(lon1,lon2),time=slice(time1,time2))

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/xarray/backends/api.py:526, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    514 decoders = _resolve_decoders_kwargs(
    515     decode_cf, ...

File src/netCDF4/_netCDF4.pyx:2449, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2012, in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -77] NetCDF: Access failure: 'https://goldsmr4.gesdisc.eosdis.nasa.gov/thredds/dodsC/MERRA2_MONTHLY_aggregation/M2TMNXAER.5.12.4_Aggregation.ncml'

Can you give me any clues as to what exactly the problem is? I created the three files (".netrc", ".urs_cookies" and ".dodsrc") and put them in my home directory, and I have a NASA Earthdata account.
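For reference, the three files follow the usual Earthdata template (USERNAME and PASSWORD stand in for my actual credentials, and the paths match my macOS home directory):

.netrc:

machine urs.earthdata.nasa.gov login USERNAME password PASSWORD

.urs_cookies (created empty, e.g. with touch ~/.urs_cookies)

.dodsrc:

HTTP.NETRC=/Users/USERNAME/.netrc
HTTP.COOKIEJAR=/Users/USERNAME/.urs_cookies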


1 Answer


It has been a few months, but here is how I do it.

You need to save the script below as nasa_merra2_get.py and edit the following variables:

  • USR: your username
  • PSW: your password
  • global_file: the path to a global MERRA2 file downloaded "by hand". This file needs to hold the full global longitude and latitude arrays; it is used to get the indexes to perform the spatial subset.
  • add your region to the get_regions function using a name and [lonmin, lonmax, latmin, latmax] extent.
  • all files, including the cookie file, are saved in the /tmp folder; you can change this if you need to.
  • finally, you can run the script with ./nasa_merra2_get.py -t YYYYmmdd -r your_region (see the example invocation after the script)
import argparse
from datetime import datetime
from pathlib import Path
from http import cookiejar
from urllib import request
import numpy as np
import xarray as xr


# NASA Earthdata login info
USR = 'your_user'
PSW = 'your_password'


def get_extent_indexes(extent, global_file):

    lonmin, lonmax, latmin, latmax = extent

    with xr.open_dataset(global_file) as ds:

        ilonmin = np.argmin(np.abs(ds['lon'].values - lonmin))
        ilonmax = np.argmin(np.abs(ds['lon'].values - lonmax))
        ilatmin = np.argmin(np.abs(ds['lat'].values - latmin))
        ilatmax = np.argmin(np.abs(ds['lat'].values - latmax))

    # Note: no need to "+ 1" in ilonmax/ilatmax. Unlike python slice, opendap
    # includes the last point.
    ilonmin, ilonmax = min((ilonmin, ilonmax)), max((ilonmin, ilonmax))
    ilatmin, ilatmax = min((ilatmin, ilatmax)), max((ilatmin, ilatmax))

    return ilonmin, ilonmax, ilatmin, ilatmax


def get_data(url, outfile):
    """Login to NASA Earthdata, get data from url and save it to outfile."""

    # References:
    # https://forum.earthdata.nasa.gov/viewtopic.php?t=3435
    # https://urs.earthdata.nasa.gov/documentation/for_users/data_access/python
    # https://stackoverflow.com/questions/73531474/how-to-save-urllib-python3-cookies-to-a-file

    # Create a password manager to deal with the 401 response that is returned
    # from Earthdata Login
    password_manager = request.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password(None,
                                  'https://urs.earthdata.nasa.gov',
                                  USR,
                                  PSW)

    # Create a cookie jar for storing cookies. This is used to store and return
    # the session cookie given to us by the data server (otherwise it will just
    # keep sending us back to Earthdata Login to authenticate). Ideally, we
    # should use a file-based cookie jar to preserve cookies between runs. This
    # will make it much more efficient.

    # cookie_jar = cookiejar.CookieJar()

    # reuse cookie if possible
    cookie_filename = "/tmp/nasa_earthdata.cookies"
    cookie_jar = cookiejar.LWPCookieJar(cookie_filename)
    try:
        cookie_jar.load()
        print(f'Using existing cookie file: {cookie_filename}')
    except FileNotFoundError:
        print('No existing cookie file available.')

    # Install all the handlers.
    opener = request.build_opener(
       request.HTTPBasicAuthHandler(password_manager),
       # request.HTTPHandler(debuglevel=1),    # Uncomment these two lines to see
       # request.HTTPSHandler(debuglevel=1),   # details of the requests/responses
       request.HTTPCookieProcessor(cookie_jar))
    request.install_opener(opener)

    # Create and submit the request. There are a wide range of exceptions that
    # can be thrown here, including HTTPError and URLError. These should be
    # caught and handled.

    # replace [ and ] in the url with encoded version. This is the same as the
    # 'Copy encoded Data URL' button in the opendap site.
    url_enc = url.replace('[', '%5B').replace(']', '%5D')

    print(f'Downloading url: {url}\nencoded as: {url_enc}')

    myrequest = request.Request(url_enc)
    response = request.urlopen(myrequest)

    # Check if status is OK. Note: urlopen already raises HTTPError for most
    # non-2xx responses, so this is just a safety net.
    if response.code != 200:
        raise ValueError(f'Unexpected HTTP status {response.code}')

    print(f'Saving to file {outfile}')
    Path(outfile).parent.mkdir(parents=True, exist_ok=True)
    with open(outfile, 'wb') as fp:
        while True:
            chunk = response.read(1024 * 1024)  # read in 1 MiB chunks
            if not chunk:
                break
            fp.write(chunk)

    print(f'Saving cookie file for later use: {cookie_filename}')
    cookie_jar.save()


def get_regions():

    return {'br': [-54, -31, -36, 7],
            'br-meq': [-52, -32, -6, 7.0],
            'br-ne': [-40, -31, -20, -3],
            'br-se': [-49, -35, -28, -18],
            'br-s': [-54, -44, -36, -27],
            'fza': [-62, -42, 1, 15],
            'atlsw': [-70, -20, -55, 25]}


def process_cli_args():

    parser = argparse.ArgumentParser(allow_abbrev=False)

    parser.add_argument('-t', '--time',
                        type=lambda x: datetime.strptime(x, '%Y%m%d'),
                        dest='dt',
                        required=True,
                        help='date in YYYYmmdd format')

    parser.add_argument('-r', '--region',
                        choices=get_regions(),
                        required=True,
                        help='region name')

    args = parser.parse_args()

    print('CLI arguments:')
    for k, v in vars(args).items():
        print(f'{k}: {v}')

    return args


def main():

    args = process_cli_args()

    extent = get_regions()[args.region]

    outfile = Path(f'/tmp/nasa/merra2/{args.region}/raw/{args.dt:%Y}'
                   f'/nasa_merra2_{args.region}_{args.dt:%Y%m%d}.nc')

    # check if file exists and is "final product"
    if outfile.exists():
        print(f'File {outfile} exists, exiting.')
        return

    # global file downloaded "by hand" from the opendap site including only the
    # full lon and lat arrays. This will be used to get the matrix indexes
    # for the region/extent chosen.
    global_file = '/tmp/nasa/merra2/MERRA2_300.inst1_2d_asm_Nx.20100101.nc4.nc4'
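    # (hypothetical example) such a file can be fetched once from the same
    # opendap endpoint by requesting only the coordinate arrays, e.g.
    # ...inst1_2d_asm_Nx.20100101.nc4.nc4?lat[0:1:360],lon[0:1:575]
    # (the full MERRA-2 grid is 361 lat x 576 lon points).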

    ilonmin, ilonmax, ilatmin, ilatmax = get_extent_indexes(extent, global_file)

    vars_ = ['SLP', 'T2M',
             'U2M', 'V2M',
             'U10M', 'V10M',
             'U50M', 'V50M']

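    # OPeNDAP hyperslab syntax is var[start:stride:stop] with an inclusive
    # stop index, e.g. T2M[0:1:23][100:1:140][200:1:230] would select all 24
    # hourly steps over a 41 x 31 lat/lon index window (these indices are only
    # an illustration).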
    url_vars = ','.join([f'{var}[0:1:23][{ilatmin}:1:{ilatmax}][{ilonmin}:1:{ilonmax}]'
                         for var in vars_])

    # the numbers in the file names represent different streams/experiments:
    # https://geos5.org/wiki/index.php?title=Discover/Dali/Dirac_Tidbits_and_Notes
    merra_numbers = {198001: 100,
                     199201: 200,
                     200101: 300,
                     201101: 400,
                     202009: 401,
                     202010: 400,
                     202106: 401,
                     202110: 400}
    merra_number = [number
                    for ym, number in merra_numbers.items()
                    if int(f'{args.dt:%Y%m}') >= ym][-1]
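    # e.g. for 20191201 this picks 400, the last entry whose yyyymm key is
    # <= 201912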

    # got this string from the opendap site selecting the desired variables (and time/lat/lon)
    # Note: a .nc4 has to be added to the filename to choose output format
    url = ('https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2'
           f'/M2I1NXASM.5.12.4/{args.dt:%Y}/{args.dt:%m}'
           f'/MERRA2_{merra_number}.inst1_2d_asm_Nx.{args.dt:%Y%m%d}.nc4.nc4'
           f'?{url_vars},'
           f'lat[{ilatmin}:1:{ilatmax}],'
           f'lon[{ilonmin}:1:{ilonmax}],'
           'time[0:1:23]')

    get_data(url, outfile)

if __name__ == '__main__':
    main()
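For example, to download 1 December 2019 over the 'br-ne' region defined in get_regions:

./nasa_merra2_get.py -t 20191201 -r br-ne

and then, as a quick sanity check (the path just follows the outfile pattern in main):

import xarray as xr

ds = xr.open_dataset('/tmp/nasa/merra2/br-ne/raw/2019/nasa_merra2_br-ne_20191201.nc')
print(ds)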