I have tried requests, pydap, urllib, and netcdf4 and keep either getting redirect errors or permission errors when trying to download the following NASA data:

GLDAS_NOAH025SUBP_3H: GLDAS Noah Land Surface Model L4 3 Hourly 0.25 x 0.25 degree Subsetted V001 (http://disc.sci.gsfc.nasa.gov/uui/datasets/GLDAS_NOAH025SUBP_3H_V001/summary?keywords=Hydrology)

I am attempting to download about 50k files. Here is an example of one, which works when pasted into the Google Chrome browser (if you have a valid username and password):

http://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/OTF/HTTP_services.cgi?FILENAME=%2Fdata%2FGLDAS_V1%2FGLDAS_NOAH025SUBP_3H%2F2016%2F244%2FGLDAS_NOAH025SUBP_3H.A2016244.2100.001.2016256190725.grb&FORMAT=TmV0Q0RGLw&BBOX=-11.95%2C28.86%2C-0.62%2C40.81&LABEL=GLDAS_NOAH025SUBP_3H.A2016244.2100.001.2016286201048.pss.nc&SHORTNAME=GLDAS_NOAH025SUBP_3H&SERVICE=SUBSET_GRIB&VERSION=1.02&LAYERS=AAAB&DATASET_VERSION=001

Does anyone have experience getting OPeNDAP NASA data from the web using Python? I am happy to provide more information if desired.

Here is the requests attempt, which gives a 401 error:

import requests

def httpdownload():
    '''loop through each line in the text file and open url'''
    httpfile = open(pathlist[0]+"NASAdownloadSample.txt", "r")
    for line in httpfile:
        print line 
        outname = line[-134:-122]+".hdf"
        print outname 
        username = ""
        password = "*"
        # pass the credential variables (not the literal strings) and drop the trailing newline
        r = requests.get(line.strip(), auth=(username, password), stream=True)
        print r.text
        print r.status_code
        with open(pathlist[0]+outname, 'wb') as out:
             out.write(r.content)
        print outname, "finished" # keep track of progress

And here is the pydap example, which gives a redirect error:

import install_cas_client
from pydap.client import open_url

def httpdownload():
    '''loop through each line in the text file and open url'''
    username = ""
    password = ""
    httpfile = open(pathlist[0]+"NASAdownloadSample.txt", "r")
    fileone = httpfile.readline()
    filetot = fileone[:7]+username+":"+password+"@"+fileone[7:]
    print filetot
    dataset = open_url(filetot)
timpjohns

2 Answers

I did not find a solution using Python, but given the information I have now it should be possible. I used wget with a .netrc file and a cookie file, as follows (https://disc.gsfc.nasa.gov/information/howto?title=How%20to%20Download%20Data%20Files%20from%20HTTP%20Service%20with%20wget):

#!/bin/bash

# Store the Earthdata Login credentials and the cookie file in the home directory
touch ~/.netrc
echo "machine urs.earthdata.nasa.gov login <username> password <password>" >> ~/.netrc
chmod 0600 ~/.netrc
touch ~/.urs_cookies

cd <path to output files>
wget --content-disposition --trust-server-names --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies \
    -i <path to text file of url list>
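
For anyone who wants to reuse the same credentials from Python, here is a minimal sketch that reads the urs.earthdata.nasa.gov entry back out of the .netrc file with the standard-library netrc module. The function name read_earthdata_credentials is hypothetical, and the sketch only retrieves the username and password; it does not handle the login redirects or cookies itself.

```python
import netrc

# Host name used in the .netrc entry written by the script above.
URS_HOST = "urs.earthdata.nasa.gov"


def read_earthdata_credentials(netrc_path=None):
    """Return (username, password) for URS_HOST from a .netrc file.

    Hypothetical helper: netrc_path=None means the default ~/.netrc.
    """
    entry = netrc.netrc(netrc_path).authenticators(URS_HOST)
    if entry is None:
        raise LookupError("no entry for %s in the .netrc file" % URS_HOST)
    login, _account, password = entry
    return login, password
```

The returned pair can then be handed to whichever HTTP client performs the download, so the password does not need to be written into the script itself.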

Hope it helps anyone else working with NASA data from this server.

7yl4r
timpjohns
  • The link is outdated. The same information is now here: https://disc.gsfc.nasa.gov/SSW/SSW_URL_List_Downloading_Instructions.html – FlippingBinary May 07 '20 at 16:54

I realize it's a bit late to answer this question for the original poster, but I stumbled across it while trying to do the same thing, so I'll leave my solution here. It seems the NASA server uses redirects and HTTP Basic authentication in a way the standard libraries don't expect. When you download from (for example) https://hydro1.gesdisc.eosdis.nasa.gov, you get redirected to https://urs.earthdata.nasa.gov for authentication. That server sets an authentication token as a cookie and redirects you back to download the file. If you're not handling cookies properly, you'll be stuck in an infinite redirection loop. If you're not handling authentication and redirection properly, you'll get an access denied error.

To get around this problem, chain HTTPRedirectHandler, HTTPCookieProcessor, and an HTTPBasicAuthHandler backed by HTTPPasswordMgrWithDefaultRealm into a single opener, then either install it as the default opener or use it directly.

from urllib import request

username = "<your username>"
password = "<your password>"
url = "<remote url of file>"
filename = "<local destination of file>"

# Follow the redirect to urs.earthdata.nasa.gov and back to the data server.
redirectHandler = request.HTTPRedirectHandler()
# Keep the authentication cookie between redirects.
cookieProcessor = request.HTTPCookieProcessor()
# Offer the credentials when the Earthdata Login server asks for Basic auth.
passwordManager = request.HTTPPasswordMgrWithDefaultRealm()
passwordManager.add_password(None, "https://urs.earthdata.nasa.gov", username, password)
authHandler = request.HTTPBasicAuthHandler(passwordManager)
# Chain the handlers and make the opener the default used by urlretrieve.
opener = request.build_opener(redirectHandler, cookieProcessor, authHandler)
request.install_opener(opener)
request.urlretrieve(url, filename)
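
Since the question is about a text file listing roughly 50k URLs, here is a rough sketch of how the same handler chain might be applied to a whole list while using the opener directly instead of installing it. The function names (build_earthdata_opener, label_from_url, download_all) are hypothetical, and taking the local file name from the LABEL query parameter is an assumption based on the example URL in the question. I have not run this against the full list.

```python
import shutil
from urllib import request
from urllib.parse import parse_qs, urlsplit

AUTH_URL = "https://urs.earthdata.nasa.gov"


def build_earthdata_opener(username, password):
    """Return an opener that follows redirects, keeps cookies, and answers Basic auth."""
    password_manager = request.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password(None, AUTH_URL, username, password)
    return request.build_opener(
        request.HTTPRedirectHandler(),
        request.HTTPCookieProcessor(),
        request.HTTPBasicAuthHandler(password_manager),
    )


def label_from_url(url):
    """Use the LABEL query parameter as the local file name (assumes it is present)."""
    return parse_qs(urlsplit(url).query)["LABEL"][0]


def download_all(url_list_path, opener):
    """Download every URL in the list (one per line) into the current directory."""
    with open(url_list_path) as url_file:
        for line in url_file:
            url = line.strip()  # drop the trailing newline
            if not url:
                continue  # skip blank lines
            with opener.open(url) as response, open(label_from_url(url), "wb") as out:
                shutil.copyfileobj(response, out)  # stream to disk in chunks
```

It would be called as download_all("<path to text file of url list>", build_earthdata_opener(username, password)). Reusing one opener for every file means the cookie jar is shared across downloads, which should let later requests reuse the cookie set during the first login.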
FlippingBinary