For weather data processing, I would like to automatically retrieve daily weather forecast files into Google Cloud Storage.
The files are available at a public HTTP URL (http://dcpc-nwp.meteo.fr/openwis-user-portal/srv/en/main.home), but they are very large (between 30 and 300 megabytes). File size is the main issue.
After reading previous Stack Overflow questions, I tried two methods, both unsuccessful:
1/ First attempt via urlfetch in Google App Engine
    from google.appengine.api import urlfetch

    url = "http://dcpc-nwp.meteo.fr/servic..."
    result = urlfetch.fetch(url)
    [...]  # Code to save in a Google Cloud Storage bucket
But I get the following error message on the urlfetch.fetch line:
DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL
2/ Second attempt via the Cloud Storage Transfer Service
According to the documentation, it is possible to retrieve HTTP data directly into Cloud Storage via the Cloud Storage Transfer Service: https://cloud.google.com/storage/transfer/reference/rest/v1/TransferSpec#httpdata
But it requires the size and MD5 hash of each file before the download. This option cannot work in my case because the website does not provide this information.
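To make the constraint concrete, here is a minimal sketch of what producing those two values would involve. It assumes, as I understand the documentation, that each entry needs the size in bytes and a base64-encoded MD5 hash; the helper names `size_and_md5` and `tsv_entry` and the tab-separated layout are illustrative, not taken from the service itself. Note that it needs the full file content as input, which is exactly the download I cannot perform.

```python
import base64
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so memory use stays flat


def size_and_md5(stream, chunk_size=CHUNK_SIZE):
    """Return (byte_count, base64_md5) for a binary file-like object."""
    digest = hashlib.md5()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    # Assumption: the hash is expected base64-encoded rather than as hex.
    return total, base64.b64encode(digest.digest()).decode("ascii")


def tsv_entry(url, stream):
    """Build one hypothetical 'URL<TAB>size<TAB>md5' line for a URL list."""
    size, md5_b64 = size_and_md5(stream)
    return "{}\t{}\t{}".format(url, size, md5_b64)
```

For example, `tsv_entry(url, open(path, "rb"))` would work on a local copy, but obtaining that copy is the original problem.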
3/ Any ideas?
Do you see any way to automatically retrieve a large file over HTTP into my Cloud Storage bucket?