I need to process weather data from this website (https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.20190814/06/); each file is around 300 MB. Once I have a file, I only need to read in a subset of it. I figured downloading the whole thing would be too slow, so I was going to use BeautifulSoup to read in the data directly from the website, like this:
from bs4 import BeautifulSoup
import requests

url = 'https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.20190814/06/gfs.t06z.pgrb2.0p25.f000'
# note: this still transfers the entire ~300 MB file into memory
response = requests.get(url)
# response.content here is binary GRIB2 data, not HTML
soup = BeautifulSoup(response.content, features='lxml')
I then use the pygrib library to read in a subset of the resulting GRIB file (a binary format for weather data).
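For concreteness, the pygrib step I mean looks roughly like this (a sketch: the local filename, the 'Temperature' field, and the 500 hPa level are just example placeholders for the subset I actually need):

import pygrib
import requests

url = ('https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/'
       'gfs.20190814/06/gfs.t06z.pgrb2.0p25.f000')
local_path = 'gfs.t06z.pgrb2.0p25.f000'

# pygrib reads from a file on disk, so the full ~300 MB response
# has to land locally before any subsetting can happen
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open(local_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)

grbs = pygrib.open(local_path)
# select() filters GRIB messages by key; 'Temperature' at 500 hPa is
# a placeholder for whichever subset is actually needed
for grb in grbs.select(name='Temperature', level=500):
    values = grb.values  # 2-D data array for this message
grbs.close()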
However, this also proves too slow, taking approximately 5 minutes per file for something that will need to be done 50 times a day. Is there a faster alternative I am not thinking of?