I am trying to write the code so that I don't have to input every single year; the code should just do it for me. I tried using a for loop, such that for year in range(1995,2021):, but the code gives me the following error:
File "C:\Users\chadd\OneDrive\Desktop\Wind Spacecraft\Codes\get_files_wind_test.py", line 38, in <module>
sub_dir = year + '/'
# sub directory because there are several year in the parent directory
TypeError: unsupported operand type(s) for +: 'int' and 'str'
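Roughly, the change I tried looks like the sketch below (assuming the rest of the script stays the same as what is posted further down); the concatenation is where the TypeError comes from, because range() yields ints:

for year in range(1995, 2021):
    sub_dir = year + '/'   # fails here: can't add an int and a str
    # ... rest of the script, using sub_dir as before ...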
Below is my code without the for loop. Here I enter the year manually, and it downloads the files from the web and saves them in the folder for that particular year. I am thinking of replacing year = input("Enter the year: ") with the for loop.
import requests
from bs4 import BeautifulSoup
from os import path

###############
## Define sc and date range
###############
year = input("Enter the year: ") # takes the input
###############
## Define Paths
###############
external_url_base = 'https://cdaweb.gsfc.nasa.gov/pub/data/wind/waves/dust_impact_l3/' # URL from which we need to scrape our data
sub_dir = year + '/' # sub-directory, because there are several years in the parent directory
url = external_url_base + sub_dir
local_dir_base = r'C:\Users\chadd\OneDrive\Desktop\Wind Spacecraft\Data' # this is my directory where files will be saved
sub_dir = '/' + year + '/' # since the years range from 1995 to 2020, files for different years get stored in different year folders as the input changes
local_dir = local_dir_base + sub_dir # combine the local base and sub-directory into the full path that Python uses to save files
#########################
# Identify remote files #
#########################
## Read web page
resp = requests.get(url)
## Error handle
if resp.status_code != 200:
    print('**ERROR: No data available from then**')
    resp.raise_for_status()
# create beautiful-soup object (all links on web page)
soup = BeautifulSoup(resp.content, 'html5lib')
# find all links on web-page
links = soup.findAll('a')
# filter for links ending with .cdf
cdf_files = []
for l in links:
    if l['href'].endswith('cdf'):
        #print(l['href'])
        cdf_files.append(url + l['href'])
print(cdf_files)
###############
## Go get remote files, download locally
###############
# Iterate through list of files
for link in cdf_files:
    #print(link)
    # get the file name separately
    fn = link.split('/')[-1]
    r = requests.get(link)
    #print(r)
    # Sub-directory based on the type of data being downloaded, to be saved in the Data directory. ex: TDS_files, QF_files, etc.
    data_file = local_dir + fn
    # Check if the file is already in that directory to avoid duplicates and unnecessary processing.
    if path.exists(data_file):
        print('Already have ', data_file, '.\nMoving on...')
        continue
    else:
        print('Downloading ', data_file, '...')
        with open(data_file, 'wb') as f:
            f.write(r.content)
Any help is appreciated!