I am trying to write the code so that I don't have to input every single year; the code should just do it for me. I tried using a for loop, such that for year in range(1995,2021):, but the code gives me the following error:
File "C:\Users\chadd\OneDrive\Desktop\Wind Spacecraft\Codes\get_files_wind_test.py", line 38, in <module>
sub_dir = year + '/'
# sub directory because there are several year in the parent directory
TypeError: unsupported operand type(s) for +: 'int' and 'str'
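Roughly, the change I tried looks like the sketch below (assuming the rest of the script stays the same as what is posted further down); the concatenation is where the TypeError comes from, because range() yields ints:

for year in range(1995, 2021):
    sub_dir = year + '/'   # fails here: can't add an int and a str
    # ... rest of the script, using sub_dir as before ...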
Below is my code without the for loop. Here I enter the year manually, and it downloads the files from the web and saves them in the folder for that particular year. I am thinking of replacing year = input("Enter the year: ") with the for loop.
import requests
from bs4 import BeautifulSoup
from os import path

###############
## Define sc and date range
###############
year = input("Enter the year: ") # takes the input
###############
## Define Paths
###############
external_url_base = 'https://cdaweb.gsfc.nasa.gov/pub/data/wind/waves/dust_impact_l3/' # URL from which we need to scrape our data
sub_dir = year + '/' # sub-directory, because there are several years in the parent directory
url = external_url_base + sub_dir
local_dir_base = r'C:\Users\chadd\OneDrive\Desktop\Wind Spacecraft\Data' # this is my directory where files will be saved
sub_dir = '/' + year + '/' # since the years range from 1995 to 2020, files for different years get stored in different year folders as the input changes
local_dir = local_dir_base + sub_dir # combine the local base and sub-directory into the full path that Python uses to save files
#########################
# Identify remote files #
#########################
## Read web page
resp = requests.get(url)
## Error handle
if resp.status_code != 200:
    print('**ERROR: No data available from then**')
    resp.raise_for_status()
# create beautiful-soup object (all links on web page)
soup = BeautifulSoup(resp.content, 'html5lib')
# find all links on web-page
links = soup.findAll('a')
# filter for links ending with .cdf
cdf_files = []
for l in links:
    if l['href'].endswith('cdf'):
        #print(l['href'])
        cdf_files.append(url + l['href'])
print(cdf_files)
###############
## Go get remote files, download locally
###############
# Iterate through list of files
for link in cdf_files:
    #print(link)
    # get the file name separately
    fn = link.split('/')[-1]
    r = requests.get(link)
    #print(r)
    # Sub-directory based on the type of data being downloaded, to be saved in the Data directory. ex: TDS_files, QF_files, etc.
    data_file = local_dir + fn
    # Check if the file is already in that directory to avoid duplicates and unnecessary processing.
    if path.exists(data_file):
        print('Already have ', data_file, '.\nMoving on...')
        continue
    else:
        print('Downloading ', data_file, '...')
        with open(data_file, 'wb') as f:
            f.write(r.content)
Any help is appreciated!