I am a newbie following the web scraping examples from Automate the Boring Stuff. I am trying to automate downloading images from phdcomics with a single Python script that will:
1. find the link to the image in the HTML and download it, then
2. find the link to the previous page in the HTML and go there, repeating step 1 until it reaches the very first page.
To download the current page's image, I looked at the relevant segment of the HTML as printed by soup.prettify(), which looks like this:
<meta content="Link to Piled Higher and Deeper" name="description">
<meta content="PHD Comic: Remind me" name="title">
<link
href="http://www.phdcomics.com/comics/archive/phd041218s.gif" rel="image_src">
<div class="jumbotron" style="background-color:#52697d;padding: 0em 0em 0em; margin-top:0px; margin-bottom: 0px; background-image: url('http://phdcomics.com/images/bkg_bottom_stuff3.png'); background-repeat: repeat-x;">
<div align="center" class="container-fluid" style="max-width: 1800px;padding-left: 0px; padding-right:0px;">
and then when I write
newurl=soup.find('link', {'rel': "image_src"}).get('href')
it gives me what I need, which is
"http://www.phdcomics.com/comics/archive/phd041218s.gif"
In the next step I want to find the previous page link, which I believe is in the following part of the HTML:
<!-- Comic Table --!>
<table border="0" cellspacing="0" cellpadding="0">
<tr>
<td align="right" valign="top">
<a href=http://phdcomics.com/comics/archive.php?comicid=2004><img height=52 width=49 src=http://phdcomics.com/comics/images/prev_button.gif border=0 align=middle><br></a><font
face=Arial,Helvetica,Geneva,Swiss,SunSans-Regular size=-1><i><b>previous </b></i></font><br><br><a href=http://phdcomics.com/comics/archive.php?comicid=1995><img src=http://phdcomics.com/comics/images/jump_bck10.gif border=0></a><br><a href=http://phdcomics.com/comics/archive.php?comicid=2000><img src=http://phdcomics.com/comics/images/jump_bck5.gif border=0></a><br><font face=Arial,Helvetica,Geneva,Swiss,SunSans-Regular size=-1><i><b>jump</b></i></font><br><br><a href=http://phdcomics.com/comics/archive.php?comicid=1><img src=http://phdcomics.com/comics/images/first_button.gif border=0 align=middle><br></a><font face=Arial,Helvetica,Geneva,Swiss,SunSans-Regular size=-1><i><b>first</b></i></font><br><br> </td>
<td align="center" valign="top"><font color="black">
From this part of the code I want to find
http://phdcomics.com/comics/archive.php?comicid=2004
as my previous link. When I try something like this:
Prevlink=soup.find('a',{'src': 'http://phdcomics.com/comics/images/prev_button.gif'}).get('href')
print(Prevlink)
it gives me an error like this:
Prevlink=soup.find('a',{'src': 'http://phdcomics.com/comics/images/prev_button.gif'}).get('href')
AttributeError: 'NoneType' object has no attribute 'get'
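A reduced test case helps show where this first lookup goes wrong. The sketch below assumes BeautifulSoup 4 with the built-in html.parser and uses only the previous-button markup from the table above; nav_html and PREV_BUTTON are illustrative names, not part of my script. In this fragment the src attribute sits on the img tag rather than on the a tag, so searching for an a with that src finds nothing, while searching for the img and stepping up to its enclosing a does find the link.

```python
from bs4 import BeautifulSoup

# Only the "previous" button markup, copied from the comic table above.
# The rest of the page is NOT reproduced here.
nav_html = """
<td align="right" valign="top">
<a href=http://phdcomics.com/comics/archive.php?comicid=2004><img height=52 width=49 src=http://phdcomics.com/comics/images/prev_button.gif border=0 align=middle><br></a>
</td>
"""

PREV_BUTTON = "http://phdcomics.com/comics/images/prev_button.gif"

# Assumption: the built-in "html.parser"; other parsers may behave differently.
soup = BeautifulSoup(nav_html, "html.parser")

# The <a> tag has no src attribute, so this search matches nothing.
by_a_src = soup.find("a", {"src": PREV_BUTTON})

# The src attribute belongs to the <img>; its enclosing <a> carries the href.
img = soup.find("img", {"src": PREV_BUTTON})
anchor = img.find_parent("a") if img is not None else None
prev_url = anchor.get("href") if anchor is not None else None

print(by_a_src)
print(prev_url)
```

This only covers the isolated fragment; I have not confirmed that it behaves the same way on the full page.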
Even when I try this:
Prevlink=soup.find('a',{'href': 'http://phdcomics.com/comics/archive.php?comicid=2004'}).get('href')
print(Prevlink)
I get a similar error:
Prevlink=soup.find('a',{'href': 'http://phdcomics.com/comics/archive.php?comicid=2004'}).get('href')
AttributeError: 'NoneType' object has no attribute 'get'
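For comparison, here is the same kind of reduced test for the href lookup. Again this is only a sketch, assuming BeautifulSoup 4 with the built-in html.parser and just the previous-button fragment (nav_html, PREV_HREF and prev_link are illustrative names). On the fragment alone the lookup does match, which suggests, though I have not verified it, that the failure depends on how the full page is parsed; the page also has the nonstandard `--!>` comment terminator just before the table, which the fragment leaves out. The None check avoids the AttributeError when nothing matches.

```python
from bs4 import BeautifulSoup

# Only the "previous" button markup; the preceding comment and the rest
# of the page are deliberately left out of this reduced test.
nav_html = (
    "<a href=http://phdcomics.com/comics/archive.php?comicid=2004>"
    "<img height=52 width=49 "
    "src=http://phdcomics.com/comics/images/prev_button.gif "
    "border=0 align=middle><br></a>"
)

PREV_HREF = "http://phdcomics.com/comics/archive.php?comicid=2004"

# Assumption: the built-in "html.parser"; other parsers may behave differently.
soup = BeautifulSoup(nav_html, "html.parser")

tag = soup.find("a", {"href": PREV_HREF})
if tag is None:
    # find() returns None when nothing matches; calling .get() on it
    # is what raises the AttributeError shown above.
    prev_link = None
    print("no matching <a> tag found")
else:
    prev_link = tag.get("href")
    print(prev_link)
```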
What is the right way to get the correct 'href'? TIA