Let me focus on the relevant part of the HTML in your problem:
<a class='warp_lightbox' title='Comprar' href='//www.fotoregistro.com.br/
navhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO'><img src='
//sh.digipix.com.br/subhomes/_lojas_consumer/paginas/fotolivro/img/180slim/vitrine/classic_01_tb.jpg' alt='slim' />
</a>
You can get it by doing:
for link in soup.find_all('a', {'class':'warp_lightbox'}):
    url = link.get("href")
    break
If you inspect url, you will find that it is:
'//www.fotoregistro.com.br/\rnavhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO'
You can see two important patterns at the beginning of the string:
//
which makes the URL protocol-relative, so it keeps the protocol of the current page;
\r
which is ASCII Carriage Return (CR).
When you print it, the carriage return sends the cursor back to the start of the line, so the rest of the string overwrites this part:
//www.fotoregistro.com.br/\r
If you need the raw string, you can use repr in your for loop:
print(repr(url))
and you get:
'//www.fotoregistro.com.br/\rnavhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO'
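To make the difference between print and repr concrete, here is a minimal sketch. The URL is a shortened, hypothetical stand-in for the real one, and the exact effect of print depends on your terminal:

```python
# Shortened, hypothetical URL with an embedded carriage return (illustration only).
url = '//www.example.com/\rnavhome.php?lightbox'

# print() sends the CR to the terminal as-is; most terminals then move the
# cursor to the start of the line, so the text after the CR overwrites the host.
print(url)

# repr() escapes the CR, so it is displayed as the two characters "\r".
shown = repr(url)
print(shown)

# In the string itself the CR is a single character at this index.
cr_index = url.index('\r')
print(cr_index)
```

The string is never actually truncated; only the way the terminal renders it changes.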
If you need the path, you can replace the initial part:
base = 'www.fotoregistro.com.br/'
for link in soup.find_all('a', {'class':'warp_lightbox'}):
    url = link.get("href").replace('//www.fotoregistro.com.br/\r', base)
    print(url)
and you get:
www.fotoregistro.com.br/navhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO
www.fotoregistro.com.br/navhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/preview=true/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO
.
.
.
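If you would rather not hard-code the host in the replacement, one possible alternative is to strip the stray control characters and let urljoin resolve the protocol-relative URL. This is only a sketch: clean_href is a hypothetical helper name, and the https page URL is an assumption about where the page was fetched from.

```python
from urllib.parse import urljoin


def clean_href(href, page_url='https://www.fotoregistro.com.br/'):
    """Remove stray CR/LF characters and resolve a protocol-relative href.

    page_url is an assumed address of the page the link came from.
    """
    # Drop carriage returns and newlines that leaked in from the HTML source.
    stripped = href.replace('\r', '').replace('\n', '')
    # For hrefs starting with '//', urljoin fills in the scheme of page_url.
    return urljoin(page_url, stripped)


# Shortened version of the href from the question, for illustration.
example = '//www.fotoregistro.com.br/\rnavhome.php?lightbox&cpmdsc=MOZAO'
print(clean_href(example))
```

In your loop you could then call clean_href(link.get("href")) instead of the replace above.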
Without specifying the class:
for link in soup.find_all('a'):
    url = link.get("href")
    print(repr(url))