I am trying to extract the `href` values from a page with the code further down.

The relevant part of the soup looks like this:
</div>
 </div>
 </article>
 </div>
 <div class="listing">
 <article class="listing-item image-left" itemscope="" itemtype="https://schema.org/NewsArticle">
 <div class="listing-image image-container">
 <a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
 <img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
 </a>
 </div>

import requests
from bs4 import BeautifulSoup

url = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find_all('div')

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

Can somebody help me? All the examples I find on the internet have the `href` right next to the `a`, which makes it easier to extract.

Thank you.

Abhi
  • You are getting the `href` attribute from a `div` tag, maybe what you need is `tags = soup.find_all('a')` rather than `tags = soup.find_all('div')` – bug Oct 26 '19 at 23:23

1 Answer

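You probably want soup.find_all('a') instead of soup.find_all('div'): the href attribute lives on the <a> tags, so tag.get('href') on a <div> returns None. Keep data = response.text, since a requests Response object has no .html attribute. You can also use soup.find_all('a', href=True) if you only want <a> tags that have an href (see BeautifulSoup getting href)

import requests
from bs4 import BeautifulSoup

url = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"

response = requests.get(url)

# response.text holds the decoded HTML source of the page.
data = response.text
soup = BeautifulSoup(data, 'lxml')
# href=True skips <a> tags without an href, so tag['href'] cannot raise KeyError.
tags = soup.find_all('a', href=True)
for tag in tags:
    print(tag['href'])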
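
If you only want the article links rather than every link on the page, a CSS selector can narrow the search to the <a> tags nested inside the listing-image divs. The sketch below is one possible approach, not a definitive one: it runs against the fragment from the question instead of the live site, uses the built-in html.parser so lxml is not required, and the names SAMPLE_HTML, BASE_URL and extract_article_links are made up for illustration. Since the hrefs are relative, urljoin is used to turn them into absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Fragment copied from the question, so the example runs without a network request.
SAMPLE_HTML = """
<div class="listing">
 <article class="listing-item image-left" itemscope="" itemtype="https://schema.org/NewsArticle">
  <div class="listing-image image-container">
   <a class="image page-link" href="/mundo/venezuela/entrevista-con-el-representante-para-los-migrantes-venezolanos-eduardo-stein-425664">
    <img alt="" src="/files/image_184_123/uploads/2019/10/22/5daf22f15ed09.jpeg"/>
   </a>
  </div>
 </article>
</div>
"""

# Hypothetical constant: the search URL from the question, used as the base for relative links.
BASE_URL = "https://www.eltiempo.com/buscar?q=migrantes+venezolanos"


def extract_article_links(html, base_url):
    """Return absolute URLs of <a class="page-link"> tags inside listing-image divs."""
    soup = BeautifulSoup(html, "html.parser")
    # Match <a> tags that have class page-link and an href, nested in div.listing-image.
    anchors = soup.select("div.listing-image a.page-link[href]")
    # Relative hrefs such as "/mundo/..." are resolved against the page URL.
    return [urljoin(base_url, a["href"]) for a in anchors]


links = extract_article_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```

Against the live page you would pass response.text instead of SAMPLE_HTML. The selector assumes the site still uses the class names shown in the question.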
nighthawk454