
I am trying to get the list of URLs, not a True or False response, at the end of the script.

#!/usr/bin/env python

import requests
from bs4 import BeautifulSoup

url ="https://www.geant.tn/"
response = requests.get(url)
# parse html
page = str(BeautifulSoup(response.content, "html.parser"))

def getURL(page):
    """
    :param page: html of web page (here: https://www.geant.tn/)
    :return: the first url in page and the index of its closing quote
    """
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1: end_quote]
    return url, end_quote

while True:
    url, n = getURL(page)
    page = page[n:]

There is no problem with this part.

I am having a problem here, as True or False is displayed instead of the URLs (this is the rest of the while loop body):

    if url.endswith('.html'):
        print(url)
    else:
        break

If you can help me, thanks a lot!
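For reference, here is a minimal sketch of the scanning loop above with its control flow corrected, assuming you want to keep the plain string search rather than switch parsers. It stops when no further link is found instead of calling `endswith()` on `None`, and it skips links that do not end in `.html` instead of breaking at the first one. The names `get_url` and `html_urls` are hypothetical. Like the original, it only matches the literal text `a href` followed by a double-quoted value. It also returns `None` when a quote is missing, so the loop always terminates.

```python
def get_url(page):
    """Return (url, end_index) for the next 'a href' in page, or (None, 0)."""
    start_link = page.find("a href")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    if start_quote == -1:
        return None, 0  # malformed: no opening quote after 'a href'
    end_quote = page.find('"', start_quote + 1)
    if end_quote == -1:
        return None, 0  # malformed: no closing quote
    return page[start_quote + 1:end_quote], end_quote


def html_urls(page):
    """Collect every href ending in .html, skipping (not stopping at) others."""
    urls = []
    while True:
        url, n = get_url(page)
        if url is None:  # no more links: stop scanning
            break
        if url.endswith('.html'):
            urls.append(url)
        page = page[n:]  # continue searching after the closing quote
    return urls
```

With `page` built as in the question, `print(html_urls(page))` would print the collected list instead of one value per iteration.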

Martin Evans
    I'm not really sure what you are asking. Could you clarify? – Mathyn Feb 01 '19 at 12:31
    Possible duplicate of [retrieve links from web page using python and BeautifulSoup](https://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup) – stovfl Feb 01 '19 at 12:57

1 Answer


If you want to get all the URLs on that page that end with .html, the easiest approach is to use BeautifulSoup's find_all() method to return all the a tags that have an href attribute. You can then use a list comprehension to build your list, keeping only the links that end with .html. For example:

import requests
from bs4 import BeautifulSoup

url = "https://www.geant.tn/"
response = requests.get(url)
# parse html
soup = BeautifulSoup(response.content, "html.parser")

def getURLs(soup):
    # href=True restricts find_all() to <a> tags that actually have an href attribute
    return [a_tag['href'] for a_tag in soup.find_all('a', href=True) if a_tag['href'].endswith('.html')]

urls = getURLs(soup)

for url in urls:
    print(url)

The output would start with:

https://www.geant.tn/evenement-geant.html
https://www.geant.tn/electromenager-35.html
https://www.geant.tn/gros-electromenager-50.html
https://www.geant.tn/petit-electromenager-53.html
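If installing BeautifulSoup is not an option, a similar result can be sketched with only the standard library's `html.parser.HTMLParser`. This is a swapped-in technique, not the one used above. The class name `HtmlLinkCollector` and function `get_html_urls` are hypothetical. Unlike the list comprehension above, this sketch passes each href through `urllib.parse.urljoin`, so relative links such as `/foo.html` come back as absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HtmlLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href="..."> links whose value ends in .html."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs, and value may be None
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and href.endswith('.html'):
            # Resolve relative links such as "/foo.html" against the page URL
            self.urls.append(urljoin(self.base_url, href))


def get_html_urls(html, base_url):
    """Return the .html link targets found in html, made absolute."""
    parser = HtmlLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

With the page fetched as in the answer, `get_html_urls(response.text, url)` would return the list. Resolving with `urljoin` leaves links that are already absolute unchanged.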
Martin Evans