I have written a program to extract links from a string.
Please check the code below.
def next_target(page):
    if (page.find('<a href') == -1):
        return 0,0
    link = (page.find('<a href='))
    first_quote = (page.find('"', link))
    second_quote = (page.find('"', first_quote + 1))
    url = page[first_quote + 1 : second_quote]
    return url, second_quote

def get_all_links(page):
    while True:
        url, endpos = next_target(page)
        if (url):
            print (url)
            page = page[endpos :]
        else:
            break
print (get_all_links('<div class="no-padding locale-column col-md-3"><a href="https://ee.000webhost.com/" class=""><span class="flag flag-ee"></span><span class="region">Eesti</span> <span class="language">Eesti</span></a><a href="https://es.000webhost.com/" class=""><span class="flag flag-es"></span><span class="region">España</span> <span class="language">Español</span></a><a href="https://fi.000webhost.com/" class=""><span class="flag flag-fi"></span><span class="region">Suomi</span> <span class="language">Suomi</span></a><a href="https://fr.000webhost.com/" class=""><span class="flag flag-fr"></span><span class="region">France</span> <span class="language">Français</span></a><a href="https://gr.000webhost.com/" class=""><span class="flag flag-gr"></span><span class="region">Ελλάδα</span> <span class="language">Ελληνικά</span></a><a href="https://hr.000webhost.com/" class=""><span class="flag flag-hr"></span><span class="region">Hrvatska</span> <span class="language">Hrvatski</span></a><a href="https://hu.000webhost.com/" class=""><span class="flag flag-hu"></span><span class="region">Magyarország</span> <span class="language">Magyar</span></a><a href="https://in.000webhost.com/" class=""><span class="flag flag-en-in"></span><span class="region">India</span> <span class="language">English</span></a><a href="https://th.000webhost.com/" class=""><span class="flag flag-th"></span><span class="region">ประเทศไทย</span> <span class="language">ไทย</span></a></div>'))
The program extracts all the links, but a None value also appears in the output after them. How can I stop the None from appearing after the links are extracted?
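The same behaviour can be reproduced with a much smaller example. The sketch below uses a hypothetical function `show` (an illustration only, not part of the program above) that prints inside its body and has no `return` statement. In Python, a function that ends without `return` hands back None, so passing the call to print displays that None as an extra line.

```python
def show(items):
    # Hypothetical helper: prints each item but never returns a value,
    # so Python implicitly returns None when the loop finishes.
    for item in items:
        print(item)


result = show(["a", "b"])  # the two items are printed inside the function
print(result)              # this extra print shows the implicit return value
```

Calling `show(["a", "b"])` on its own line, without the surrounding print, would leave only the two items in the output.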
I have written another program that does exactly the same thing, but its output also ends with a None value.
Please check the code below:
def page_pro(page):
    end_quote = 0
    if (page.find('<a href') == -1):
        return None
    else:
        while (page.find('<a href', end_quote) != -1):
            start_link = (page.find('<a href=', end_quote))
            first_quote = (page.find('"', start_link))
            end_quote = (page.find('"', first_quote + 1))
            url = page[first_quote + 1 : end_quote]
            print (url)
print (page_pro('<div class="no-padding locale-column col-md-3"><a href="https://ee.000webhost.com/" class=""><span class="flag flag-ee"></span><span class="region">Eesti</span> <span class="language">Eesti</span></a><a href="https://es.000webhost.com/" class=""><span class="flag flag-es"></span><span class="region">España</span> <span class="language">Español</span></a><a href="https://fi.000webhost.com/" class=""><span class="flag flag-fi"></span><span class="region">Suomi</span> <span class="language">Suomi</span></a><a href="https://fr.000webhost.com/" class=""><span class="flag flag-fr"></span><span class="region">France</span> <span class="language">Français</span></a><a href="https://gr.000webhost.com/" class=""><span class="flag flag-gr"></span><span class="region">Ελλάδα</span> <span class="language">Ελληνικά</span></a><a href="https://hr.000webhost.com/" class=""><span class="flag flag-hr"></span><span class="region">Hrvatska</span> <span class="language">Hrvatski</span></a><a href="https://hu.000webhost.com/" class=""><span class="flag flag-hu"></span><span class="region">Magyarország</span> <span class="language">Magyar</span></a><a href="https://in.000webhost.com/" class=""><span class="flag flag-en-in"></span><span class="region">India</span> <span class="language">English</span></a><a href="https://th.000webhost.com/" class=""><span class="flag flag-th"></span><span class="region">ประเทศไทย</span> <span class="language">ไทย</span></a></div>'))
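One possible direction, offered as a rough sketch rather than a definitive fix, is to have the function build and return a list of links and leave the printing to the caller. The name `collect_links` and the shortened `sample` string are assumptions made for this illustration; the sketch also assumes every href value is wrapped in double quotes, as in the input above.

```python
def collect_links(page):
    """Return every double-quoted value that follows '<a href=' in page."""
    links = []
    pos = 0
    while True:
        start = page.find('<a href=', pos)
        if start == -1:
            break  # no more anchors
        first_quote = page.find('"', start)
        if first_quote == -1:
            break  # malformed tag: opening quote missing
        end_quote = page.find('"', first_quote + 1)
        if end_quote == -1:
            break  # malformed tag: closing quote missing
        links.append(page[first_quote + 1:end_quote])
        pos = end_quote + 1  # continue searching after this link
    return links


# Shortened, hypothetical sample in the same shape as the input above.
sample = ('<a href="https://ee.000webhost.com/" class=""></a>'
          '<a href="https://es.000webhost.com/" class=""></a>')

for url in collect_links(sample):
    print(url)
```

Because the function returns a list and the loop prints only its elements, nothing other than the links reaches the output; a page with no links yields an empty list instead of None.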