
I have written a program to extract links from a string.

Please check the code below.

def next_target(page):
    if (page.find('<a href') == -1):
        return 0,0
    link = (page.find('<a href='))
    first_quote = (page.find('"', link))
    second_quote = (page.find('"', first_quote + 1))
    url = page[first_quote + 1 : second_quote]
    return url, second_quote

def get_all_links(page):
    while True:
        url, endpos = next_target(page)
        if (url):
            print (url)
            page = page[endpos :]
        else:
            break

print (get_all_links('<div class="no-padding locale-column col-md-3"><a href="https://ee.000webhost.com/" class=""><span class="flag flag-ee"></span><span class="region">Eesti</span> <span class="language">Eesti</span></a><a href="https://es.000webhost.com/" class=""><span class="flag flag-es"></span><span class="region">España</span> <span class="language">Español</span></a><a href="https://fi.000webhost.com/" class=""><span class="flag flag-fi"></span><span class="region">Suomi</span> <span class="language">Suomi</span></a><a href="https://fr.000webhost.com/" class=""><span class="flag flag-fr"></span><span class="region">France</span> <span class="language">Français</span></a><a href="https://gr.000webhost.com/" class=""><span class="flag flag-gr"></span><span class="region">Ελλάδα</span> <span class="language">Ελληνικά</span></a><a href="https://hr.000webhost.com/" class=""><span class="flag flag-hr"></span><span class="region">Hrvatska</span> <span class="language">Hrvatski</span></a><a href="https://hu.000webhost.com/" class=""><span class="flag flag-hu"></span><span class="region">Magyarország</span> <span class="language">Magyar</span></a><a href="https://in.000webhost.com/" class=""><span class="flag flag-en-in"></span><span class="region">India</span> <span class="language">English</span></a><a href="https://th.000webhost.com/" class=""><span class="flag flag-th"></span><span class="region">ประเทศไทย</span> <span class="language">ไทย</span></a></div>'))

The program is extracting all the links. But along with the links, a None value is also returned. How can I stop the None from appearing after the links are extracted?

I have written another program to do the exact same thing. However, it's also returning a None value.

Please check the code below:

def page_pro(page):
    end_quote = 0
    if (page.find('<a href') == -1):
        return None
    else:
        while (page.find('<a href', end_quote) != -1):
            start_link = (page.find('<a href=', end_quote))
            first_quote = (page.find('"', start_link))
            end_quote = (page.find('"', first_quote + 1))
            url = page[first_quote + 1 : end_quote]
            print (url)

print (page_pro('<div class="no-padding locale-column col-md-3"><a href="https://ee.000webhost.com/" class=""><span class="flag flag-ee"></span><span class="region">Eesti</span> <span class="language">Eesti</span></a><a href="https://es.000webhost.com/" class=""><span class="flag flag-es"></span><span class="region">España</span> <span class="language">Español</span></a><a href="https://fi.000webhost.com/" class=""><span class="flag flag-fi"></span><span class="region">Suomi</span> <span class="language">Suomi</span></a><a href="https://fr.000webhost.com/" class=""><span class="flag flag-fr"></span><span class="region">France</span> <span class="language">Français</span></a><a href="https://gr.000webhost.com/" class=""><span class="flag flag-gr"></span><span class="region">Ελλάδα</span> <span class="language">Ελληνικά</span></a><a href="https://hr.000webhost.com/" class=""><span class="flag flag-hr"></span><span class="region">Hrvatska</span> <span class="language">Hrvatski</span></a><a href="https://hu.000webhost.com/" class=""><span class="flag flag-hu"></span><span class="region">Magyarország</span> <span class="language">Magyar</span></a><a href="https://in.000webhost.com/" class=""><span class="flag flag-en-in"></span><span class="region">India</span> <span class="language">English</span></a><a href="https://th.000webhost.com/" class=""><span class="flag flag-th"></span><span class="region">ประเทศไทย</span> <span class="language">ไทย</span></a></div>'))
Bijoy
  • I'm not sure if this is an assignment or something where you can't use it but otherwise, you might want to look into [`BeautifulSoup`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for parsing html in Python. – G_M Mar 03 '18 at 04:14
  • In your `page_pro` function, if the first condition isn't met you don't return anything, and then you try to print the return value, which will be None. – user3483203 Mar 03 '18 at 04:14
  • See the docs on building a [mcve] -- the *simplest possible code* that produces a given problem. The work of isolating the problem enough to generate such a reproducer is a prerequisite to being able to ask a high-quality question here. – Charles Duffy Mar 03 '18 at 04:17
  • In addition to chrisz's comment: the solution is not to print the returned None value. Instead, simply call your function page_pro(); all the necessary printing happens within this function. – Mr. T Mar 03 '18 at 04:22
  • You're explicitly returning `None` and only printing the URL; instead, you could try appending URLs to a list and returning that. – import random Mar 03 '18 at 04:22
  • Thank you for your inputs! The community is very helpful. – Shreyash Karnik Mar 03 '18 at 04:48
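
The suggestion above to collect the URLs in a list and return them could look roughly like this. It is only a sketch: the name `collect_links` and the shortened `sample` string are made up for illustration, and it assumes every href value is wrapped in double quotes, as in the question's input.

```python
def collect_links(page):
    """Return a list of the double-quoted values that follow '<a href='."""
    links = []
    pos = 0
    while True:
        # Find the next anchor tag, starting where the last URL ended.
        start = page.find('<a href=', pos)
        if start == -1:
            break
        first_quote = page.find('"', start)
        if first_quote == -1:
            break
        second_quote = page.find('"', first_quote + 1)
        if second_quote == -1:
            break
        links.append(page[first_quote + 1:second_quote])
        pos = second_quote + 1
    return links


# Shortened, hypothetical input in the same shape as the question's HTML.
sample = ('<a href="https://ee.000webhost.com/" class=""></a>'
          '<a href="https://es.000webhost.com/" class=""></a>')

for url in collect_links(sample):
    print(url)
```

Because the function returns a list, the caller decides what to print, and no stray None appears.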

1 Answer


Here:

print (get_all_links('<div class="no-padding locale-column col-md-3"><a href="https://ee.000webhost.com/" class=""><span class="flag flag-ee"></span><span class="region">Eesti</span> <span class="language">Eesti</span></a><a href="https://es.000webhost.com/" class=""><span class="flag flag-es"></span><span class="region">España</span> <span class="language">Español</span></a><a href="https://fi.000webhost.com/" class=""><span class="flag flag-fi"></span><span class="region">Suomi</span> <span class="language">Suomi</span></a><a href="https://fr.000webhost.com/" class=""><span class="flag flag-fr"></span><span class="region">France</span> <span class="language">Français</span></a><a href="https://gr.000webhost.com/" class=""><span class="flag flag-gr"></span><span class="region">Ελλάδα</span> <span class="language">Ελληνικά</span></a><a href="https://hr.000webhost.com/" class=""><span class="flag flag-hr"></span><span class="region">Hrvatska</span> <span class="language">Hrvatski</span></a><a href="https://hu.000webhost.com/" class=""><span class="flag flag-hu"></span><span class="region">Magyarország</span> <span class="language">Magyar</span></a><a href="https://in.000webhost.com/" class=""><span class="flag flag-en-in"></span><span class="region">India</span> <span class="language">English</span></a><a href="https://th.000webhost.com/" class=""><span class="flag flag-th"></span><span class="region">ประเทศไทย</span> <span class="language">ไทย</span></a></div>'))

or here:

print (page_pro('<div class="no-padding locale-column col-md-3"><a href="https://ee.000webhost.com/" class=""><span class="flag flag-ee"></span><span class="region">Eesti</span> <span class="language">Eesti</span></a><a href="https://es.000webhost.com/" class=""><span class="flag flag-es"></span><span class="region">España</span> <span class="language">Español</span></a><a href="https://fi.000webhost.com/" class=""><span class="flag flag-fi"></span><span class="region">Suomi</span> <span class="language">Suomi</span></a><a href="https://fr.000webhost.com/" class=""><span class="flag flag-fr"></span><span class="region">France</span> <span class="language">Français</span></a><a href="https://gr.000webhost.com/" class=""><span class="flag flag-gr"></span><span class="region">Ελλάδα</span> <span class="language">Ελληνικά</span></a><a href="https://hr.000webhost.com/" class=""><span class="flag flag-hr"></span><span class="region">Hrvatska</span> <span class="language">Hrvatski</span></a><a href="https://hu.000webhost.com/" class=""><span class="flag flag-hu"></span><span class="region">Magyarország</span> <span class="language">Magyar</span></a><a href="https://in.000webhost.com/" class=""><span class="flag flag-en-in"></span><span class="region">India</span> <span class="language">English</span></a><a href="https://th.000webhost.com/" class=""><span class="flag flag-th"></span><span class="region">ประเทศไทย</span> <span class="language">ไทย</span></a></div>'))

you are printing the return value of a function that returns nothing, so `None` is printed after the links. Just omit print() and call

get_all_links(…) 

or

page_pro(…)
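
A minimal illustration of why this happens, using a hypothetical helper `show_links` that is not part of the question's code: a function with no `return` statement evaluates to `None`, so wrapping the call in print() prints that `None` after the function's own output.

```python
def show_links(urls):
    # Prints each URL itself; there is no return statement,
    # so the call evaluates to None.
    for url in urls:
        print(url)


result = show_links(["https://ee.000webhost.com/"])

# The function already printed the URL; its return value is None.
print(result is None)
```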
Amaro Vita
  • Appreciate the answer. Got the logic behind the problem. The print function will look for a return value. If a return value is passed at a later part of the code, it won't print the value which is returned earlier. – Shreyash Karnik Mar 03 '18 at 04:51
  • @ShreyashKarnik Not directly related, but see https://stackoverflow.com/questions/7664779/what-is-the-formal-difference-between-print-and-return – OneCricketeer Mar 03 '18 at 05:43