
I have this Python code, but when I run it, it prints only the first target. Here is my Python code:

def get_next_target(S):
    start_link = S.find('<a href=')
    start_quote = S.find('"', start_link)
    end_quote = S.find('"', start_quote + 1)
    url = S[start_quote + 1:end_quote]
    print(url)
    return url, end_quote

get_next_target(S)

where the variable S is defined as:

S = '<susuds><a href="www.target1.com"/><ahsahsh><saudahsd><a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>'

What I want is to print all three targets, but it prints only the first one. Why is that?

smci

2 Answers


I think you should use BeautifulSoup to extract information from HTML/XML.

In [1]: from bs4 import BeautifulSoup

In [2]: html = '''<susuds><a href="www.target1.com"/><ahsahsh><saudahsd><a href=
   ...: "www.target2.com"/><p>sa</h1><a href="www.target3.com"/>'''

In [3]: soup = BeautifulSoup(html, 'lxml')

In [4]: for a in soup.find_all('a'):
   ...:     print(a['href'])
   ...:     
www.target1.com
www.target2.com
www.target3.com
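If installing BeautifulSoup (and lxml) is not an option, the same parser-based idea can be sketched with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not part of the answer above, and the class name LinkCollector is hypothetical. It relies on HTMLParser's default handling of self-closing tags like <a href="..."/>, which forwards them to handle_starttag.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Self-closing tags such as
        # <a href="..."/> also reach this method via handle_startendtag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


S = ('<susuds><a href="www.target1.com"/><ahsahsh><saudahsd>'
     '<a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>')

collector = LinkCollector()
collector.feed(S)   # parse the whole string
collector.close()   # flush any buffered data
for href in collector.hrefs:
    print(href)
```

Unknown tags such as <susuds> and the stray </h1> are simply passed through, so only the three href values are collected.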
宏杰李

If you want to achieve this without using any special module, the following code will do it.

import re

S = '<susuds><a href="www.target1.com"/><ahsahsh><saudahsd><a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>'
abc = []

def get_next_target(S):
    # Start index of every '<a href=' occurrence in S
    search_index = [i.start() for i in re.finditer('<a href=', S)]
    for j in range(len(search_index)):
        if j == len(search_index) - 1:
            # Last link: the chunk runs to the end of S
            A = S[search_index[j]:len(S)]
            search_start_index = A.find('"')
            search_end_index = A.rfind('"')
            start_final = search_index[j] + search_start_index + 1
            start_end = search_index[j] + search_end_index
            final_result = S[start_final:start_end]
            abc.append(final_result)
            print(abc)
        else:
            # Other links: the chunk stops where the next link begins
            A = S[search_index[j]:search_index[j + 1]]
            search_start_index = A.find('"')
            search_end_index = A.rfind('"')
            start_final = search_index[j] + search_start_index + 1
            start_end = search_index[j] + search_end_index
            final_result = S[start_final:start_end]
            abc.append(final_result)
get_next_target(S)

Note: If you don't want to collect the results in a list, replace the abc.append(final_result) line in both the if and else branches (and the print(abc) line in the if branch) with "print(final_result)".
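As for why the original code prints only the first target: get_next_target is called once, and each call runs S.find('<a href=') from the start of the string, so it always lands on the first link. Its second return value, end_quote, is what allows a caller to continue the search. Below is a minimal sketch, assuming Python 3, that calls the function repeatedly on the remaining part of the string. The early return of None when no link is found and the helper name print_all_links are assumptions added for illustration.

```python
def get_next_target(S):
    start_link = S.find('<a href=')
    if start_link == -1:
        # Assumption: return None when there are no more links
        return None, 0
    start_quote = S.find('"', start_link)
    end_quote = S.find('"', start_quote + 1)
    url = S[start_quote + 1:end_quote]
    return url, end_quote


def print_all_links(S):
    # Hypothetical helper: keep calling get_next_target on the rest of S
    links = []
    while True:
        url, endpos = get_next_target(S)
        if url is None:
            break
        links.append(url)
        print(url)
        # Drop everything up to the closing quote so the next search
        # starts after the link that was just found
        S = S[endpos:]
    return links


S = ('<susuds><a href="www.target1.com"/><ahsahsh><saudahsd>'
     '<a href="www.target2.com"/><p>sa</h1><a href="www.target3.com"/>')
links = print_all_links(S)
```

The None check matters: without it, S.find('"', -1) would search from the last character rather than signalling that no link was found.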

Himanshu