I'm trying to find a series of URLs (Twitter links) in the source of a page and write them as a list to a text document. The problem is that once I call .readlines() on the urlopen object, I get only 3-4 lines in total, each containing dozens of URLs that I need to collect one by one. This is the snippet of my code where I try to handle this:
page = html.readlines()
for line in page:
    ind_start = line.find('twitter')
    ind_end = line.find('</a>', ind_start+1)
    while ('twitter' in line[ind_start:ind_end]):
        output.write(line[ind_start:ind_end] + "\n")
        ind_start = line.find('twitter', ind_start)
        ind_end = line.find('</a>', ind_start + 1)
Unfortunately, I can't extract any URLs with this. Any advice?
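One thing that stands out in the loop: `line.find('twitter', ind_start)` begins searching at the current match, so it returns the same index every time and the loop never moves on to the next link. A minimal sketch of the same find-and-slice idea that resumes the search after each closing tag might look like the following. The function name `extract_between` and the sample HTML are hypothetical, and the sketch assumes the page has already been read as one text string (on Python 3, the bytes returned by urlopen would need to be decoded first).

```python
def extract_between(text, marker="twitter", end_tag="</a>"):
    """Collect each slice running from `marker` up to the next `end_tag`."""
    found = []
    start = text.find(marker)
    while start != -1:
        end = text.find(end_tag, start)
        if end == -1:
            break  # no closing tag left, so stop
        found.append(text[start:end])
        # Resume the search after this match so the loop makes progress.
        start = text.find(marker, end + len(end_tag))
    return found

# Hypothetical sample standing in for one long line of page source.
sample = ('<a href="https://twitter.com/alice">alice</a>'
          '<a href="https://twitter.com/bob">bob</a>')
results = extract_between(sample)
```

Each result still contains everything between the marker and `</a>`, just as the original slice does, so some trimming would be needed to get a clean URL. The results could then be written out with something like `output.write("\n".join(results) + "\n")`. For anything beyond a quick script, an HTML parser is likely to be more robust than string searching.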