Python adding a string to a match list with multiple items

Question

The code I am working on retrieves a list from an HTML page with two fields: URL and title.

The URL always starts with /URL...., so I need to prepend "http://website.com" to the URL in every value returned by re.findall.

The code so far is this:

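import re
from bs4 import BeautifulSoup as bs

def get_links(html):  # function name assumed; the question shows only the body
    soup = bs(html, 'html.parser')
    tag = soup.find('div', {'class': 'item'})
    reg = re.compile(r'<a href="(.+?)" rel=".+?" title="(.+?)"')
    links = re.findall(reg, str(tag))  # list of (href, title) tuples
    # TODO: prepend "http://website.com" to the href field of each tuple
    return links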
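Because the pattern has two capturing groups, re.findall returns a list of (href, title) tuples rather than plain strings. Tuples can't be modified in place, so if you stay with the regex, one way to fill in the placeholder is to rebuild each tuple. A minimal sketch, using made-up sample markup in place of str(tag):

```python
import re

# Made-up markup standing in for str(tag); hrefs and titles are placeholders.
html = (
    '<div class="item">'
    '<a href="/URL/one" rel="nofollow" title="First">First</a>'
    '<a href="/URL/two" rel="nofollow" title="Second">Second</a>'
    '</div>'
)

reg = re.compile(r'<a href="(.+?)" rel=".+?" title="(.+?)"')

# Two groups in the pattern, so findall yields (href, title) tuples.
links = reg.findall(html)

# Tuples are immutable: build new ones with the prefix added to the href.
links = [("http://website.com" + href, title) for href, title in links]

print(links)
```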

http://stackoverflow.com/a/1732454/1459669 Please use Beautiful Soup to find the links! — noɥʇʎԀʎzɐɹƆ, Dec 26 '15 at 00:06
@timgeb You never know, he might want to summon him. Then we'll need it migrated to StackExchange Skeptics or Worldbuilding... — noɥʇʎԀʎzɐɹƆ, Dec 26 '15 at 00:17
Are you going to accept the answer or what? Accepting is like saying "thank you", and it also awards reputation. — noɥʇʎԀʎzɐɹƆ, Dec 28 '15 at 16:16

noɥʇʎԀʎzɐɹƆ · Accepted Answer · 2015-12-26T00:16:13.907


Try:

# Prepend the site root to every link's href; href=True skips <a> tags without one
for link in tag.find_all('a', href=True):
    link['href'] = 'http://website.com' + link['href']
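Plain concatenation works as long as every href is root-relative like /URL...., but an href that is already absolute would come out as "http://website.comhttp://...". A hedged alternative is urllib.parse.urljoin from the standard library, which resolves each href against a base URL the way a browser does; in the loop, the assignment would become link['href'] = urljoin('http://website.com', link['href']). The sample hrefs below are made up:

```python
from urllib.parse import urljoin

BASE = "http://website.com"  # the prefix from the question

# Hypothetical hrefs: root-relative, relative, and already absolute.
hrefs = ["/URL/one", "URL/two", "http://other.example/page"]

# urljoin prepends the host to relative paths and leaves absolute URLs alone.
absolute = [urljoin(BASE, href) for href in hrefs]
print(absolute)
```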

Then use one of these output methods:

return str(soup) gets you the document after the changes are applied.

return tag.find_all('a') gets you all the link elements.

return [str(i) for i in tag.find_all('a')] gets you all the link elements converted to strings.
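If the calling code expects the same (href, title) tuples that re.findall produced in the question, they can be read straight off the tags. A self-contained sketch, assuming the bs4 package is installed and using made-up markup shaped like the question's (a div with class item containing links with href, rel and title):

```python
from bs4 import BeautifulSoup

# Made-up markup with the structure described in the question.
html = """
<div class="item">
  <a href="/URL/one" rel="nofollow" title="First">First</a>
  <a href="/URL/two" rel="nofollow" title="Second">Second</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("div", {"class": "item"})

# Rewrite each href in place, as in the answer's loop.
for link in tag.find_all("a", href=True):
    link["href"] = "http://website.com" + link["href"]

# Collect (href, title) pairs; .get returns None if a link has no title.
links = [(a["href"], a.get("title")) for a in tag.find_all("a", href=True)]
print(links)
```

Because the hrefs are modified in place, str(soup) also reflects the new URLs.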

Now, don't try to parse HTML with regex when you already have an HTML parser (Beautiful Soup) working.
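One concrete reason: the question's pattern hard-codes attribute order and quote style, while HTML allows both to vary. In this small sketch (made-up links), the same link written slightly differently simply stops matching:

```python
import re

reg = re.compile(r'<a href="(.+?)" rel=".+?" title="(.+?)"')

a1 = '<a href="/URL/one" rel="nofollow" title="First">First</a>'
# Same link, attributes reordered (valid HTML).
a2 = '<a title="First" href="/URL/one" rel="nofollow">First</a>'
# Same link, single-quoted attributes (also valid HTML).
a3 = "<a href='/URL/one' rel='nofollow' title='First'>First</a>"

print(reg.findall(a1))
print(reg.findall(a2))
print(reg.findall(a3))
```

Only a1 matches; a parser such as Beautiful Soup returns the same href and title for all three.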

edited Dec 26 '15 at 00:16

answered Dec 26 '15 at 00:11

noɥʇʎԀʎzɐɹƆ


Oops, my bad. I reversed the order of the URL concatenation. – noɥʇʎԀʎzɐɹƆ Dec 26 '15 at 00:19
