
I'm using Cygwin and don't have BeautifulSoup installed.

jonderry

2 Answers


If you don't care much about performance, you can use regular expressions:

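import re

# Capture the value of every href attribute quoted with " or '
linkre = re.compile(r"""href=["']([^"']+)["']""")
links = linkre.findall(your_html)  # your_html: the page source as a string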
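To see what the pattern returns, here is a small self-contained run on a made-up HTML snippet (`sample_html` is hypothetical input, not from the question). Because the expression has one capturing group, `findall` returns just the attribute values. Note that it matches `href` on any element, including `<link>`, not only `<a>`.

```python
import re

# Same pattern as above: capture the value of any href attribute
# wrapped in double or single quotes.
linkre = re.compile(r"""href=["']([^"']+)["']""")

# Hypothetical sample input, for illustration only.
sample_html = """
<a href="http://example.com/">Example</a>
<a href='/relative/page.html'>Relative</a>
<link rel="stylesheet" href="style.css">
"""

links = linkre.findall(sample_html)
print(links)  # ['http://example.com/', '/relative/page.html', 'style.css']
```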

If you only want http:// links, change the expression to:

linkre = re.compile(r"""href=["'](http:[^"']+)["']""")  # keeps the http: prefix in each result
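A quick sketch of the http-only filter, run on hypothetical sample HTML. It puts `http:` inside the capturing group so each result is a complete URL; if `http:` sits outside the group, `findall` returns only the text after it. A literal `http:` also skips `https:` links, so the sketch shows an `https?:` variant as well, in case you want both.

```python
import re

# Only hrefs whose value starts with "http:"; the prefix is inside the
# group so findall returns complete URLs.
http_linkre = re.compile(r"""href=["'](http:[^"']+)["']""")

# Variant that accepts both http: and https: (the "s" is optional).
any_scheme_linkre = re.compile(r"""href=["'](https?:[^"']+)["']""")

# Hypothetical sample input, for illustration only.
sample_html = """
<a href="http://example.com/">Example</a>
<a href="https://secure.example.com/">Secure</a>
<a href='/relative/page.html'>Relative</a>
"""

http_links = http_linkre.findall(sample_html)
any_scheme_links = any_scheme_linkre.findall(sample_html)
print(http_links)        # ['http://example.com/']
print(any_scheme_links)  # ['http://example.com/', 'https://secure.example.com/']
```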

You can also make the quotes optional, in case your HTML contains unquoted href values.
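One way to make the quotes optional, sketched under the assumption that unquoted values end at whitespace or `>` (the pattern below is illustrative, not from the original answer). The trade-off is that a quoted value containing a space gets cut off at the space.

```python
import re

# Optional opening quote; the value runs until a quote, whitespace, or '>'.
optional_quote_linkre = re.compile(r"""href=["']?([^"'\s>]+)""")

# Hypothetical sample input, for illustration only.
sample_html = """
<a href="http://example.com/">Double-quoted</a>
<a href='/page.html'>Single-quoted</a>
<a href=plain.html>Unquoted</a>
"""

links = optional_quote_linkre.findall(sample_html)
print(links)  # ['http://example.com/', '/page.html', 'plain.html']
```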

Piotr Lopusiewicz
  • Regular expressions would probably be faster than proper HTML parsing, so I don't think this is a matter of performance but of correctness. – Liquid_Fire Dec 11 '10 at 01:56
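Picking up the correctness point, and since BeautifulSoup isn't installed, a minimal sketch using the standard-library `html.parser` module (Python 3) is shown below. `LinkCollector` and `sample_html` are hypothetical names for illustration, and only `<a>` tags are collected.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical sample input, for illustration only.
sample_html = """
<a href="http://example.com/">Example</a>
<A HREF=unquoted.html>Unquoted</A>
<!-- <a href="commented-out.html">ignored</a> -->
<a title="no link">No href</a>
"""

collector = LinkCollector()
collector.feed(sample_html)
collector.close()
print(collector.links)  # ['http://example.com/', 'unquoted.html']
```

Unlike the first regex in the answer, the parser picks up the uppercase, unquoted `HREF` and ignores the link inside the HTML comment.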