I'm using cygwin and do not have BeautifulSoup installed.
How about installing BeautifulSoup then? Might be the easiest way :) – Sven Marnach Dec 11 '10 at 00:11
Possibly, I just saw something in my search results that suggested it might be difficult on cygwin, possibly more difficult than doing it without BeautifulSoup. – jonderry Dec 11 '10 at 00:14
Actually, I just installed it pretty easily. It's good to know the other ways though. – jonderry Dec 11 '10 at 00:46
2 Answers
If you don't care much about performance you can use regular expressions:
import re
linkre = re.compile(r"""href=["']([^"']+)["']""")
links = linkre.findall(your_html)
If you only want absolute links that start with http://, change the expression so the captured group keeps the scheme:
linkre = re.compile(r"""href=["'](http://[^"']+)["']""")
Or you can make the surrounding quotes optional, in case your HTML has href values without quotes around them.
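
A minimal sketch of the optional-quote variant, in case it helps. The pattern, the name link_re, and the sample HTML are assumptions made up for illustration; the pattern also tolerates spaces around the equals sign and is case-insensitive, which goes slightly beyond the expressions above.

```python
import re

# Hypothetical pattern: the opening quote is optional, and the value
# ends at the first quote, whitespace character, or '>'.
link_re = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)

# Made-up sample with double-quoted, single-quoted, and unquoted values.
sample_html = (
    '<a href="http://a.example/">A</a> '
    "<a href='/b'>B</a> "
    "<a href=/c>C</a>"
)

links = link_re.findall(sample_html)
print(links)
```

This is still pattern matching on text, so it will also pick up href= strings that appear outside of real tags.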

Piotr Lopusiewicz
Regular expressions would likely actually be faster than doing proper HTML parsing, so I don't think this is a matter of performance but rather correctness. – Liquid_Fire Dec 11 '10 at 01:56
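
To illustrate the correctness point, here is a rough sketch that does real HTML parsing with only the standard library, so BeautifulSoup is not needed. It assumes Python 3 (the module is html.parser there); LinkCollector, extract_links, and the sample HTML are hypothetical names invented for this example, not something from the thread.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags only (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values of all <a> tags found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up sample: the first href= is plain text inside a paragraph,
# which a regex would match but the parser treats as character data.
sample = '<p>href="not-a-link"</p><a class="x" href="http://example.com/">x</a>'
print(extract_links(sample))
```

The parser only reports attributes that belong to actual tags, which is the kind of case where the regex approach can return false matches.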