-1

I have a web page with a structure like this:

<div>
<ul>
<li class=tree>
<a>  </a>
</li>
</ul>
</div>

Now I want to grab all those hyperlinks and write them to a text file using Python.

Mirage
  • 1
    Did you try one of the **Related** links? – Ignacio Vazquez-Abrams Jun 06 '11 at 05:29
  • This link might be helpful: http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautiful-soup – vkrams Jun 06 '11 at 05:41
  • 3
    This has been asked and answered multiple times. Go find Beautiful Soup and read the docs. – Peter Rowell Jun 06 '11 at 05:41
  • @Ignacio I don't want to use Beautiful Soup. Someone was saying that Beautiful Soup is not currently being developed. Is there any other alternative? – Mirage Jun 06 '11 at 06:01
  • @bidu any reliable source saying that `BeautifulSoup` is not currently being developed (Update: their [launchpad site](https://code.launchpad.net/beautifulsoup) does not appear inactive to me)? – OnesimusUnbound Jun 06 '11 at 06:09
  • @Onesimus see zeekay's comment on this question: http://stackoverflow.com/questions/6236794/how-much-is-the-difference-between-html-parsing-and-web-crawling-in-python . I am just beginning with parsing in Python, so I don't want to spend time on something that is not actively developed and then have to switch to something else at a later stage. – Mirage Jun 06 '11 at 06:12
  • @bidu see my update in the previous comment. Anyway, thanks for the link and I got interested in `scrapy`. I'll check it out :-) – OnesimusUnbound Jun 06 '11 at 06:23
  • @Onesimus This link also mentions BS problems: http://stackoverflow.com/questions/1922032/parsing-html-in-python-lxml-or-beautifulsoup-which-of-these-is-better-for-what – Mirage Jun 06 '11 at 06:23
  • @bidu that is a problem with the 3.1.x series; 3.2.x is the recommended, working version. – OnesimusUnbound Jun 06 '11 at 06:27
  • If you want to avoid BeautifulSoup either use lxml or scrapy. – Joao Figueiredo Jun 06 '11 at 09:18

2 Answers

2

Use BeautifulSoup
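
As a rough illustration of what that could look like, here is a minimal sketch. It assumes the separately installed `bs4` package (Beautiful Soup 4) rather than the 3.x series discussed in the comments. The sample markup, the example.com URLs, and the helper names `extract_links` and `save_links` are made up for illustration and do not come from the question.

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the structure in the question.
# The URLs are placeholders invented for this sketch.
SAMPLE_HTML = """
<div>
<ul>
<li class="tree"><a href="http://example.com/a">A</a></li>
<li class="tree"><a href="http://example.com/b">B</a></li>
<li class="other"><a href="http://example.com/skip">skip</a></li>
</ul>
</div>
"""


def extract_links(html):
    """Return the href of every <a> nested inside an <li class="tree">."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for li in soup.find_all("li", class_="tree"):
        # href=True skips anchors that have no href attribute at all.
        for a in li.find_all("a", href=True):
            links.append(a["href"])
    return links


def save_links(links, path):
    """Write one link per line to a text file."""
    with open(path, "w") as f:
        for link in links:
            f.write(link + "\n")
```

Calling `save_links(extract_links(html), "links.txt")` on the downloaded page source would then produce the text file; fetching the page itself is left out of the sketch.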

OnesimusUnbound
0

You can use the standard-library module `xml.dom.minidom`. It ships with both Python 2 and Python 3, but it only parses well-formed XML, so it works on XHTML-style pages and fails on sloppy HTML.
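
A minimal sketch of the `xml.dom.minidom` approach, assuming the page is well-formed (every tag closed, attribute values quoted). The sample markup, the example.com URLs, and the function name `extract_links_minidom` are invented for illustration. Note that the snippet in the question uses an unquoted `class=tree`, which an XML parser would reject, so real pages may need cleaning up first.

```python
from xml.dom import minidom

# Well-formed sample mirroring the question's structure.
# The URLs are placeholders invented for this sketch.
SAMPLE_XHTML = """<div>
<ul>
<li class="tree"><a href="http://example.com/a">A</a></li>
<li class="tree"><a href="http://example.com/b">B</a></li>
<li class="other"><a href="http://example.com/skip">skip</a></li>
</ul>
</div>"""


def extract_links_minidom(markup):
    """Return the href of every <a> inside an <li class="tree">."""
    doc = minidom.parseString(markup)
    links = []
    for li in doc.getElementsByTagName("li"):
        # Only consider list items whose class attribute is exactly "tree".
        if li.getAttribute("class") != "tree":
            continue
        for a in li.getElementsByTagName("a"):
            if a.hasAttribute("href"):
                links.append(a.getAttribute("href"))
    return links


def save_links(links, path):
    """Write one link per line to a text file."""
    with open(path, "w") as f:
        for link in links:
            f.write(link + "\n")
```

With the page source in a string, `save_links(extract_links_minidom(source), "links.txt")` would write the links out, one per line.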

ninjagecko