Python - Getting all images from an html file

Question

Can someone help me parse an HTML file in Python to get the links for all the images in it?

Preferably without a third-party module...

Thanks!

score 11 · Accepted Answer · edited Mar 02 '13 at 23:44

You can use Beautiful Soup. I know you said without a 3rd party module. However, this is an ideal tool for parsing HTML.

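import urllib2  # Python 2 only; in Python 3 this is urllib.request
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; version 4 is imported from bs4

page = BeautifulSoup(urllib2.urlopen("http://www.url.com"))
# findAll returns the <img> tags; read each tag's src attribute to get the link
for img in page.findAll('img'):
    print(img.get('src'))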

edited Mar 02 '13 at 23:44 by citruspi

answered Nov 28 '10 at 03:21 by Russell Dias

1

OK. Seems like this will help a lot, so I'll check it out. Thanks! – user377419 Nov 28 '10 at 03:35
2

I think Russell missed `BeautifulSoup(page)` – Kurt Liu Jul 05 '11 at 21:32

score 11 · Answer 2 · answered Nov 28 '10 at 03:38

Using only the Python standard library:

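from html.parser import HTMLParser  # Python 3; in Python 2 use: from HTMLParser import HTMLParser

class MyParse(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip <img> tags without a src
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                print(src)

h = MyParse()
with open("index.html") as f:
    h.feed(f.read())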
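
If the links are needed as a list rather than printed, the same standard-library approach can collect them. This is only a minimal sketch, assuming Python 3; the names `ImageCollector` and `image_links` are made up for illustration and are not part of the answer above.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_links(html):
    """Return the src values of all <img> tags in an HTML string."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.sources
```

Calling `image_links(open("index.html").read())` would then give the links for the file used in the answer, in document order.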

answered Nov 28 '10 at 03:38 by Kabie

1

You can augment this with urllib to open a web page and download the images. – Rafe Kettler Nov 28 '10 at 03:43
2

For me this only works with "from HTMLParser import HTMLParser" – nvrandow Mar 06 '14 at 15:17
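
Following up on the urllib suggestion in the comments, here is a rough sketch of how the extracted `src` values might be turned into absolute URLs and downloaded. It assumes Python 3; the function names `absolute_image_urls` and `download_images`, the fallback file name, and the choice to save each file under the last component of its URL path are all illustrative assumptions, not something from the thread.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


def absolute_image_urls(page_url, sources):
    """Resolve src values (possibly relative) against the page's URL."""
    return [urljoin(page_url, src) for src in sources]


def download_images(urls, target_dir="."):
    """Fetch each URL and save it under the last component of its path."""
    for url in urls:
        # Fall back to a generic name if the URL path has no file name.
        name = os.path.basename(urlparse(url).path) or "image"
        with urlopen(url) as response, open(os.path.join(target_dir, name), "wb") as out:
            out.write(response.read())
```

Resolving against the page URL matters because `src` attributes are often relative (for example `a.png` or `/img/b.jpg`), and those cannot be fetched as they stand. Note that two images with the same file name would overwrite each other in this sketch.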

score 2 · Answer 3 · edited May 23 '17 at 12:00

lxml is generally considered faster than Beautiful Soup (ref). Its tutorial can be found here: (link). You may also take a look at this old Stack Overflow post.

edited May 23 '17 at 12:00 by Community

answered Nov 28 '10 at 04:34 by Overmind Jiang
