I want to extract the src attribute from the img tags in HTML code using Python. I think regular expressions can do the job, so I wrote this one:

\<img .*src="(.*)".*/\>

But there are many possible ways to write an img tag, such as:

<img src="images/first.png" alt="" />
<img src="images/first.png" alt="">
<img  alt="" src="images/first.png" />
<img  alt="" width="100" src="images/first.png" height="200">

So my question is: is the above regular expression enough for the task? Can anyone suggest a better option?

Muhammed K K

2 Answers

Use an HTML parser instead; Python has several to choose from.

ElementTree example:

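from xml.etree import ElementTree

# ElementTree only accepts well-formed XML (e.g. XHTML); it raises ParseError otherwise.
tree = ElementTree.parse('filename.html')
# iter() walks the whole tree; findall('img') would only match direct children of the root.
for elem in tree.iter('img'):
    print(elem.get('src'))  # Element has no item access by attribute name; use get()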
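
ElementTree only parses well-formed XML, so it can fail on ordinary HTML such as the unclosed `<img ...>` variants in the question. As a rough sketch of an alternative, the standard library's `html.parser` module tolerates that kind of markup. The names `ImgSrcCollector` and `img_sources` below are made up for illustration, and the sample input is just the four variants from the question.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        # Self-closing tags (<img ... />) are routed here as well by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def img_sources(html):
    """Return the src values of all img tags in the given HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


samples = """
<img src="images/first.png" alt="" />
<img src="images/first.png" alt="">
<img  alt="" src="images/first.png" />
<img  alt="" width="100" src="images/first.png" height="200">
"""
print(img_sources(samples))
```

Attribute order and the presence or absence of the trailing `/` do not matter here, which is exactly where the regular expression runs into trouble.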
Martijn Pieters

You can use the BeautifulSoup library, an HTML parser that copes well with malformed markup.
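
A minimal sketch of what that might look like, assuming the third-party `beautifulsoup4` package is installed (`pip install beautifulsoup4`) and using the four img variants from the question as input:

```python
from bs4 import BeautifulSoup

html = """
<img src="images/first.png" alt="" />
<img src="images/first.png" alt="">
<img  alt="" src="images/first.png" />
<img  alt="" width="100" src="images/first.png" height="200">
"""

# "html.parser" selects Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# src=True keeps only img tags that actually carry a src attribute.
sources = [img["src"] for img in soup.find_all("img", src=True)]
print(sources)
```

Passing `src=True` to `find_all` avoids a `KeyError` on img tags that have no src attribute at all.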

user1305989